ISO / TC 37 / SC 2 / WG 1 N 83

ISO/IEC JTC 1/SC 32
Data Management and Interchange

ISO/IEC JTC 1/SC32 N 147

DATE:  1998-08-05

REPLACES:

DOC TYPE:  National Body Contribution

TITLE:  Horizontal Issues and Encodable Value Domains in Electronic Commerce: Non-technical Summary and Real World Examples to supplement BT-EC Report

SOURCE:  Canadian National Body

PROJECT:

STATUS:  This document was reviewed at the SC 32 Plenary meetings, July 1998, Brisbane, Australia.

ACTION ID:  FYI

DUE DATE:

DISTRIBUTION:  P & L Members
               SC Chair
               WG Conveners and Secretaries

MEDIUM:

DISKETTE NO.:

NO. OF PAGES:  20

Secretariat, ISO/IEC JTC 1/SC 32, American National Standards Institute, 11 West 42nd Street, New York, NY 10036;  Telephone:  212-642-4976;  Fax: 212-840-2298;  E-mail: mtopping@ansi.org


Title:       Horizontal Issues and Encodable Value Domains in Electronic Commerce: Non-technical Summary and Real World Examples to supplement BT-EC Report

Source:      CAC/JTC1/SC32, Canada

Status:      National Body Contribution

Action:      FYI and discussion at the SC32 HOD/C and Plenary in Brisbane, July 1998

Purpose:     This document:

1.      is circulated to JTC1/SC32 as a reference document prepared to facilitate follow-up to the BT-EC Report (JTC1 N5296) by SC32/WG2 (and SC30/WG1) as stated in Resolution 8 of the 12th Plenary of JTC1 (N5448) and the JTC1 Request for National Body and Subcommittee Comments on JTC1 N5296, Electronic Commerce Business Team Report (N5437);

2.      serves as input to "Elaboration on the definition of cultural and linguistic adaptability" for the JTC1 Ad-Hoc Meeting of the new Technical Direction on "Cultural and Linguistic Adaptability and User Interfaces" as per Resolution 22 of the 12th JTC1 Plenary (N5448); and,

3.      contributes to the work of JTC1/SC32/WG2 on ISO/IEC PDTR 15452 "Information Technology - Specification of data value domains".

Contents:

1. INTRODUCTORY NOTES

2. HORIZONTAL ISSUES

2.1 Overview

2.2 Information Technology (IT)-enablement

2.3 Localization and multilingualism

2.4 Cross-Sectorial issues

2.5 Cultural adaptability

3. REAL WORLD EXAMPLES OF ENCODED VALUE DOMAINS

3.1 Introduction

3.2 Example #1 - Currency Codes

3.3 Example #2 - Country Codes And Localization With Multilingualism

3.4 Example #3 - Language Codes And Concordance Among International Standards

3.5 Commodity Codes: IT-Enabled With Localization And Multilingualism

 


1.      INTRODUCTORY NOTES

1.         Two JTC1 activities, the Business Team on Electronic Commerce (BT-EC) and the Cultural Adaptability Workshop (CAW), completed their work and reported to the 12th Plenary Meeting of ISO/IEC JTC1, 2-5 June 1998, in Sendai, Japan.  JTC1 document N5448 contains the resolutions of this Plenary.  Resolutions 8, 9 and 10 pertain to JTC1 follow-up on the BT-EC Report and its recommendations.  Resolution 22 pertains to JTC1 follow-up on CAW and its recommendations.

            Members of the BT-EC participated in CAW.  The BT-EC scheduled its final meeting to be held after the Workshop on Cultural Adaptability so that the BT-EC could benefit from the results of CAW.

            In Resolution 8 (N5448), JTC1 instructs its secretariat to circulate the BT-EC Report to "National Bodies" and all JTC1 Technical Directions for review and comment.

2.         The purpose of this document is to serve as a "non-technical" summary of the work of the ISO/IEC JTC1 Business Team on Electronic Commerce (BT-EC) with respect to "Horizontal Aspects" and "Encodable Value Domains".  {See further below}.  The BT-EC Report (N5296) contains many recommendations pertaining to "encodable value domains", and this document serves as background to them.

3.         This document consolidates in one contribution contents from two existing JTC1 documents; namely:

            (1)        in Chapter 2, the text found in Clause 6.0 of the BT-EC "Report to JTC1: Work on Electronic Commerce Standardization to be initiated" (JTC1 N5296); and,

            (2)        in Chapter 3, text based on a Canadian member body contribution titled "Additional information in support of the BT-EC Report (JTC1 N5296) - Examples of Encodable Value Domains with IT-Interface Needs, Localization and Multilingualism" (JTC1 N5394).

4.         Although it is directed at JTC1/SC32/WG2 (and WG1) and the JTC1 "Ad-Hoc on Cultural and Linguistic Adaptability", this contribution is also intended to be circulated outside of JTC1 to raise awareness of the topics covered here and to obtain feedback on them.

5.         Horizontal Issues - Capsule Overview

            In the BT-EC Report, cultural and linguistic adaptability were deemed to be important to electronic commerce.  In addition to being noted as part of consumer requirements {Section 5.2}, they were identified by the BT-EC as key components of four horizontal issues which are of general relevance for all scenarios involving Electronic Commerce.  These issues are:

              ·       information technology (IT)-enablement;

              ·       localization including multilingualism;

              ·       cross-sectorial aspects; and,

              ·       cultural adaptability.

            The BT-EC ordered these horizontal issues on the basis of:

             (1)       the need to go from the simpler to more complex challenges;

             (2)       placing priority on the "do-able" and immediately most useful in the context of increasing resource constraints in standardization work; and,

             (3)       promotion and visibility of ISO/IEC JTC1 work within the ISO, IEC and ITU and especially outside of these standardization communities.

            From a user perspective, these four horizontal issues need to be addressed in a harmonized manner.

            From an Electronic Commerce perspective, i.e., that of the JTC1/BT-EC, standardization work addressing the first three horizontal issues associated with:

              ·       "IT-enablement";

              ·       "Localization and Multilingualism"; and,

              ·       "Cross-Sectorial" aspects,

            should resolve many of the requirements for cultural adaptability.  It then remains to be seen which other "cultural adaptability" requirements are left, i.e., in addition to those already identified as "cultural elements" and/or those of a societal nature.


2.      HORIZONTAL ISSUES[1]

2.1       Overview

BT-EC identified four horizontal issues as being of general relevance for all scenarios involving Electronic Commerce, and gave these horizontal issues prominent attention in its work. These issues are:

  ·         information technology (IT)-enablement,

  ·         localization including multilingualism,

  ·         cross-sectorial aspects,

  ·         cultural adaptability.

These horizontal issues are ordered here on the basis of:

 1.        the need to go from the simpler to more complex challenges,

 2.        placing priority on the "do-able" and immediately most useful in the context of increasing resource constraints in standardization work; and,

 3.        promotion and visibility of ISO/IEC JTC1 work within the ISO, IEC and ITU and especially outside of these standardization communities.

From a user perspective, these four horizontal issues need to be addressed in a harmonized manner.

A key characteristic of commerce world-wide, in particular in the business-to-business and business-to-administration domains, is that it consists of business transactions which:

 1.        are rule-based, i.e., mutually understood and accepted sets of business conventions, practices, procedures, etc.; and,

 2.        make extensive use of "codes", often table-based, representing predefined possible choices for common aspects of business transactions. Examples include countries, currencies, languages, manufacturers and their products.

Many of these sets of agreed-upon rules used in business world-wide, and their associated tables and code lists, are "de jure" and "de facto" standards. BT-EC noted that numerous international standards are already in use in support of commerce world-wide. The problem is that most are paper-based and lack a computer-processable version. Even when distributed in electronic form, these standards used in commerce world-wide, including those of ISO, consist of tens of printed pages. They cannot be "plugged in" for use in Electronic Commerce. Much of the intelligence in these international standards can be understood only by humans, whether it is stated explicitly or left implicit. This intelligence has not been described formally using Formal Description Techniques (FDTs), i.e., in their present form these standards do not support "computational integrity". Consequently, each enterprise using these code sets has to spend considerable time and effort to (1) determine their meaning and interpret them; (2) build applications; and, (3) hope that they interoperate with other networks or enterprises.

Human beings like to name "objects". But the approach of using "names" is not very IT friendly, cost-efficient or time-efficient.

Depending on the interplay of multilingual and localization requirements, in Electronic Commerce, a singular product or service being offered for sale will have multiple names and differing names even in the "same" language. Thus, if we wish to ensure rapid and widespread use of Electronic Commerce globally, we must on the one hand identify "objects", i.e., products or services being offered for sale, in an unambiguous, linguistically neutral, and IT-processable and EC-facilitated manner, and, on the other hand, present the same via a range of linguistic names (and associated character sets) from a point-of-sale perspective, i.e., human-readable user interface, as required by the "local" marketplace.
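The principle of separating identification from naming can be illustrated with a minimal sketch. It assumes a code table keyed by a linguistically neutral code (here ISO 3166-1 alpha-2 country codes) that carries human-readable names per language only for presentation. The table layout, the function `display_name` and its fallback rule are hypothetical and are not taken from any standard.

```python
from typing import Dict

# Hypothetical code table (an assumption for illustration): the key is a
# linguistically neutral code, here an ISO 3166-1 alpha-2 country code;
# the names attached to it serve presentation only.
COUNTRY_NAMES: Dict[str, Dict[str, str]] = {
    "CA": {"en": "Canada", "fr": "Canada"},
    "DE": {"en": "Germany", "fr": "Allemagne", "de": "Deutschland"},
    "JP": {"en": "Japan", "fr": "Japon", "ja": "日本"},
}


def display_name(code: str, language: str, fallback: str = "en") -> str:
    """Return the name of `code` in `language`, else in `fallback`.

    Storage, comparison and interchange use only the code; a name is
    looked up at the point of presentation. Raises KeyError for an
    unknown code.
    """
    names = COUNTRY_NAMES[code]
    return names.get(language, names[fallback])


# The same unambiguous identifier, presented to three different users.
print(display_name("DE", "de"))  # Deutschland
print(display_name("DE", "fr"))  # Allemagne
print(display_name("JP", "es"))  # Japan (no Spanish name in the table)
```

Adding a new local market in this sketch means adding names to the table. The code, and therefore every system that exchanges it, stays unchanged.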

In order to provide a focus for its work on horizontal issues, the BT-EC utilized four real world examples; namely:

  ·         Currency Codes,

  ·         Country Codes,

  ·         Language Codes,

  ·         Commodity Codes.

(For details of these examples see Chapter 3 below and JTC 1/BT-EC N 047).

These examples represent standards used for commerce world-wide, and enterprises and their information systems presently implement them in a wide variety of ways. There are also no "standard" ways for interworking among these and similar standards. This hinders global interoperability. The recent widespread use of the Internet is exacerbating existing ambiguities.

 

From a BT-EC perspective, these four examples underline the fact that with respect to electronic commerce there may be less of a need for new standards. Rather, the immediate challenge may well be the development of a category of information technology standards which will facilitate the development of information technology enabled versions of existing standards used in commerce, and do so in a manner which also supports the interplay of localization and multilingual requirements, i.e., "bridging standards".

BT-EC wishes to pass on the following considerations for such standardization work in support of Electronic Commerce; namely:

 1.        Standards must focus on the interface (as opposed to implementation) as the best means of arriving at globally harmonized solutions for interoperability from both a business and information technology perspective.

 2.        Standard interfaces among information systems must be technology-neutral, accommodating advances in technology to the extent possible. Further, such standard interfaces must be linguistically neutral to the furthest extent possible.

 3.        In order to empower users and consumers, standards should be adaptable to local and multilingual requirements at national and regional levels, while ensuring full transparency of available market solutions to the consumer. Multilingualism must be considered. The expansion of open, multilingual standards could significantly increase the volume and value of world-wide Electronic Commerce.

2.2       Information Technology (IT)-enablement

"IT-enablement" is the term used to identify the need to transform currently accepted standards used in commerce world-wide from a manual to a computational perspective. Electronic commerce, in particular of the Business-to-Business or Business-to-Administration categories, introduces a requirement for standards that are prepared, structured and made available for unambiguous usage within and among information systems. This requirement can be expressed as "computational integrity", in particular:

"the expression of standards in a form that ensures precise description of behaviour and semantics in a manner that allows for automated processing to occur, and the managed evolution of such standards in a way that enables dynamic introduction by the next generation of information systems".

The objective of IT-enablement is to capture in a computer-processable manner, and one which maximizes interoperability, the implicit rules and relations (i.e., those known to "experts") of the code sets found in standards used in commerce world-wide, i.e., to capture and state them from an entity relationship and/or object technology perspective, using Formal Description Techniques. Also, issues arising from change management in "code tables", i.e., synchronization, backwards compatibility, migration, etc., need to be addressed.
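The change-management issues named above (synchronization, backwards compatibility, migration) can be made explicit in a code table. The following is a minimal sketch that assumes each entry carries a period of validity and an optional successor code. The codes "AAA" and "BBB", their dates, and the names `CodeEntry`, `lookup` and `current_equivalent` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CodeEntry:
    """One entry of a code table, with an explicit period of validity."""
    code: str
    meaning: str
    valid_from: date
    valid_to: Optional[date] = None    # exclusive end; None = still in force
    replaced_by: Optional[str] = None  # successor code, if any

    def is_valid_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def lookup(table: List[CodeEntry], code: str, day: date) -> Optional[CodeEntry]:
    """Resolve a code as it was defined on `day` (backwards compatibility)."""
    for entry in table:
        if entry.code == code and entry.is_valid_on(day):
            return entry
    return None


def current_equivalent(table: List[CodeEntry], code: str) -> str:
    """Follow replaced_by links to the code now in force (migration aid)."""
    by_code = {entry.code: entry for entry in table}
    seen = set()  # guards against a cycle of replacements
    while code in by_code and by_code[code].replaced_by and code not in seen:
        seen.add(code)
        code = by_code[code].replaced_by
    return code


# Hypothetical codes and dates, for illustration only.
TABLE = [
    CodeEntry("AAA", "old unit", date(1990, 1, 1), date(1999, 1, 1), replaced_by="BBB"),
    CodeEntry("BBB", "new unit", date(1999, 1, 1)),
]
```

With this table, `lookup(TABLE, "AAA", date(1998, 6, 1))` still resolves the withdrawn code for historical records. The same call for a date in 2000 returns `None`, and `current_equivalent(TABLE, "AAA")` returns `"BBB"`. The rules are stated in the data rather than left to expert knowledge.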

IT-enablement is based on the premise that a detailed and exhaustive identification of standards and "conventions", etc., used in support of existing commerce, will eliminate many barriers to Electronic Commerce.

IT-enablement recognizes that within ISO, IEC and ITU, there are committees which have the domain responsibility and expertise in areas of work, the primary purpose of which is to manage and control the content. IT-enablement also recognizes that outside of ISO/IEC/ITU, there are many other organizations which have domain responsibility and expertise in subject areas relevant to global Electronic Commerce. Their "content" and industry sector domain oriented standards require an IT-enabled version for use in Electronic Commerce.

BT-EC suggests that JTC 1 give proper consideration to IT-enablement, initially focused on currency, country, language and commodity codes. Members of BT-EC are of the opinion that such work will provide the practical experience and expertise needed to develop a generalized approach to "IT-enablement". This should also help to support localization and multilingual requirements.

(For further information, see document ISO/IEC JTC 1/BT-EC N 46.)

2.3       Localization and multilingualism

IT-enablement is based on the premise that to ensure rapid and widespread use of Electronic Commerce globally, we must on the one hand identify "objects", i.e., products or services being offered for sale, in an unambiguous, linguistically neutral, and IT-processable and EC-facilitated manner, and, on the other hand, present the same via a range of linguistic names (and associated character sets) from a point-of-sale perspective, i.e., human-readable, as required by the "local" marketplace.

BT-EC reviewed existing JTC 1 terms and definitions of "locale" (see ISO/IEC JTC 1/BT-EC N 46). The aspects they cover normally relate to the character sets associated with a natural language, including collating/ordering, date/time formats, monetary formatting, etc., a.k.a. "cultural elements".
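Monetary formatting is a typical "cultural element". The following sketch assumes that the stored value (an amount in cents plus the currency code, e.g. "CAD") is locale-neutral and that only its presentation varies. The two conventions are simplified and hand-written, and `MonetaryConvention` and `format_amount` are hypothetical names. A production system would draw such conventions from maintained locale data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonetaryConvention:
    group_sep: str      # separator between groups of three digits
    decimal_sep: str    # separator before the minor unit
    symbol: str
    symbol_first: bool  # True: "$1.00"; False: "1,00 $"


# Simplified, hand-written conventions (an assumption for illustration).
CONVENTIONS = {
    "en-CA": MonetaryConvention(",", ".", "$", True),
    "fr-CA": MonetaryConvention(" ", ",", "$", False),  # often a no-break space
}


def format_amount(cents: int, locale_id: str) -> str:
    """Format a non-negative amount given in cents for the named locale."""
    if cents < 0:
        raise ValueError("this sketch handles non-negative amounts only")
    conv = CONVENTIONS[locale_id]
    units, frac = divmod(cents, 100)
    # Python's "," format spec groups by thousands; swap in the local separator.
    digits = f"{units:,}".replace(",", conv.group_sep)
    number = f"{digits}{conv.decimal_sep}{frac:02d}"
    return f"{conv.symbol}{number}" if conv.symbol_first else f"{number} {conv.symbol}"
```

For example, `format_amount(123456, "en-CA")` gives `$1,234.56` and `format_amount(123456, "fr-CA")` gives `1 234,56 $`. The symbol "$" is ambiguous across countries, which is why the interchanged value carries the ISO 4217 code rather than the symbol.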

From an Electronic Commerce perspective, BT-EC identified four additional sets of parameters of "localization" requirements which should be addressed, namely:

 1.        jurisdictional requirements, i.e., various combinations of "top-down" legal and regulatory frameworks which place constraints on the global marketplace and in doing so, often define/establish a "local" market;

 2.        consumer requirements, i.e., combinations of "bottom-up" consumer demands and behaviour;

 3.        supplier requirements, i.e., combinations of factors impacting on suppliers of goods and services (as well as those involved in supporting logistics chains); and,

 4.        human rights-related requirements (e.g., disabled/handicapped, privacy, etc.).

BT-EC defines "localization" as:

localization:    pertaining to or concerned with anything that is not global and is bound through specified sets of parameters of:

            (a)        a linguistic nature, including natural and special languages and associated multilingual requirements;

            (b)        a jurisdictional nature, i.e., legal, regulatory, geopolitical, etc.;

            (c)        a sectorial nature, i.e., industry sector, scientific, professional, etc.;

            (d)        a human rights nature, i.e., privacy, disabled/handicapped persons, etc.; and/or

            (e)        consumer behaviour requirements.
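The definition above treats a "locale" as something bound by sets of parameters (a) to (e). The following minimal sketch models it that way. The class `LocalizationProfile`, its methods and the example labels are hypothetical, and the labels are not a standard vocabulary. Representing each parameter set as a set of labels makes the "common ground" between two locales computable, which is one possible starting point for the interoperability objective noted below.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LocalizationProfile:
    """A "local" context bound by the five BT-EC parameter sets (a)-(e)."""
    linguistic: FrozenSet[str] = frozenset()      # (a) natural/special languages
    jurisdictional: FrozenSet[str] = frozenset()  # (b) legal, regulatory, geopolitical
    sectorial: FrozenSet[str] = frozenset()       # (c) industry, scientific, professional
    human_rights: FrozenSet[str] = frozenset()    # (d) privacy, disabled persons, ...
    consumer: FrozenSet[str] = frozenset()        # (e) consumer behaviour

    def is_global(self) -> bool:
        """True when no parameter binds the context, i.e. it is not "local"."""
        return not any((self.linguistic, self.jurisdictional, self.sectorial,
                        self.human_rights, self.consumer))

    def common_ground(self, other: "LocalizationProfile") -> "LocalizationProfile":
        """Parameters shared by two locales."""
        return LocalizationProfile(
            self.linguistic & other.linguistic,
            self.jurisdictional & other.jurisdictional,
            self.sectorial & other.sectorial,
            self.human_rights & other.human_rights,
            self.consumer & other.consumer,
        )


# Hypothetical profiles; "CA-QC" and "CA-ON" are ISO 3166-2 subdivision codes.
quebec_retail = LocalizationProfile(
    linguistic=frozenset({"fr"}),
    jurisdictional=frozenset({"CA", "CA-QC"}),
    sectorial=frozenset({"retail"}),
)
ontario_retail = LocalizationProfile(
    linguistic=frozenset({"en", "fr"}),
    jurisdictional=frozenset({"CA", "CA-ON"}),
    sectorial=frozenset({"retail"}),
)
```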

Within and among "locales", interoperability and harmonization objectives also apply.

From an Electronic Commerce perspective, "jurisdiction", on the whole, represents a set of local market entry and/or participation requirements which may be of a general nature or product/service-specific.

From a legal perspective, the basic entity is the country. Two or more countries can form among themselves a common harmonized "jurisdiction" governing the marketplace, through a bilateral or multilateral agreement. Where these agreements are of a general nature, the harmonized "jurisdiction" is known as a "region". Examples here include the European Union, NAFTA, etc. Within countries, there may be various approaches to more granular legal and regulatory frameworks, e.g., at the level of states, provinces, etc.

In addition to jurisdictions with a geographic dimension, there are jurisdictions bounded by a goods and services dimension. Examples here include airlines, banking, oil companies, etc. Here jurisdiction is often expressed through treaties, regulations, agreements, etc., which are harmonized through an entity representing these communities (e.g., ICAO, WCO, or WTO).

Combinations of laws and regulations can be viewed as frameworks. BT-EC can thus define jurisdiction as:

"jurisdiction:   a distinct legal and regulatory framework which places constraints on the global market­place and in doing so often defines/establishes a local market".

Electronic commerce is "borderless" in its nature - it transcends jurisdictions.

From a BT-EC perspective, multilingual requirements comprise more than just the need to support the character sets and sort/collate sequences of the various languages used by customers world-wide. They also reflect the fact that a single natural language is utilized in different ways in various local markets.

In addition, one should add the concept of special languages, i.e., those of a scientific or technical nature, as well as those which pertain to a specific industry sector. Many of these can be considered to be global in nature and use.

Thus from an Electronic Commerce perspective, "multilingual" requirements embody not only:

 1.        multiple natural languages; but also,

 2.        multiple and different uses of the "same" natural language;

 3.        multiple source languages in any multilingual thesauri, database, referenceable permitted value domains (PVDs), i.e., tables, code sets, etc.; and possibly also,

 4.        the use of special languages.

In this context, one can define:

multilingualism: "the ability to support not only character sets specific to a language (or family of languages) and associated rules but also localization requirements, i.e., use of a language from jurisdictional, sectorial and consumer marketplace perspectives".

From a BT-EC perspective, adding multilingual capabilities in Electronic Commerce can be viewed as simply mirroring existing physical-world requirements. Prime examples here are product labelling requirements and product usage instructions. Given the increasing globalization of trade in goods, single-language usage instructions accompanying products are increasingly rare and multilingual usage instructions increasingly commonplace.

2.4       Cross-Sectorial issues

Cross-sectorial issues pertain to differing, at times conflicting, understandings of business practices, object identification, etc., among economic sectors. The challenge here is that of resolving two sets of issues:

 1.        Industry sectors, scientific fields, and professional disciplines assign their own uses or meanings to the terms of a natural language. Quite often natural languages are used in the manner of what we earlier called "special languages": the same word/term frequently has very different meanings in other industry sectors. There is a trend in various sectors towards using existing non-technical "common language" words as terms with new technical meanings. This problem of polysemy needs to be taken into account in cross-sectorial Electronic Commerce.

 2.        The need for multilingual equivalency creates an added layer of complexity, all the more so for unambiguous cross-sectorial interoperability in support of Electronic Commerce (as well as world-wide "individual-to-business" Electronic Commerce via the Internet).

A case study on cross-sectorial issues (see JTC 1/BT-EC N 045) led, with respect to scientific languages, to the conclusion that a scientific language can be considered a culturally neutral exchange language which, in turn, has multiple natural-language and culturally dependent linguistic equivalent terms.

Technical languages and their use in particular industry sectors, however, do present particular challenges to cultural adaptability and cross-sectorial interoperability since they do not have the attributes of scientific languages. Technical languages as linguistic sub-systems are difficult enough to handle even within their industry sector, in one natural language. To this are added the challenges of localization, multiculturalism and cross-sectorial interactions in Electronic Commerce.

Each industry sector interacts with other sectors. A key characteristic of special languages is an associated controlled vocabulary of terms, often also in a multilingual manner.
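Polysemy across sectors, combined with multilingual equivalency, can be handled by qualifying each term with its sector and language and mapping it to a neutral concept identifier. The following sketch assumes such sector vocabularies. The identifiers "FIN-0001" and "CHEM-0042" and the functions `resolve` and `senses` are invented for illustration. The English "bond" (finance) and the French "obligation" do denote the same debt security.

```python
from typing import Dict, List, Tuple

# Hypothetical sector vocabularies keyed by (sector, language, term).
# The concept identifiers are invented for illustration.
VOCABULARY: Dict[Tuple[str, str, str], str] = {
    ("finance", "en", "bond"): "FIN-0001",        # a debt security
    ("finance", "fr", "obligation"): "FIN-0001",  # same concept, French term
    ("chemistry", "en", "bond"): "CHEM-0042",     # a chemical bond
}


def resolve(sector: str, language: str, term: str) -> str:
    """Map a sector- and language-qualified term to its concept identifier."""
    return VOCABULARY[(sector, language, term)]


def senses(term: str) -> List[Tuple[str, str]]:
    """List every (sector, concept) pair for a bare term, exposing polysemy."""
    return sorted({(sector, concept)
                   for (sector, _lang, t), concept in VOCABULARY.items() if t == term})
```

A bare "bond" yields two candidate concepts. Once qualified by sector, it resolves to exactly one, and its equivalent in another language resolves to the same identifier.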

In conclusion, it should be noted that within industry sectors, established standards and conventions exist for unambiguous identification and referencing of unique objects, and for naming them (often multilingually), along with associated rules. Although not originally designed to interoperate across and among industry sectors, many of these sectorial standards have core constructs in common which could be utilized to support cross-sectorial Electronic Commerce and in a manner which accommodates localization and multilingual needs.

2.5       Cultural adaptability

BT-EC viewed "cultural adaptability" as a set of requirements affecting global Electronic Commerce from a cultural perspective and noted that these can co-exist within "localization" and "multilingualism" requirements. In addition, there are societal aspects which often are not bounded by jurisdiction or geographic area (e.g., Jewish and Muslim cultures transcend jurisdictional boundaries).

The following definition of "cultural adaptability" is found in JTC 1 N4627:

"The special characteristics of natural languages and the commonly accepted rules for their use (especially in written form) which are particular to a society or geographic area. Examples are: national characters and associated elements (such as hyphens, dashes, and punctuation marks), correct transformation of characters, dates and measures, sorting and searching rules, coding of national entities (such as country and currency codes), presentation of telephone numbers, and keyboard layouts".

This definition of the concept/term "cultural adaptability" is the same as that for "cultural elements" found in ISO/IEC JTC 1/CAW N 008. It has a focus on special characteristics of natural languages and commonly accepted rules for their use which are particular to a society or geographic area. The emphasis here appears to be on character sets, scripts, glyphs, etc., their ordering, sorting, search, etc.

However, in commerce world-wide, it is not so much natural languages as the usage of special languages (e.g., technical and scientific) that forms a significant challenge to providing interoperability in Electronic Commerce. This is especially true of "technical" uses of natural languages by different industry sectors. Differences in the use of a natural language also exist among industry sectors, and these represent sets of requirements other than those particular to a society or geographic area.

BT-EC made an effort to coordinate the work on this horizontal issue with the JTC 1/CAW (Cultural Adaptability Workshop). BT-EC notes Resolution 3 of JTC 1/CAW which states "that CAW did not have time to address the request of JTC 1 to elaborate or amend the definition of cultural adaptability as contained in the document JTC 1 N4627".

From an Electronic Commerce perspective, standardization work addressing the three horizontal issues associated with

  ·         "IT-enablement",

  ·         "Localization and Multilingualism", and

  ·         "Cross-Sectorialization"

should resolve some of the requirements for "cultural adaptability". It then remains to be seen which other "cultural adaptability" requirements are left, i.e., those of a societal nature (see also 5.2.2).

-------------

[Note:  Section 5.2.2 in the BT-EC Report pertains to "Consumer requirements for Electronic Commerce"].


3.      REAL WORLD EXAMPLES OF ENCODED VALUE DOMAINS

3.1       Introduction

1.         Chapter 3 is based on a Canadian contribution to JTC1, i.e., N5394.  This contribution provided additional and more detailed information in support of Clause 12.3 of the BT-EC Report titled "Examples of Encodable Value Domains" (BT-EC Report, pages 61-66).  The Canadian contribution also provided three exhibits in support of the examples.

2.         The examples are currency codes, country codes, language codes, and commodity codes.  These four real world examples were developed to provide a focus for the BT-EC work on the four horizontal issues.  The three exhibits have proved useful in Canada in illustrating and explaining the horizontal issues in a simple and non-technical manner to the business community, policy makers, and various industry sectors.

            The exhibits provided are intended to demonstrate that the identification and referencing of real world objects, i.e., as "instances" of an object class in an "encodable value domain" can be done in a linguistically neutral and unambiguous manner.

            This supports a global approach to Electronic Commerce which is capable of meeting localization and associated multilingual requirements. Linguistically neutral identification and referencing of objects will also support computational integrity and more efficient data interchange, with higher quality assurance and at lower costs for all participants.

3.         Those interested in standardization in areas pertaining to Electronic Commerce may find these exhibits useful in illustrating the horizontal aspects.  They can also augment the exhibits by adding their own country and language equivalent terms for the linguistically neutral codes.

            The contributions from BT-EC members with respect to their localization and accompanying linguistic requirements, as found in the three exhibits, are appreciated.

4.         Finally, it is useful to draw attention to the BT-EC Report (in Clause 6 on pages 21 and 22) which states:

                        "Human beings like to name "objects". But the approach of using "names" is not very IT friendly, cost-efficient or time-efficient.

                        Depending on the interplay of multilingual and localization requirements, in Electronic Commerce, a singular product or service being offered for sale will have multiple names and differing names even in the "same" language. Thus, if we wish to ensure rapid and widespread use of Electronic Commerce globally, we must on the one hand identify "objects", i.e., products or services being offered for sale, in an unambiguous, linguistically neutral, and IT-processable and EC-facilitated manner, and, on the other hand, present the same via a range of linguistic names (and associated character sets) from a point-of-sale perspective, i.e., human-readable user interface, as required by the "local" marketplace."

            In support of this BT-EC text, Canada draws attention to ISO 1087 which defines "name:  designation of an object by a linguistic expression".

            Consequently, any "object" will have (1) multiple names; and, (2) in global Electronic Commerce, many of the "names" used to designate the "object" being traded will be in the form of linguistic expressions which use characters outside Latin-1 (e.g., Arabic, Chinese, Thai, Hebrew, Japanese, etc.).  This is one reason why ISO/IEC 10646, whose character repertoire is kept aligned with the Unicode Standard, will be a key IT infrastructure standard needed to support global electronic commerce.
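The limitation of Latin-1 can be demonstrated directly. The following minimal sketch takes several names for the same object (the country with ISO 3166-1 alpha-2 code "JP") and checks which of them can be encoded in ISO/IEC 8859-1 (Latin-1). UTF-8, an encoding form of ISO/IEC 10646, can represent all of them. The dictionary `NAMES_FOR_JP` and the function `fits_latin1` are illustrative names only.

```python
# Several names for the same object: the country with ISO 3166-1 alpha-2 code "JP".
NAMES_FOR_JP = {
    "en": "Japan",
    "fr": "Japon",
    "ja": "日本",
    "ar": "اليابان",
    "th": "ญี่ปุ่น",
}


def fits_latin1(text: str) -> bool:
    """True if `text` can be encoded in ISO/IEC 8859-1 (Latin-1)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Only the Latin-script names fit in Latin-1; UTF-8 can encode every name.
latin1_ok = [lang for lang, name in NAMES_FOR_JP.items() if fits_latin1(name)]
print(latin1_ok)  # ['en', 'fr']
```

Accented Latin-script names such as "Côte d'Ivoire" still fit in Latin-1. Names in the scripts listed above do not.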

3.2       Example #1 - Currency Codes

A key attribute of electronic commerce is that it involves business transactions in which payment must be made in a mutually acceptable currency.  ISO 4217 is the standard for codes representing currencies and funds.  This standard and its contents are the responsibility of ISO TC 68 Banking.  The principles for inclusion in the code lists of ISO 4217 are that (1) they must be/represent currencies and funds used within the entities described by ISO 3166 (Country Codes); and, (2) the codes listed are intended to reflect current status at the date of publication.

ISO 4217 has a number of features and anomalies which, although understandable to humans, need to be identified and explicitly captured in an IT-enabled manner.  In short, ISO 4217 includes objects which are not currencies (or funds).  In ISO 4217, there are cases where:

  ·       the ISO 3166 3-digit numeric country code is not the same as the ISO 4217 3-digit numeric code (e.g., due to the creation/utilization in ISO 4217 of ISO 3166 "User Extensions").  For example, one can readily identify in ISO 4217 twenty-five (25) instances of ISO 3166 entries where the ISO 3166 Country Code 3-digit numeric differs from the ISO 4217 "Code Name" 3-digit numeric.  Nor is there any relation between the ISO 3166 and ISO 4217 alpha codes for many countries;

  ·       a country (or dependency) has no currency of its own and utilizes the currency of another country;

  ·       a country has more than one currency, i.e., its own and that of another country;

  ·       a country has both a currency code and a funds code;

  ·       a set of countries collectively shares and uses a currency which has no "issuing country" (e.g., SDR (XDR), XOF, and XAF).  Here one notes the need to add the "euro" as a currency (in addition to the "ecu", i.e., XEU);

  ·       special fund types are listed;

  ·       a "currency" is not linked to any country or organization (e.g., precious metals such as gold, numeric 959, alpha XAU; special settlement currencies; etc.); and,

  ·       a "currency" has no numeric code but only a 3-alpha code (e.g., XFO = Gold Franc).

Some of the above rules and relationships are stated in ISO 4217; others are implicit (and known only to "experts"). An IT-enabled version of ISO 4217 is now required, especially for electronic commerce and particularly Internet-based electronic commerce. Many suppliers and consumers entering the electronic commerce market or other Internet-based activities, particularly those outside the financial community, are not aware of the "peculiarities" of ISO 4217.
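As a sketch of what "explicitly captured" could mean, the rules listed above can be turned into fields of a record so that a program can ask questions an expert would answer from memory. The `kind` taxonomy and the `users` field are assumptions made for illustration, and the entries are a small illustrative selection, not an authoritative extract of ISO 4217.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Iso4217Entry:
    alpha: str                   # 3-letter alphabetic code, e.g. "CAD"
    numeric: Optional[str]       # 3-digit numeric code; None where none is assigned
    name: str
    kind: str                    # assumed taxonomy: "currency", "fund", "metal", "unit"
    users: Tuple[str, ...] = ()  # ISO 3166-1 numeric codes of the entities using it

# Illustrative entries only; not an authoritative extract of ISO 4217.
ENTRIES = [
    Iso4217Entry("CAD", "124", "Canadian Dollar", "currency", ("124",)),
    Iso4217Entry("CHF", "756", "Swiss Franc", "currency", ("756", "438")),  # also Liechtenstein
    Iso4217Entry("USD", "840", "US Dollar", "currency", ("840",)),
    Iso4217Entry("USN", "997", "US Dollar (Next day)", "fund", ("840",)),
    Iso4217Entry("XAU", "959", "Gold", "metal"),
    Iso4217Entry("XDR", "960", "SDR", "unit"),
    Iso4217Entry("XFO", None, "Gold-Franc", "unit"),
]

def not_country_currencies(entries):
    """Codes not tied to any ISO 3166 entity (precious metals, units, etc.)."""
    return [e.alpha for e in entries if not e.users]

def without_numeric(entries):
    """Codes that have only an alphabetic form."""
    return [e.alpha for e in entries if e.numeric is None]

def currencies_used_by(country_numeric, entries):
    """All codes (currencies and funds) used by one ISO 3166-1 entity."""
    return [e.alpha for e in entries if country_numeric in e.users]

print(not_country_currencies(ENTRIES))     # ['XAU', 'XDR', 'XFO']
print(currencies_used_by("840", ENTRIES))  # ['USD', 'USN']
```

Liechtenstein (438) appears only through the Swiss franc, which is exactly the "no currency of its own" case that a bare code list leaves implicit.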

Experience in the financial services/banking sector indicates that, on the Internet, those engaged in electronic commerce as well as in general applications need to be made aware of the standard notation for currencies. For example, in actual e-commerce practice, the Canadian dollar is being represented as "CDN", "CAN", "CA", etc., rather than by its ISO 4217 code "CAD". Further, the 3-alpha codes of ISO 3166-1 for countries are often confused with the ISO 4217 3-alpha currency codes.
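A minimal validation sketch for a currency field, assuming small hard-coded subsets of the code lists (a real implementation would load the full, current lists). It rejects the notations quoted above and says why "CAN" and "CA" are wrong rather than guessing which currency was meant.

```python
# Illustrative subsets only; not complete code lists.
ISO_4217_ALPHA = {"CAD", "USD", "CHF", "FRF", "XAU"}
ISO_3166_ALPHA3 = {"CAN", "USA", "CHE", "FRA"}
ISO_3166_ALPHA2 = {"CA", "US", "CH", "FR"}

def check_currency_field(value: str) -> str:
    """Classify a value entered in a currency field."""
    code = value.strip().upper()
    if code in ISO_4217_ALPHA:
        return f"ok: {code} is an ISO 4217 code"
    if code in ISO_3166_ALPHA3:
        return f"error: {code} is an ISO 3166-1 alpha-3 country code, not a currency"
    if code in ISO_3166_ALPHA2:
        return f"error: {code} is an ISO 3166-1 alpha-2 country code, not a currency"
    return f"error: {code} is not a known ISO 4217 code"

for v in ["CAD", "CAN", "CA", "CDN", " cad "]:
    print(repr(v), "->", check_currency_field(v))
```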

3.3       Example #2 - Country Codes And Localization With Multilingualism

Several standards are used internationally for codes representing countries. The better known ones are ISO 3166-1, the USMARC Code List for Countries maintained by the Library of Congress (LC), and the Universal Decimal Classification (UDC) auxiliary table for countries. Of these, ISO 3166-1 is the most widely known. (On the LC and UDC, see further 3.4 below.)

This example focuses on ISO 3166-1. This standard and its contents are the responsibility of ISO TC 46 - Information and documentation. The purpose here is to highlight the need for an IT-enabled version of this standard, and also to bring to the fore related localization and multilingual aspects. The title of ISO 3166 is "Codes for the representation of names of countries and their subdivisions". ISO 3166 now has three parts, namely:

  -       Part 1: Country Codes;

  -       Part 2: Country Subdivision codes; and,

  -       Part 3: Codes for formerly used names of countries.

ISO 3166-1 "established codes that represent the names of countries, dependencies, and other areas of particular geopolitical interest, on the basis of lists of country names obtained from the United Nations". Currently, each entry (or "record" of a permitted instance) contains:

 (1)       a three-digit numeric code

 (2)       a two letter alpha code

 (3)       a three letter alpha code

 (4)       a short name -  English

 (5)       a long, i.e., formal name - English

 (6)       a short name - French

 (7)       a long, i.e., formal name - French

ISO 3166 also has a note field in English and in French.

ISO 3166-1 thus has seven (7) "standardized" representations for each unique entity or object, three (3) of which are codes. ISO 3166-1 allows any one of the seven to be used, although in practice, and especially in IT systems, one usually uses one of the three codes.
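One way to hold the seven representations is a single record per entity, indexed under each of its three codes so that whichever code a trading partner sends resolves to the same entity. This is a sketch under the assumption that codes are stored as strings; the Finland values are those given in Exhibit 3.3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Iso3166Entry:
    numeric: str   # 3-digit numeric code
    alpha2: str
    alpha3: str
    short_en: str
    long_en: str
    short_fr: str
    long_fr: str

# Values for Finland as listed in Exhibit 3.3.
FINLAND = Iso3166Entry("246", "FI", "FIN", "Finland", "Republic of Finland",
                       "Finlande", "République de Finlande")

def build_index(entries):
    """Map each of the three codes of every entry to that entry."""
    index = {}
    for entry in entries:
        for code in (entry.numeric, entry.alpha2, entry.alpha3):
            index[code] = entry
    return index

index = build_index([FINLAND])
print(index["FIN"].short_fr)                        # Finlande
print(index["246"] is index["FI"] is index["FIN"])  # True
```

The three codes cannot collide within one index because the numeric code is all digits and the alpha codes differ in length.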

For ISO 3166, there is currently no common international default "standard" for the interface among applications/information systems engaged in supporting electronic commerce. The 3-digit numeric code, the alpha-2 code and the alpha-3 code are all used in interchanges.

Of these three codes, the 3-digit numeric code is the most stable. The short and long names do change, and the alpha-2 and alpha-3 codes are not that stable either: whenever a country changes its name, it often also changes its alpha-2 and alpha-3 codes (e.g., Burma to Myanmar, Zaire to the Democratic Republic of the Congo, etc.). The 3-digit numeric code, on the whole, changes only when the actual physical boundaries of a country change, i.e., when the entity being identified and referenced is no longer the same. For example, the alphabetic (written-language) equivalents of "3166:180" recently underwent the following changes:

 

                   Former                 New
Alpha-2            ZR                     CD
Alpha-3            ZAR                    COD
Short Name (en)    Zaire                  Congo, Democratic Republic of the
Long Name (en)     Republic of Zaire      the Democratic Republic of Congo
Short Name (fr)    Zaïre                  Congo, la République démocratique du
Long Name (fr)     République de Zaïre    La République démocratique du Congo
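Because the numeric code survives such a change, a system can keep the former alpha codes as history against it and still resolve old records. The sketch below assumes a simple in-memory history table (a hypothetical structure) filled with the 3166:180 values from the table above.

```python
# Former and current alpha codes recorded against the stable numeric code
# (values for 3166:180 from the table above; the last entry is the current code).
ALPHA_HISTORY = {
    "180": {"alpha2": ["ZR", "CD"], "alpha3": ["ZAR", "COD"]},
}

def resolve_alpha(code: str):
    """Numeric code for a current or former alpha code, or None if unknown."""
    code = code.upper()
    for numeric, history in ALPHA_HISTORY.items():
        if code in history["alpha2"] or code in history["alpha3"]:
            return numeric
    return None

def current_alpha(numeric: str, kind: str = "alpha3") -> str:
    """Current alpha-2 or alpha-3 code for a numeric code."""
    return ALPHA_HISTORY[numeric][kind][-1]

legacy = resolve_alpha("ZAR")          # e.g. from a record written before the change
print(legacy, current_alpha(legacy))   # 180 COD
```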

 

The use of alpha-3 codes for ISO 3166 country names causes overlap and confusion with ISO 4217 currency and funds codes, which are also represented as (upper-case) alpha-3 codes.

Further, the 3-digit numeric code is linguistically neutral and unambiguous. In ISO 3166, each 3-digit numeric code has six (6) alphabetic linguistic expressions associated with it, two of which (the alpha-2 and alpha-3 codes) also serve as "human understandable" (and computer-processable) codes.

From an interoperability perspective, i.e., that of both commerce and IT, the "3166" identifying the scheme and rule set, together with the 3-digit numeric code identifying a country within this domain, form a unique and unambiguous global identifier for the entity being referenced. The alpha codes and names should simply be considered equivalent linguistic expressions, i.e., from an information systems perspective all that one may need to standardize at the interface is something akin to "3166:246", "3166:056", "3166:792", etc.
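A minimal sketch of such an interface identifier, assuming the `<scheme>:<value>` syntax used in the examples above (no standard cited here defines this exact format). The scheme number travels with the value, so a value is never interpreted outside its domain.

```python
import re
from typing import Tuple

# Assumed interface syntax "<scheme>:<value>", modelled on "3166:246" above.
QUALIFIED = re.compile(r"^(?P<scheme>\d{3,4}):(?P<value>[0-9A-Z]{2,3})$")

def parse_qualified(identifier: str) -> Tuple[str, str]:
    """Split a scheme-qualified identifier into (scheme, value)."""
    match = QUALIFIED.match(identifier)
    if match is None:
        raise ValueError(f"not a scheme-qualified identifier: {identifier!r}")
    return match.group("scheme"), match.group("value")

# English short names from Exhibit 3.3, keyed on the parsed identifier.
SHORT_NAME_EN = {
    ("3166", "246"): "Finland",
    ("3166", "056"): "Belgium",
    ("3166", "792"): "Turkey",
}

for ident in ["3166:246", "3166:056", "3166:792"]:
    print(ident, "->", SHORT_NAME_EN[parse_qualified(ident)])

# The same syntax keeps an ISO 4217 code in its own domain.
print(parse_qualified("4217:CAD"))  # ('4217', 'CAD')
```

Names in any language then become lookups keyed on the parsed pair, never part of the identifier itself.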

In addition, and also very important, it should be noted that ISO 3166-1 contains many instances and associated codes for entities which are not "countries", i.e., they are dependencies of other entities (e.g., of France, Great Britain, the USA, etc.). Human beings "filter" these and easily make the distinctions; computers cannot, unless explicitly instructed. To make things even worse, many of these ISO 3166-1 "sub-entities" have one code in 3166-1 and a different code in 3166-2. [Note: One would have expected that, when ISO 3166 was split into its three parts, all the sub-entities would have been moved from 3166-1 to 3166-2.]
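The "explicit instruction" a computer needs can be as simple as a parent reference on each entry. In the sketch below the `parent` field is a hypothetical addition, not an ISO 3166-1 field, and the entries are a small illustrative subset.

```python
# Illustrative subset. The "parent" field is a hypothetical addition that
# makes the dependency relationship explicit.
ENTITIES = {
    "250": {"name": "France", "parent": None},
    "254": {"name": "French Guiana", "parent": "250"},
    "826": {"name": "United Kingdom", "parent": None},
    "060": {"name": "Bermuda", "parent": "826"},
    "840": {"name": "United States", "parent": None},
    "630": {"name": "Puerto Rico", "parent": "840"},
}

def countries_only(entities):
    """Numeric codes of entries that are not dependencies of another entry."""
    return sorted(code for code, e in entities.items() if e["parent"] is None)

def dependencies_of(parent, entities):
    """Numeric codes of entries recorded as dependencies of `parent`."""
    return sorted(code for code, e in entities.items() if e["parent"] == parent)

print(countries_only(ENTITIES))          # ['250', '826', '840']
print(dependencies_of("250", ENTITIES))  # ['254']
```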

Further, we should note that, in their own locale and language, countries have their own "local" short and long (or formal) names. For example, 528 Netherlands = "Nederland" and "Koninkrijk der Nederlanden". Countries which are bilingual have two sets of local short and long/formal names (e.g., 056 Belgium = "Belgie" and "Koninkrijk van Belgie", and "Belgique" and "Le Royaume de Belgique"). There are also multilingual countries (e.g., Switzerland). Exhibit 3.3 provides an illustrative example.

Then there is the fact that many countries use character sets that are not based on the Latin alphabet. This means that, for such countries, one has the names ("alphas") in the country's original script as well as their "latinized" equivalents.

To this is added the fact that, from the perspective of each country and language, the "other" countries are known by their "own" names. For example, a person in France uses "Allemagne", not "Germany" or "Deutschland", as the linguistic equivalent for "3166:276". All this is common, non-competitive information.

It suffices to state that all the rules and intelligence implicit in ISO 3166-1 (as well as 3166-2 and 3166-3) have not yet been captured explicitly in an IT-enabled and EC-facilitated manner, (e.g., as a "normalized" (callable) database).
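As a sketch of what a "normalized" (callable) database might look like, the example below keeps the codes in one table and every linguistic equivalent as a separate row keyed by numeric code, language and name form. The schema, table names and the `origin` classification are assumptions for illustration; the values come from Exhibit 3.3 and the paragraphs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    num    TEXT PRIMARY KEY,   -- ISO 3166-1 3-digit numeric code
    alpha2 TEXT UNIQUE,
    alpha3 TEXT UNIQUE
);
CREATE TABLE entity_name (
    num    TEXT REFERENCES entity(num),
    lang   TEXT,               -- ISO 639 language code
    form   TEXT,               -- 'short' or 'long'
    origin TEXT,               -- 'iso3166' or 'local' (assumed classification)
    value  TEXT,
    PRIMARY KEY (num, lang, form, origin)
);
""")
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)",
                 [("246", "FI", "FIN"), ("276", "DE", "DEU")])
conn.executemany("INSERT INTO entity_name VALUES (?, ?, ?, ?, ?)", [
    ("246", "en", "short", "iso3166", "Finland"),
    ("246", "fr", "short", "iso3166", "Finlande"),
    ("246", "fi", "short", "local", "Suomi"),
    ("246", "sv", "short", "local", "Finland"),
    ("276", "en", "short", "iso3166", "Germany"),
    ("276", "fr", "short", "iso3166", "Allemagne"),
    ("276", "de", "short", "local", "Deutschland"),
])

def short_name(num: str, lang: str):
    """Short name of entity `num` in language `lang`, or None if not recorded."""
    row = conn.execute(
        "SELECT value FROM entity_name WHERE num = ? AND lang = ? AND form = 'short'",
        (num, lang),
    ).fetchone()
    return row[0] if row else None

print(short_name("276", "fr"))  # Allemagne
print(short_name("246", "fi"))  # Suomi
```

Adding a language or a local name is then an insert of rows, not a change to the schema.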

This section concludes with Exhibit 3.3, which presents in matrix form, in the left column titled "IT-Needs (Interface)", the "interface" requirement for ISO 3166-1 among information systems. The right-hand column presents some examples of the multiple linguistic equivalents required to support "Localization and Multilingual" requirements for particular implementations and supporting IT applications, which in turn may have human-readable user interfaces.

Notes on Exhibit 3.3

[1]        Normally the eight (8) "fields" under "Localization and Multilingual Requirements" would be separate (sets of) "columns" in a database schema, all forming part of the same "row". The physical presentation here in Exhibit 3.3 does not reflect this.

[2]        The 2-letter language codes, (e.g., en, fi, fr, nl, sv, tr), are taken from ISO 639.

[3]        The "->" entries are not part of ISO 3166-1.  Although only the Latin alphabet character set is utilized in this and the other two examples, it is understood that non-Latin alphabet-based character sets are also used or will be used in electronic commerce.  As such this is an "illustrative" example.

 


EXHIBIT 3.3 - TOWARDS AN IT-ENABLED STANDARD FOR ISO 3166 - COUNTRY CODES

IT-Needs (Interface)  Localization and Multilingual Requirements [1]

3166:246              Short Name (en) [2]:     Finland
Alpha-2: FI           Long Name (en):          Republic of Finland
Alpha-3: FIN          Short Name (fr):         Finlande
                      Long Name (fr):          République de Finlande
              [3] ->  Local Short Name (fi):   Suomi
                      Local Long Name (fi):    Suomen tasavalta
                  ->  Local Short Name (sv):   Finland
                      Local Long Name (sv):    Republiken av Finland

3166:056              Short Name (en):         Belgium
Alpha-2: BE           Long Name (en):          Kingdom of Belgium
Alpha-3: BEL          Short Name (fr):         Belgique
                      Long Name (fr):          Royaume de Belgique
                  ->  Local Short Name (nl):   Belgie
                      Local Long Name (nl):    Koninkrijk van Belgie
                  ->  Local Short Name (fr):   Belgique
                      Local Long Name (fr):    Royaume de Belgique

3166:792              Short Name (en):         Turkey
Alpha-2: TR           Long Name (en):          Republic of Turkey
Alpha-3: TUR          Short Name (fr):         Turquie
                      Long Name (fr):          République turque
                  ->  Local Short Name (tr):   Turkiye
                      Local Long Name (tr):    Turkiye Cumhuriyeti


3.4       Example #3 - Language Codes And Concordance Among International Standards

At times several different standards are used internationally for the same domain. One such domain is that of codes representing languages. Sets of codes representing "languages" (and "countries") provide examples where ISO is not the only organization to issue and maintain standards used world-wide, even though its standard, i.e., ISO 639, "Code for the representation of names of languages", is the most widely used and known. This standard and its contents are the responsibility of ISO TC 37 - Terminology (principles and coordination).

Another international standard providing a coding schema for country codes and language codes is that of the US Library of Congress. Its primary application is in the bibliographic/information sciences domain. It should be noted that these coding schemas pre-date those of ISO. For country codes, the Library of Congress uses two- or three-character lower-case alphabetic codes. These represent existing national entities, provinces and territories of Canada, states of the United States, divisions of the United Kingdom, and internationally recognized dependencies. This list is known as the USMARC[2] Code List for Countries and is maintained by the Library of Congress. Similarly, the Library of Congress maintains a USMARC Code List for Languages. This code list consists of three-letter mnemonics representing only written languages of the modern and ancient world. "Where one spoken language is written in two different sets of characters, each set of characters is assigned a specific code. For example, Serbian and Croatian are the same spoken language but the former is written in the Cyrillic alphabet and the latter in the Roman alphabet" ("Roman" being known within ISO as the "Latin" character set).

A third international standard, the Universal Decimal Classification (UDC) scheme, also has language codes. The UDC is used in bibliographic/information science work (primarily in Europe) and is increasingly used there for classifying documents on the Internet.

Human beings can recognize and filter these differences; computers cannot unless explicitly instructed. Keeping in mind that the scope and definitions of these different coding schemes also differ for what are generally the same business needs, one can bridge such differences through the construction of concordance tables. This allows one to maximize, insofar as possible, interoperability across differing sectoral perspectives, as well as to identify instances of "non-interoperability".
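A concordance table can be expressed directly as data, with the non-interoperable cases recorded explicitly instead of being forced into a one-to-one mapping. The sketch below uses the rows of Exhibit 3.4 (with "tr" as the ISO 639 code for Turkish); the function names are hypothetical.

```python
# ISO 639 code -> (LC language code, UDC language code), from Exhibit 3.4.
CONCORDANCE = {
    "en": ("eng", "=111"),
    "fr": ("fre", "=133.1"),
    "nl": ("dut", "=112.5"),
    "fi": ("fin", "=511.111"),
    "sv": ("swe", "=113.6"),
    "tr": ("tur", "=512.164"),
    "gd": ("gae", "=152"),
    "cy": ("wel", "=153.1"),
}
LC_TO_ISO = {lc: iso for iso, (lc, _udc) in CONCORDANCE.items()}

# Cases with no one-to-one equivalence are recorded explicitly rather than forced.
NON_EQUIVALENT = {
    "Inuktitut": "ISO 639 'ik' is Inupiak; LC 'esk' is a family code; UDC =562 is Inuit",
}

def iso_to_lc(iso_code):
    """LC language code for an ISO 639 code, or None if there is no mapping."""
    entry = CONCORDANCE.get(iso_code)
    return entry[0] if entry else None

def lc_to_udc(lc_code):
    """UDC language code for an LC language code, via the ISO 639 column."""
    iso_code = LC_TO_ISO.get(lc_code)
    return CONCORDANCE[iso_code][1] if iso_code else None

print(iso_to_lc("nl"), lc_to_udc("dut"))  # dut =112.5
```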

In ISO 639 each entry (or permitted instance) consists of:

  -       a language symbol, in the form of a two-letter code;

  -       the language name - English

  -       the language name - French

  -       the original language name (as written in the Latin-1 alphabet).

ISO 639 also has a note field in English and in French.

With respect to ISO 639, two initial observations must be made. The first is that Canada (and the United States) has not adopted ISO 639 as a "national standard", due primarily to its current lack of inclusion of North American aboriginal and native languages. Secondly, the LANG attribute is important in SGML (ISO 8879). For example, in the proposed New Work Item (ISO/IEC JTC1 N4742) for "Standard HTML", the LANG attribute

            "identifies a natural language spoken, sung, written or otherwise used by human beings for communication between people.  Computer languages are explicitly excluded.  The value of the LANG attribute is referred to as the "language tag"...  The name space of language tags is administered by IANA.  Example tags include: en, en-US, en-cockney, i-cherokee and x-pig-latin.

            Two letter primary tags are reserved for ISO 639 language abbreviations.  This Committee Draft does not specify three-letter primary tags, however their description may be found in the "Ethnologue" {Gri92}.  Any two-letter initial sub-tag is an ISO 3166 country name..."

Serious reflection and more systematic thinking are required here with respect to "tags", especially if one wishes to use SGML -> HTML -> XML generally, and in electronic commerce specifically, while ensuring interoperability not only with the use of other syntaxes but also among various consumer markets, industry sectors, etc.

First of all, "i" and "x" are single characters, and they do not exist in ISO 639. Secondly, "cherokee" and "pig-latin" are not ISO 639 languages. Thirdly, for "en-US", given the other examples, it is not at all clear whether this represents the English language as used in the United States or something else.
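The rules quoted above can be sketched as a small classifier, which makes the problems visible: "i" and "x" get special meanings outside ISO 639, and a two-letter first subtag is read as an ISO 3166 code without any statement of what the combination means. This follows the RFC 1766 style rules as quoted; it is an illustration, not a validating implementation.

```python
import re

# One to eight letters per part, parts separated by hyphens.
TAG = re.compile(r"^[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*$")

def classify_tag(tag: str):
    """Return (kind of primary tag, kind of first subtag or None)."""
    if not TAG.match(tag):
        raise ValueError(f"malformed language tag: {tag!r}")
    primary, *subtags = tag.lower().split("-")
    if len(primary) == 2:
        primary_kind = "ISO 639 language code"
    elif primary == "i":
        primary_kind = "IANA-registered"
    elif primary == "x":
        primary_kind = "private use"
    else:
        primary_kind = "not specified"
    if not subtags:
        return primary_kind, None
    first_kind = "ISO 3166 country code" if len(subtags[0]) == 2 else "other subtag"
    return primary_kind, first_kind

for t in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]:
    print(f"{t:12} {classify_tag(t)}")
```

The classifier can say that "US" is a country code, but nothing in the tag says whether "en-US" means English as used in the United States.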

Fourthly, the use of alpha-2 ISO 3166 country codes in tags is confusing vis-à-vis ISO 639 language codes. The two code sets overlap and are not mutually exclusive. This is at times confusing for humans (and even more so for "dumb" computers). Fifthly, in many sort algorithms and search/retrieval engines, upper- and lower-case letters are treated the same. This causes even more confusion in IT-enabled processing of these code sets if two-letter alpha codes are used for both countries and languages.
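The case-folding problem can be demonstrated with a few codes: once case is ignored, several ISO 3166 alpha-2 country codes become identical to unrelated ISO 639 language codes. The subsets below are illustrative, not complete code lists.

```python
# Small illustrative subsets of the two code lists (not complete).
ISO_639_SUBSET = {"ca": "Catalan", "be": "Byelorussian", "fi": "Finnish", "sv": "Swedish"}
ISO_3166_A2_SUBSET = {"CA": "Canada", "BE": "Belgium", "FI": "Finland", "SE": "Sweden"}

def collisions(language_codes, country_codes):
    """Country codes indistinguishable from a language code once case is ignored."""
    folded = {code.casefold() for code in language_codes}
    return sorted(code for code in country_codes if code.casefold() in folded)

for code in collisions(ISO_639_SUBSET, ISO_3166_A2_SUBSET):
    print(f"{code}: country {ISO_3166_A2_SUBSET[code]!r} "
          f"vs language {ISO_639_SUBSET[code.lower()]!r}")
```

"CA" (Canada) versus "ca" (Catalan) and "BE" (Belgium) versus "be" (Byelorussian) are unrelated pairs; qualifying each value with its scheme removes the ambiguity.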

Finally, there is an urgent need to update ISO 639 to include North American aboriginal and native languages, as well as to provide a systematic means for handling and registering user extensions (e.g., "cockney", "pig latin", "klingon", etc.). Alternatively, one could consider developing an "ISO 639 Level 2" standard for codes representing user extensions of the nature noted above, as well as "historical languages", i.e., as is being developed for ISO 3166-3.

Even more important is the need to develop a systematic and unambiguous interworking in an IT-enabled manner among language code (ISO 639), currency and fund codes (ISO 4217) and country codes (ISO 3166-1).

One should also develop mechanisms for the interchange of the "same" data content from a cross-industry sector perspective but using different code sets in the same domain. To assist in progressing work in this area, this chapter concludes with Exhibit 3.4, which consists of a sample concordance table (English-language version only) for ISO 3166 (Country Codes) + ISO 639 code sets on the one hand, and, on the other, the equivalent Library of Congress (LC) country and language code sets and the UDC language code set.

Notes on Exhibit 3.4

[1]        As provided by the Fédération internationale d'information et de documentation (FID), based on documentation prepared in 1994 (and verified with them in early 1998). The UDC also has "country codes", i.e., "place codes", as a "common" auxiliary table, but these have not been included in Exhibit 3.4.

[2]        For human representation, we have included the "Short Name - English" as the linguistic equivalent for the ISO 3166-1 3-digit numeric code.

 [3]       For human representation, we have also included the ISO 639 English name of the language.  There is also the French name and of course the actual "name" of the language in the language itself.  ISO 639 captures this "Original" name in its Latin alphabet equivalent version.

 [4]       One notes that the LC alpha country codes are not the same as ISO 3166-1 alpha codes for the same entities.

 [5]       Added here to indicate that in Canada, under the Nunavut Act, a new territory, "Nunavut", will be established on 1 April 1999 from part of the existing Northwest Territories. In Nunavut, in addition to English and French, Inuktitut will become a recognized "official" language. The language code "ik" is the one that has been reserved for Inuktitut.

 [6]       In ISO 639 the "ik" represents "Inupiak" which is grouped/classified as an "Eskimo language".   In UDC, the language code 562 refers to "Inuit".  There is no code for "Inuktitut" per se.

 [7]       The LC codes place "Inuktitut" under the Eskimo family of languages.

 [8]       One notes that the LC language codes are not the same as the ISO 639 codes. At times even the first letter is not the same (e.g., "nl" versus "dut").


EXHIBIT 3.4 - SAMPLE CONCORDANCE OF STANDARDS FOR COUNTRY AND LANGUAGE CODES: ISO, LC AND UDC

ISO 3166-1                   ISO 639                       Library of Congress UDC [1]
Numeric     Short Name       Applicable         Language   Country   Language  Language
Code        (E) [2]          Languages (E) [3]  Codes      Codes     Codes     Codes

3166-1:124  Canada           English            en         xxc [4]   eng       = 111
                             French             fr                   fre       = 133.1
                             Inuktitut [5]      ik [6]               esk [7]   = 562
3166-1:056  Belgium          French             fr         be        fre       = 133.1
                             Dutch              nl [8]               dut       = 112.5
3166-1:246  Finland          Finnish            fi         fi        fin       = 511.111
                             Swedish            sv         sw        swe       = 113.6
3166-1:792  Turkey           Turkish            tr         tu        tur       = 512.164
3166-1:840  United States    English            en         xxu       eng       = 111
3166-1:826  United Kingdom   English            en         xxk       eng       = 111
                             Scots Gaelic       gd                   gae       = 152
                             Welsh              cy                   wel       = 153.1

 


3.5       Commodity Codes: IT-Enabled With Localization And Multilingualism

In combining multilingualism and localization requirements, one must recognize that, associated with the use of a language (e.g., English, French, German, Spanish, Portuguese, etc.), there are various "local" uses of the same natural language. The same object is often known by different terms in the same language under different local usage conventions. The Universal Product Code (UPC) and European Article Numbers (EAN) systems recognize this, as they have multilingual terms associated with each code for "local" packaging/labelling purposes. This has implications for electronic commerce, particularly via the Internet. For example, English as a language is in use in many countries or "locales" (e.g., Australia, Britain, Canada, India, Ireland, Jamaica, New Zealand, USA, etc.), but has different local uses in each. Similar examples exist for other languages.

In this context, the BT-EC took the example of an enterprise wishing to sell potatoes world-wide. This is a simple example, yet representative of the interplay of the four horizontal issues. Selling world-wide means that these goods have to pass through customs for export/import into various countries. The customs authorities world-wide have an organization that sets common rules and procedures, i.e., the World Customs Organization (WCO), formerly the Customs Co-operation Council (CCC)[3]. The WCO has established a classification scheme for traded goods called the Harmonized System (HS), which succeeded the earlier Brussels Tariff Nomenclature (BTN). As such, the HS for "commodity codes" is an internationally recognized standard, although of non-ISO/IEC/ITU origin.

Within the Harmonized System (HS) of the WCO, the general code for potatoes (fresh or chilled) is "0701". This linguistically neutral code "0701" is a data item or data element instance in the HS permitted value domain. In German as used in Germany, the equivalent name of "potato" is "Kartoffel", but in Austrian German it is "Erdapfel". Similarly, the Spanish equivalent name in Spain is "patata", while the Mexican Spanish equivalent is "papa", and the Dutch equivalent is "aardappel", etc. In French, the dictionary term is "pomme de terre", with "patate" as a "local" specific, i.e., Canada/Quebec term (and one which is not slang). The equivalent names noted above are thus culturally adapted equivalent linguistic expressions associated with "0701". Depending on the "locale", the appropriate human-oriented names or linguistic expressions can be systematically/automatically generated from the linguistically neutral numeric code for human understanding, product labelling, reporting, filing, etc., and, where required, in multiple languages.
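A minimal sketch of such generation: names are keyed by HS code, ISO 639 language and ISO 3166-1 numeric country, with a language-wide default used when no locale-specific name exists. The data are the names given in the paragraph above; which variety serves as the language-wide default is an assumption of this sketch.

```python
from typing import Optional

# Names for HS 0701 keyed by (ISO 639 language, ISO 3166-1 numeric country).
# A country of None marks the language-wide default (an assumption).
NAMES = {
    "0701": {
        ("de", None): "Kartoffel",
        ("de", "040"): "Erdapfel",       # Austria
        ("es", None): "patata",          # as used in Spain
        ("es", "484"): "papa",           # Mexico
        ("fr", None): "pomme de terre",
        ("nl", None): "aardappel",
    }
}

def local_name(hs_code: str, lang: str, country: str) -> Optional[str]:
    """Most specific name for the locale, falling back to the language default."""
    table = NAMES.get(hs_code, {})
    return table.get((lang, country)) or table.get((lang, None))

print(local_name("0701", "de", "040"))  # Erdapfel
print(local_name("0701", "de", "276"))  # Kartoffel
```

A locale-specific term such as the Canadian French "patate" would be one more entry, leaving the code "0701" untouched.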

From a more detailed analysis, one can identify two key aspects of the interworking of "localization", "cultural", and "multilingual" requirements, namely:

(1)        that within a jurisdiction, (e.g., a country, a province, canton, etc.), there can be more than one natural language of use; and,

(2)        that localization needs can result in a product, i.e., entity or object, having more than one equivalent "name" within a particular natural language.

In Exhibit 3.5, we present the "potato" from an IT-enabled and EC-facilitated perspective. Again, on the left-hand side of the matrix, under "IT-Needs (Interface)", we identify the schema ID, i.e., "HS", along with the permitted value, i.e., in this case 0701 for potato. In the middle column, we present examples of countries that import potatoes, using their ISO 3166 country code and short name (English), while on the right-hand side we present the linguistic equivalents required to support local and multilingual requirements from both a jurisdictional and a consumer perspective.
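The three-column layout described above maps onto a small structure in which one jurisdiction may carry several languages (point (1) above). This sketch holds the Exhibit 3.5 values, with Mexico entered under its ISO 3166-1 numeric code 484; the `label` function and its output format are hypothetical.

```python
# Exhibit 3.5 as data: ISO 3166-1 numeric code -> (ISO 639 code, name) pairs.
EXHIBIT_3_5 = {
    "124": [("en", "potato"), ("fr", "pomme de terre"), ("ik", "patiti")],
    "484": [("es", "papa")],          # Mexico
    "724": [("es", "patata")],        # Spain
    "040": [("de", "erdapfel")],      # Austria
    "276": [("de", "kartoffel")],     # Germany
    "056": [("fr", "pomme de terre"), ("nl", "aardappel")],
    "246": [("fi", "peruna"), ("sv", "potatis")],
}

def label(hs_code: str, country: str) -> str:
    """One label line for a jurisdiction, listing the name in each needed language."""
    names = "; ".join(f"({lang}) {name}" for lang, name in EXHIBIT_3_5[country])
    return f"HS {hs_code} / 3166:{country}: {names}"

for country in ("124", "056", "246"):
    print(label("0701", country))
```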

Notes on Exhibit 3.5

[1]        Exhibit 3.5 focuses on the human-understandable representation of what should be an IT-enabled global standard for trade in goods based on the existing Harmonized System (HS) of the World Customs Organization (WCO). The example here is "potato", i.e., fresh or chilled potatoes, where, under the HS, "0701" is the primary code, "0701.10" is for seed potatoes, and "0701.90" is for "other" potatoes. For the purpose of this example we use "0701". There are additional codes for potatoes which are "frozen", i.e., 0710.10, "cut/sliced/broken or powder", i.e., 0712.10, etc. Each of these will have its own local/linguistic equivalents. Further, there are "sweet potatoes", which could provide an even richer example.

            Finally, it is understood that there are other sets of codes, i.e., value domains, in which "potato" as an instance is identified and referenced with a different code, for example in the domains of agriculture, pesticides, retail/food stores, etc.

            In classification and coding schemas utilized in these other sectors, "potato" has a different code.  This is understandable since the goal of the business context in which they are used is quite different from that of customs authorities.

 [2]       The country code and short name are taken from ISO 3166-1.

 [3]       The 2-letter language codes, (e.g., de, en, es, fi, fr, ik, nl, sv), are taken from ISO 639.

 [4]       In 1999, Nunavut will become a new territory with Inuktitut as an "official" language in addition to English and French. In Inuktitut, "potato" is "patiti", the transliterated (Latin character set) equivalent of the character string used in the Inuktitut language to designate "potato".
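The hierarchy described in note [1] (chapter, heading, subheading) can be handled mechanically, for example to find every code that falls under heading 0701. The sketch below uses a small, abbreviated fragment of the HS nomenclature as illustrative data; it is not an extract of the official tariff text.

```python
# Illustrative fragment of the HS nomenclature (descriptions abbreviated).
HS = {
    "07": "Edible vegetables and certain roots and tubers",
    "0701": "Potatoes, fresh or chilled",
    "0701.10": "Seed",
    "0701.90": "Other",
    "0710.10": "Potatoes (frozen)",
}

def split_hs(code: str):
    """Split an HS code like '0701.10' into (chapter, heading, subheading)."""
    digits = code.replace(".", "")
    if not digits.isdigit() or len(digits) not in (2, 4, 6):
        raise ValueError(f"unexpected HS code: {code!r}")
    chapter = digits[:2]
    heading = digits[:4] if len(digits) >= 4 else None
    subheading = f"{digits[:4]}.{digits[4:]}" if len(digits) == 6 else None
    return chapter, heading, subheading

def under(prefix: str):
    """All codes in the fragment that fall below a chapter or heading."""
    p = prefix.replace(".", "")
    return sorted(c for c in HS
                  if c.replace(".", "").startswith(p) and c.replace(".", "") != p)

print(split_hs("0701.10"))  # ('07', '0701', '0701.10')
print(under("0701"))        # ['0701.10', '0701.90']
```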


EXHIBIT 3.5 - COMMODITY CODE EXAMPLE "POTATO" [1]

IT-Needs     Country Code -      Localization and Multilingual
(Interface)  Short Name (en) [2] Needs [3]

HS: 0701     124  CANADA         (en): potato
                                 (fr): pomme de terre
                                 (ik): patiti [4]

             484  MEXICO         (es): papa

             724  SPAIN          (es): patata

             040  AUSTRIA        (de): erdapfel

             276  GERMANY        (de): kartoffel

             056  BELGIUM        (fr): pomme de terre
                                 (nl): aardappel

             246  FINLAND        (fi): peruna
                                 (sv): potatis

 



    [1] Apart from some minor editing changes (e.g., renumbering, spelling, typos, etc.), Chapter 2 is a verbatim extract of Clause 6 of the BT-EC Report to JTC1, i.e., JTC1 N5296, pages 22-27.

    [2] The acronym "MARC" stands for "Machine Readable Cataloguing". The preceding characters represent the country that uses the "MARC" format, has amended it for its specific cataloguing needs, and has an infrastructure at the national level for addressing these national needs. There are primarily three such countries, namely the US, Canada, and the UK; hence the designations USMARC, CANMARC and UKMARC.

    [3] The WCO is but one example of "coordinated autonomy" among autonomous organizations. The degree to which autonomous organizations achieve interoperability from a business operational perspective sets the limit to the extent of interoperability of supporting IT-based functional services.