------------------------------------------------------------

                                       ISO/TC37/SC2      N 241
                                       ISO/TC37/SC2/WG1  N  74 (R)

Title:          Draft technical report: Language codes part 3:
                Guide to the alpha-numeric coding of the world's
                languages
Date:           2001-07-27
Source:         John Clews (UK)
Status:         Personal contribution
Action:         For information
Distribution:   ISO/TC37/SC2


------------------------------------------------------------
Draft technical report: Language codes part 3: Guide to the
alpha-numeric coding of the world's languages

This document could (a) serve as a technical report which documents
the different language codes now in widespread use, and (b) - if
deemed appropriate by ISO/TC37/SC2 - be developed further as a list
of single language codes which could be used to extend the current
repertoire of language codes in the existing parts of ISO 639.

Direction for further development will be taken from ISO/TC37/SC2.

It starts from the premise that individual codes from various
existing coding systems are already being used together, and outlines
problems to avoid in doing this.

General notes:

(1)     ISO 639-1 and ISO 639-2 form parts 1 and 2 in relation to
        this part 3

(2)     The structure of this draft technical report currently mirrors
        the structure of ISO 639-1.

(3)     As an HTML file, this document is best viewed at a medium or
        lower resolution, and the table in section 5 viewed at a
        small or smaller resolution, to avoid text disappearing at
        the right hand margin.

------------------------------------------------------------

1   Scope
2   Normative references
3   Terms and definitions
4   Comparison of language codes
4.1  Structure of the language code table in Language codes part 3
4.2  Maintenance of the language code table in Language codes part 3
4.3  Combining language codes with other codes
5   Table of language names and language codes

Annex A (normative)   Procedures for the Registration Authority
                      for Language codes part 3 [to be added]

Annex B (informative) Bibliography [to be added]

Annex C (informative) Area codes used (expanded from
                      United Nations Statistical Office areas)

------------------------------------------------------------

Foreword        [similar in all ISO standards]

ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) together form a system for
worldwide standardization. National bodies that are members of ISO or
IEC participate in the development of International Standards
through technical committees established by the respective
organization to deal with specific technical fields. Other
governmental and non-governmental international organizations with
liaison to IEC and ISO also take part in the work.

International Standards are drafted in accordance with the rules
given in the ISO/IEC Directives, Part 3.

This draft technical report was prepared by Working Group 1 "Language
codes" of Subcommittee 2 "Layout of vocabularies" of Technical
Committee ISO/TC 37 "Terminology (principles and coordination)".

Draft International Standards and technical reports are circulated to
national bodies for approval before their acceptance as
International Standards. They are approved in accordance with
procedures requiring at least 75 % approval by the national bodies
voting.


------------------------------------------------------------

Introduction


This draft technical report has been prepared taking into account the
aims and needs expressed in the document ISO/TC37/SC2/WG1 N69: Coding
systems, prepared on 2001-01-31 by Håvard Hjulstad (convenor of
ISO/TC37/SC2/WG1) in Norway.

This introduction further discusses aims and needs in this area.


(a) Relationship between different parts of ISO 639

Language codes part 3 provides additional information and
extends the capability and functionality of
ISO 639-1 (2-letter codes, intended for use in terminology) and
ISO 639-2 (3-letter codes, intended for use in bibliography).

To achieve this aim, Language codes part 3 has been developed to
combine ISO 639-1 and ISO 639-2 with information from the
Linguasphere register of the world's languages and speech
communities, and additionally provides information used in other
coding systems, covering a wider range of languages than ISO 639-1
and ISO 639-2 provide at present.

Language codes part 3 does not replace ISO 639-1 and ISO 639-2.
Indeed their codes are preserved intact, and documented together for
use with implementations such as RFC 3066 "Language Tags" (for
Internet use, superseding RFC 1766 "Language Tags").

Language codes part 3 is useful in this regard in that it provides a
single list of language codes for use with RFC 3066, avoiding the
need to consult ISO 639-1 and ISO 639-2 separately.
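
The practical effect for RFC 3066 can be sketched as follows. RFC 3066
requires the 2-letter ISO 639-1 code to be used where one exists, and
the 3-letter ISO 639-2 code otherwise. The function name
rfc3066_primary_subtag and the SAMPLE_CODES dictionary are
illustrative assumptions, not part of any standard; the code values
are copied from the table in section 5.

```python
from typing import Optional

# Sample entries copied from the table in section 5: (I-2, I-3T) per language.
# None means that no code is shown for that part of ISO 639 in the table.
SAMPLE_CODES = {
    "Gaelic, Irish": ("ga", "gle"),
    "Georgian": ("ka", "kat"),
    "Gayo": (None, "gay"),
    "Gagauz": (None, None),
}


def rfc3066_primary_subtag(i2: Optional[str], i3t: Optional[str]) -> Optional[str]:
    """Pick the ISO 639 code to use as an RFC 3066 primary subtag.

    Prefers the 2-letter code, falls back to the 3-letter code, and
    returns None when neither part of ISO 639 provides a code.
    """
    return i2 or i3t


for name, (i2, i3t) in SAMPLE_CODES.items():
    print(name, "->", rfc3066_primary_subtag(i2, i3t))
```

A combined list lets this choice be made with a single lookup per
language, rather than one lookup in each part of ISO 639.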


(b) Background to work in ISO/TC37/SC2 and ISO/TC46/SC4

ISO 639-1 was devised primarily for use in terminology, lexicography
and linguistics, and contains simple 2-letter codes for various
languages.

ISO 639-2 was devised primarily for use in library systems.
It provides codes for
(i)   all languages contained in ISO 639-1;
(ii)  other languages not contained in ISO 639-1;
(iii) older languages not covered in ISO 639-1; and in addition
(iv)  generic language groups.

However, many users besides terminologists and librarians now use
language codes, particularly in ICT systems, and there is a need for
a generic language coding system which provides codes for many more
languages.

If an extended, generic, coding system is not developed by ISO for
more generic use, users will be (and already are) devising their
own, to meet immediate needs, with the result that such extensions
prevent information interchange on a larger scale.

ISO 639-1 is limited to 2-letter codes, and thus codes can only be
provided for 26x26 languages. Many linguists, and others, (including
some international organizations) have in fact used other codes,
particularly those developed by the Summer Institute of Linguistics
(SIL) in their Ethnologue publications. It has also been agreed that
ISO 639-1 should have no more codes added after an agreed point, to
avoid clashes between ISO 639-1 and ISO 639-2 in specifications such
as RFC 3066 (Language Tags).

ISO 639-2 could provide codes for 26x26x26 languages. However, ISO
639-2 also has its own limits specified in the standard: unless a
substantial body of literature (50 different items in 5 bibliographic
agencies) is documented, codes are not allocated, even for languages
with official status within countries or regions of countries, which
can adversely affect some aspects of computer development for certain
languages. This is rigorously applied, as set out in the standard,
and adverse effects for generic uses (particularly in ICT systems)
are already in evidence.
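
The theoretical limits quoted above can be checked with a short
calculation. This is only an arithmetic sketch; the function name
code_space is illustrative.

```python
import string

LETTERS = string.ascii_lowercase  # the 26 letters a-z


def code_space(length: int) -> int:
    """Number of distinct codes of the given length over 26 letters."""
    return len(LETTERS) ** length


print(code_space(2))  # 26 x 26 = 676 possible 2-letter codes
print(code_space(3))  # 26 x 26 x 26 = 17576 possible 3-letter codes
```

Both figures are upper bounds on the number of codes that could be
allocated. The approximately 7000 languages listed by SIL exceed the
2-letter limit, but fit within the 3-letter limit.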


(c) More generic needs

In ICT applications, there has been a tendency to use codes from
ISO 639-1 and ISO 639-2, where possible, though at times there has
been some frustration that fewer codes were available than ICT users
required. Some ICT users have also used their own coding systems,
notably the OpenType specifications (OT) used in font and rendering
technologies, and others have used the SIL codes.


(d) Variant codes

Again because of the relatively small number of codes allocated,
variant codes were developed in various countries, including the UK,
Sweden and Germany, and in some cases these have caused clashes in
bibliographic information interchange.


(e) Combining codes

Some users also have the tendency to combine language codes with
other codes (such as script codes, country codes, etc) and it may be
that ISO/TC37 can provide guidance on best practice in combining such
codes in order to avoid clashes arising from different approaches.


(f) The way forward

As the originator of the original language coding standard, and with
a significant involvement in various aspects of ICT development,
members of ISO/TC37 are in a good position to develop a further
standard for more generic needs.

It is intended that ISO/TC37/SC2/WG1 should be in contact with
other user communities represented by ISO, particularly with SCs of
ISO/IEC JTC1, in developing this work.

There are also user communities who are not always directly involved
with ISO who could make use of extended language codes. For instance,
in the early stages of investigating this in the UK, it is clear that
various national and international government agencies, in Europe and
North America, have the need for a large range of codes for
statistical purposes.

There is an urgency in this because
(i)   language codes are now in very widespread use in information
      systems, and used in very large numbers in information
      interchange;
(ii)  there are several different international, national, and de facto
      standards, each of which includes codes which clash with codes in
      other coding systems;
(iii) the international standards involved (ISO 639 and ISO 639-2)
      have limits on the number of codes that can be applied, and users
      are developing their own extensions, incompatible with each other
      and with any part of ISO 639.

Unless the provenance of the coding system used is always documented
with each information interchange, which is not really feasible, the
use of wrong coding, and erroneous data, is extremely likely.

Most of the alternative coding systems use a 3-letter code, which
makes it difficult to be sure that users are interchanging the same
codes with the same meanings, because while some code elements are
the same in each system, many are not, and no documentation is
available which provides information on all of this.
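
The kind of clash described here can be illustrated with a minimal
sketch. The two dictionaries are small hypothetical extracts, using
values from the table in section 5, and the function name find_clashes
is illustrative. Codes are lower-cased so that the systems can be
compared, and language names are compared as plain text, which is a
simplification.

```python
# Hypothetical extracts from two coding systems, using values from the
# table in section 5. Keys are lower-cased so that systems can be compared.
ISO_639_2 = {
    "grn": "Guarani, Paraguayan",
    "grb": "Grebo [languages]",
    "kat": "Georgian",
}
SIL = {
    "grn": "German, Low (Plautdietsch)",
    "grb": "Grebo, E Je",
    "geo": "Georgian",
}


def find_clashes(a: dict, b: dict) -> dict:
    """Return codes present in both systems but naming different languages."""
    return {
        code: (a[code], b[code])
        for code in sorted(a.keys() & b.keys())
        if a[code] != b[code]
    }


for code, (iso_name, sil_name) in find_clashes(ISO_639_2, SIL).items():
    print(f"{code}: ISO 639-2 = {iso_name!r}, SIL = {sil_name!r}")
```

A real comparison would also need to reconcile differing spellings of
the same language name before reporting a clash.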

This draft technical report documents alternative practices, but does
not limit the use of alternative practices, and also aims to provide
guidance on optimum ways to use language codes to avoid problems.



------------------------------------------------------------

1 Scope

Language codes part 3 lists language codes used in ISO 639-1 and ISO
639-2, and also provides information on additional language codes
used in other coding systems. This is provided in a detailed table.

It plans to provide information on which language codes from other
coding systems are safe to use in addition to codes from ISO 639-1
and ISO 639-2, and guidelines on avoiding problems.

There is the potential to develop a further full standard (a notional
ISO 639-3) which would provide a much-extended list of language
codes, in comparison to that currently available, to meet user needs.
However, the initial aim is to provide documentation, and that is
the principal aim of this draft technical report.

The structure of Language codes part 3 mirrors that of ISO 639-1.

------------------------------------------------------------

2 Normative references

The following standard contains provisions which, through reference
in this text, constitute provisions of this International Standard.
At the time of publication, the edition indicated was valid. All
standards are subject to revision, and parties to agreements based on
this International Standard apply the most recent editions of the
standards indicated below. Members of IEC and ISO maintain registers
of currently valid International Standards.

ISO/DIS 1087-1:1997, Terminology work - Vocabulary - Part 1: Theory
and application.

ISO 3166-1:1997, Codes for the representation of names of countries
and their subdivisions - Part 1: Country codes.

ISO 3166-2:1998, Codes for the representation of names of countries
and their subdivisions - Part 2: Country subdivision code.

ISO 3166-3:1999, Codes for the representation of names of countries
and their subdivisions - Part 3: Code for formerly used names of
countries.

ISO/DIS 5127-1:1996, Information and documentation - Vocabulary -
Basic and framework terms.

------------------------------------------------------------

3 Terms and definitions

For the purpose of this part of ISO 639 the following definitions apply:

3.1 Coding system

data transformed or represented in different forms according to a
pre-established set of rules (ISO/DIS 5127-1:1996) 3.1

3.2 Language code

2-letter or 3-letter code representing a language. Both existing ISO
codes, and other codes and scales are documented here.

NOTE   To save space in tables, the following conventions are used:

I-2    2-letter codes from ISO 639 and ISO 639-1, and new codes
       applied by the ISO 639 Maintenance Agency;
I-3T   3-letter codes from ISO 639-2, and new codes
       applied by the ISO 639-2 Maintenance Agency;
I-3B   3-letter bibliographic codes from ISO 639-2, and national
       variants of these codes used in libraries;
OT     3-letter OpenType language tags, developed by Adobe and
       Microsoft, widely used in the IT industry;
SIL    3-letter codes from the Ethnologue, published by the
       Summer Institute of Linguistics (SIL).


3.3 Language name

Word(s) identifying the language

NOTE   Given various names in use, various conventions will be used
       to simplify the use of cross-references and alternative names.
       See Key to section 5 for details.


3.4 Script code

Code element from ISO/DIS 15924: Codes for representation of names of
scripts. This may be used in later versions of the table in this
technical report. Currently codes from ISO/DIS 15924 are not used in
this table.


3.5 Linguascale

Classification system enabling languages to be grouped by proximity
of relationship. Documented in the Linguasphere Register of the
World's Languages and Speech Communities (see Bibliography).

NOTE    This is NOT a standard or normative code, but is used to enable
        sorting of the table, so that related languages can appear
        close together.

        It will allow more detailed comparison of the various
        generic codes included in ISO 639-2, and will also assist
        decisions as to whether linguistic entities may be regarded
        as languages, dialects, or more generic language groupings.
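
How sorting on the Linguascale brings related languages together can
be sketched as follows. The entries are copied from the table in
section 5. Comparing Linguascale values as plain text is an
assumption made for this example; the Linguasphere register may
define its own ordering rules. The function name sort_by_linguascale
is illustrative.

```python
# (language name, Linguascale) pairs copied from the table in section 5.
ENTRIES = [
    ("German, Standard", "52-ACB-dm"),
    ("Gaelic, Scots", "50-AAA-aa~ac"),
    ("Greek", "56-AAA-ac"),
    ("Gaelic, Manx", "50-AAA-aj"),
    ("Galician", "51-AAA-ab"),
    ("Gaelic, Irish", "50-AAA-ad~ai"),
]


def sort_by_linguascale(entries):
    """Order entries by their Linguascale value, compared as plain text."""
    return sorted(entries, key=lambda entry: entry[1])


for name, scale in sort_by_linguascale(ENTRIES):
    print(f"{scale:<14}{name}")
```

With this ordering the three Gaelic entries, which share the prefix
50-AAA-, appear next to each other.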


3.6 Geographic codes


(a) Country code

Code from ISO 3166: Codes for representation of names of countries.
A country code may be used in later versions of the table in this
technical report, to identify place(s) of use of a language.

NOTE   For ease of use, the present table uses country names rather
       than country codes. This is also to avoid confusion between the
       specification of a language in a particular country, and
       locale IDs used in programming language environments, both of
       which often consist of a language code combined with a
       country code.


(b) Region code (within countries)

Code from ISO 3166-2. Part 2 of ISO 3166 lists Subdivisions of countries.
This may be used in later versions of the table in this technical
report to indicate regions where languages are used within countries.


(c) Area code (incorporating several countries: See Annex C)

3-character alphanumeric code (A with two digits) used to identify
areas of the world, based on groupings used by the United Nations
Statistical Department. Documentation is on the website for the
United Nations Statistical Department. The coding system for area
codes is being developed elsewhere in ISO, and will be documented
separately.

This may be used in some versions of the table in this technical
report to indicate areas of the world where languages are used across
several countries. Use of this code is not intended to be normative
within this standard.


(d) Location codes (LOCODES)

3-letter code, developed by the United Nations and in widespread
international use, to be used alongside country codes from ISO 3166.
Documentation is on the website for the United Nations Economic
Commission for Europe.

This may be used in later versions of the table in this technical
report to pinpoint centres within specific countries where individual
languages are used.


------------------------------------------------------------

4 Comparison of language codes

4.1 Structure of the language code table in Language codes part 3

The language codes in the comparative list below are listed to the
right of the language names, and consist only of the following 26
letters of the Latin alphabet in lower case: a, b, c, d, e, f, g, h,
i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z. They do not use
digits, punctuation, or diacritical marks or modified characters as
part of the codes.
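
The rule above can be expressed as a simple check. This is a minimal
sketch: the pattern name ISO_STYLE_CODE and the function name
is_well_formed are illustrative, and the length of 2 or 3 letters is
taken from the definition of a language code in 3.2.

```python
import re

# Two or three lower-case letters a-z, with no digits, punctuation or
# diacritics. The 2-or-3 length follows the definition in 3.2.
ISO_STYLE_CODE = re.compile(r"[a-z]{2,3}")


def is_well_formed(code: str) -> bool:
    """True when the code uses only 2 or 3 lower-case Latin letters."""
    return ISO_STYLE_CODE.fullmatch(code) is not None


for candidate in ["ga", "gle", "GLI", "g1", "iri!"]:
    print(candidate, is_well_formed(candidate))
```

The check covers the form of a code only. It does not show whether a
code has actually been allocated to a language.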

Note that the de facto codes documented are usually shown in upper
case, and are also shown in upper case in this document.

However, note that punctuation marks may be added after some codes,
although these are only indicative.

In particular, the ! symbol (EXCLAMATION MARK) is used after some
non-standard codes to indicate that the code in question should not be
used for the language indicated, as that code represents a different
language in one or more parts of ISO 639.
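
Software reading the table needs to separate this marker from the
code itself. The following is a minimal sketch; the function name
split_marker is illustrative. The table also contains a doubled
marker (for example max!!), whose meaning is not defined here, so
treating it in the same way as a single marker is an assumption.

```python
def split_marker(cell: str):
    """Split a table cell into (code, flagged).

    flagged is True when the cell ends in one or more "!" characters.
    """
    code = cell.rstrip("!")
    return code, code != cell


for cell in ["gle", "iri!", "max!!"]:
    print(cell, "->", split_marker(cell))
```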

Individual language codes attempt to be mnemonic where possible,
based on letters in either the indigenous form of the language name
or the English or French form. Where possible, any expressed
preference from the language communities concerned has also been
taken into account.

It is intended that, once they are part of an international standard,
individual language codes will not be changed.

A single language identifier is normally provided for a language even
though the language is written in more than one script. A separate
standard may be developed for the purpose of designating information
concerning the script or writing system of a language.



4.2 Maintenance of the language code table in Language codes part 3

For Language codes part 3, it is proposed to set up a Maintenance
Agency. It is intended to mount the tables from this technical report
on the World Wide Web in a form which can allow ordering in various
ways by users, and enquiries and comments will be invited where
additional or different information needs to be added. Communication
with the Maintenance Agency should principally be by email (although
postal and fax communications will not be discouraged, where
electronic communications are unavailable).

Each application or proposal should be accompanied by a
recommendation and support of an authority (standards organization,
governmental body, linguistic institution, or cultural organization,
at international, national or local level).

The Maintenance Agency should take into account:
 - the number of speakers of the language community;
 - recognized status of the language in one or more countries;
 - the support for the application by one or more official bodies;
 - whether the language community concerned considers it is a
   separate language entity, or part of another language entity;
 - variant uses or orthographies compared to related language
   communities.

The registration procedure will be laid down in annex A.

For the Maintenance Agency for language codes part 3, the UK proposes
the Linguasphere Observatory, Hebron, SA34 0XT, Wales,
United Kingdom, +44 1994.419.300 (fax); +44 1994.419.660 (tel);
Web: www.linguasphere.org. The UK Linguasphere Observatory is part of
the international Linguasphere Research Network.

NOTE - The Registration Authority for ISO 639-1 is the
International Information Centre for Terminology (Infoterm),
Simmeringer Hauptstraße 24, A-1110 Vienna, Austria.

The Registration Authority for ISO 639-2 is the
Library of Congress, Washington, D.C., 20540 USA
(c/o Network Development and MARC Standards Office).


4.3 Combining language codes with other codes

There are examples of the use of language codes, in particular their
combination with other codes (such as country codes or script codes),
in ISO 639-1, section 4.3. Language codes part 3 does not intend to
add information of this nature, although that could appear in other
parts of ISO 639.

Other standards, particularly those developed by ISO/IEC JTC1/SC22/WG20
(Programming Languages - Internationalization), define how such code
combinations are specified in locales, etc., and other ISO technical
committees may also have special conventions that should be taken
into account in drawing up any recommendations of this nature.


5 Table of language names and language codes

Language names and language codes, and accompanying information, are
listed in the following table. Currently only a sequence of entries
where the language names begin with G is listed as an example, and
indeed this is only a partial "G-list", containing a "top-slice" of
languages used by over 100,000 people. Languages with fewer users, and
language names beginning with all other letters, could be documented
in further work.

This table documents (a) codes used in different parts of ISO 639 and
also (b) two de facto codes (SIL and OT) which are used
internationally (see Key below), as well as the Linguascale (which is
a variable length classification scale and NOT a language code,
although it is intended to assist in the use of language codes) from
the Linguasphere register of the world's languages and speech
communities.

It provides a subset of a larger list compiled by SIL, which also
includes population estimates and countries of use, to which have
been added details of matches, or mismatches, between different
language coding systems in current use. This subset is currently
composed of (a) all languages with language names beginning with G,
which (b) have around 100,000 speakers or more, with some additional
languages also included.

It is planned to scale up the current list to include details of all
of the approximately 7000 languages listed in the SIL source,
regardless of size. This is to be followed by a further analysis of
entries in the Linguasphere register (approximately 20,000 entries)
to identify further languages to be added.


NOTE

The table below is intended to be maintained on the World Wide Web,
and to be openly available, and also to allow users to download and
order the data by language name, country name, and other criteria.
Initially, this whole document will be available at least at the following URL:
http://www.sesame.demon.co.uk/codes/part3.htm

Tables derived from the above, and expanded to cover larger
repertoires, but ordered by different columns will be accessible from
http://www.sesame.demon.co.uk/codes/


NB:    It is also recommended that if viewed as an HTML file, the
       table below is viewed at the smallest or smaller text size, in
       a full screen view, in order to accommodate all columns on the
       screen. Alternatively, view at a larger size but move the
       bottom scrollbar (or left or right cursor arrows) to show just
       a subset of all columns.

------------------------------------------------------------------------------------------------------------
Ref#; Users; Area;Countries of use;    Language name                  I-2  I-3T SIL  OT   I-3B   Linguascale
------------------------------------------------------------------------------------------------------------
  617 300000  A75 Ghana                GA (GA-ADANGME-KROBO)          ..   gaa  GAC  GAD  ...    96-LAA-a
88888 --      A75 Nigeria              Gaandu                         ..   ...  ...  ...  ...    18-HBA-aa
 2633 10000   A34 Iran                 GABRI                          ..   ...  GBZ  ...  ...    58-AAC-di
 1008 114307  A35 India                GADDI (Gaddi-chamba)           ..   ...  GBK  ...  ...    59-AAF-ei
=                                      Gadhwali = Garhwali
=                                      Gaelic SIGN LANGUAGE=Irish SL
 9999 -       A22 Ireland              Gaelic, Irish Traditional      ..   ...  ...  IRT  ...    .........
  669 260000  A22 Ireland              GAELIC, IRISH \ Gaeilge        ga   gle  GLI  IRI  iri!   50-AAA-ad~ai
 9999 -       A22 United Kingdom       GAELIC, MANX                   gv   glv  MJD  MNX  max!!  50-AAA-aj
 1134 104000  A22 United Kingdom       GAELIC, SCOTS                  gd   gla  GLS  GAE  gae!   50-AAA-aa~ac
  801 198000  A27 Moldova              GAGAUZ                         ..   ...  GAG  GAG  ...    44-AAB-ab
  586 331000  A25 Turkey               GAGAUZ \ TURKISH \ Balkan      ..   ...  BGX  ...  ...    .........
  151     4m  A24 Spain                GALICIAN \ Gallegan            gl   glg  GLN  GAL  gag!   51-AAA-ab
=                                      Gallegan = Galician
  736 222000  A35 India                GAMIT \ Gamati                 ..   ...  GBL  ...  ...    59-AAF-kd
  279     1m  A74 Ethiopia             GAMO-GOFA-DAWRO                ..   ...  GMO  ...  ...    16-BAF-b
  174     3m  A74 Uganda               GANDA \ Luganda                lg   lug  LAP  LUG  ...    99-AUS-er
  215     2m  A35 India                GARHWALI \ Gadhwali            ..   ...  GBM  GAW  ...    59-AAF-c
  807 190000  A56 Honduras             GARIFUNA                       ..   ...  CAB  ...  ...    82-ABA-ba
  405 650000  A35 India                GARO                           ..   ...  GRT  GRO         72-ACA-a
  963 128000  A74 Kenya                GARREH-AJURAN                  ..   ...  GGH  ...         14-GAG-cb
  679 254800  A23 France               GASCON \ Gascou                ..   ...  GSC  ...         50-AAA-f
=                                      Gascou = Gascon
  820 180000  A36 Indonesia            GAYO                           ..   gay  GYO  ...  ...    31-MCA-a
  397 700000  A75 Nigeria              GBAGYI                         ..   ...  GBR  ...         99-BAC-b
  628 300000  A75 Nigeria              GBARI                          ..   ...  GBY  ...         98-BAC-a
 2287 16000   A74 Sudan                GBAYA                          ..   ...  KRS  ...         93-AAA-a
99999 -       A76 Central African Rep. Gbaya [languages]              ..   gba  ...  ...         93-AAA-
  661 267000  A76 Central African Rep. GBAYA, NORTHWEST               ..   ...  GYA  ...         93-AAA-a
  823 177000  A76 Central African Rep. GBAYA, SOUTHWEST               ..   ...  MDO  ...         93-AAA-a
  825 176000  A76 Central African Rep. GBAYA-BOSSANGOA                ..   ...  GBP  ...         93-AAA-a
 1747 32500   A76 Central African Rep. GBAYA-BOZOUM                   ..   ...  GBQ  ...         93-AAA-a
 3124 5500    A76 Central African Rep. GBAYI                          ..   ...  GYG  ...         93-ABB-ad
  408 637082  A74 Ethiopia             GEDEO                          ..   ...  DRS  ...         14-CAD-aa
  589 327000  A75 Togo                 GEN-GBE                        ..   ...  GEJ  ...         96-MAA-d
  147     4m  A29 Georgia              GEORGIAN                       ka   kat  GEO  KAT  geo    42-CAB-b
 9999 -       A29 Georgia              Georgian - Asomtavruli         ..   ...  ...  ...  ...    .........
 9999 -       A29 Georgia              Georgian - Mkhedruli           ..   ...  ...  ...  ...    .........
 9999 -       A29 Georgia              Georgian - Nuskhuri            ..   ...  ...  ...  ...    .........
 9999 -       A29 Georgia              Georgian Khutsuri              ..   ...  ...  KGE  ...    .........
  786 200000  A75 Nigeria              GERA                           ..   ...  GEW  ...  ...    19-BBE-aa
 6308 0       A23 Germany              GERMAN  SIGN LANGUAGE          ..   ...  GSG  ...  ...    .........
  219b    2m  A23 Germany              GERMAN, LOW (PLATTDEUTSCH)     ..   ...  GEP  ...  gml!   52-ACB-c
  530 400000  A52 Canada...(Mennonite) German, Low (PLAUTDIETSCH)     ..   ...  GRN  ...  ...    52-ACB-hd
 6311 0       A23 Germany              German, Low (SAXON, LOW)       ..   nds  SXN  ...  gel!   52-ACB-cb
 9999 0       A23 Germany              German, Middle High            ..   gmh  ...  ...  ...    52-ACB-
 9999 -       A23 Germany              German, Middle Low             ..   ...  ...  ...  gml!   52-ACB-ca
=                                      German, N=German (Plattdeutsch)
 9999 0       A23 Germany              German, Old High               ..   goh  ...  ...  ...    52-ACB-
 9999 -       A23 Germany              German, Old Low                ..   ...  ...  ...  gol!   52-ACB-
 1111 100000  A52 USA                  GERMAN, PENNSYLVANIA           ..   ...  PDC  ...  ...    52-ACB-he
   10    98m  A23 Germany              GERMAN, STANDARD               de   deu  GER  DEU  ger    52-ACB-dm
 9999 -       A24 Italy                Germanic  (Other)              ..   gem  ...  ...  ...    52
 6317 0       A75 Ghana                GHANAIAN SIGN LANGUAGE         ..   ...  GSE  ...  ...    .........
  666 260000  A76 Cameroon             GHOMALA                        ..   ...  BBJ  ...  ...    99-AGE-e
  128     5m  A74 Kenya                GIKUYU                         ki   kik  KIU  KIK         99-AUM-aa
  167     3m  A34 Iran                 GILAKI                         ..   ...  GLK  ...         58-AAK-eb
 1076 100000  A35 India                GIRASIA, ADIWASI               ..        GAS  ...  ...    59-AAF-jc
  412 623000  A74 Kenya                GIRYAMA                        ..        NYF  ...  ...    99-AUS-la
  733 225000  A77 Mozambique           GITONGA                        ..        TOH  ...  ...    99-AUT-ca
  787 200000  A75 Nigeria              GOEMAI                         ..        ANK  ...  ...    19-FDB-aa
  273     1m  A74 Tanzania             GOGO                           ..   ...  GOG  ...         99-AUS-op
 1096 100000  A75 Nigeria              GOKANA                         ..        GKN  ...  ...    98-JBB-aa
 8888 -       A75 Senegal              Gola                           ..        ...  ...  ...    90-IBA-ab
 8888 -       A75 Nigeria              Gola                           ..        ...  ...  ...    92-BAA-eb
 1024 107300  A75 Liberia              GOLA                           ..        GOL  ...  ...    94-BBA-a
99999 -       A35 India                Gondi [languages]              ..   gon  ...  ...         49-DAA-
  382 736000  A35 India                GONDI, NORTHERN                ..   ...  GON  GON         49-DAA-aa
  419 600000  A35 India                GONDI, SOUTHERN                ..        GGO  ...  ...    .........
  687 250000  A75 Ghana                GONJA                          ..        DUM  ...  ...    96-FDB-a
  337 900000  A36 Indonesia            GORONTALO                      ..   gor  GRL  ...  ...    31-NJA-a
 9999 0       A27 Ukraine; Bulgaria    Gothic \ Crimean               ..   got  ...  ...  ...    52-ADA-a
  445 559500  A75 Burkina Faso         GOURMANCHEMA fula              ..        GUX  ...  ...    90-BAA-al
99999         A75 Liberia              Grebo [languages]              ..   grb  ...  ...  ...    95-ABA-
 1971 23700   A75 Liberia              GREBO, BARCLAYVILLE            ..        GRY  ...  ...    95-ABA-
 1522 47800   A75 Liberia              GREBO, E JE                    ..   ...  GRB! ...  ...    95-ABA-
 2251 16800   A75 Liberia              GREBO, FOPO-BUA                ..        GEF  ...  ...    95-ABA-
 1392 56300   A75 Liberia              GREBO, GBOLOO                  ..        GEC  ...  ...    95-ABA-
 1867 28700   A75 Liberia              GREBO, GLEBO                   ..        GEU  ...  ...    95-ABA-
 6490 0       A75 Liberia              GREBO, GLOBO                   ..        GRV  ...  ...    95-ABA-
 6491 0       A75 Liberia              GREBO, JABO                    ..        GRJ  ...  ...    95-ABA-
 2163 19900   A75 Liberia              GREBO, NORTHEASTERN            ..        GRP  ...  ...    95-ABA-
 1788 30100   A75 Cote d'Ivoire        GREBO, SEASIDE                 ..        GRF  ...  ...    95-ABA-ld
   74    12m  A25 Greece               GREEK                          el   ell  GRK  ELL  gre    56-AAA-ac
 1578 42600   A25 Greece               GREEK  SIGN LANGUAGE           ..        GSS  ...  ...    .........
 9999 0       A25 Greece               Greek, Ancient (to 1453)       ..   grc  GKO  ...  ...    56-AAA-aa
---74 -       A25 Greece               Greek, Polytonic               ..   ...  ...  PGR         56-AAA-
  593 320000  A25 Greece               Greek, PONTIC                  ..   ...  PNT  ...         56-AAA-aj
 3170 5000    A67 Bolivia              GUARANI, BOLIVIAN, WESTERN     ..        GNW  ...  ...    88-AAI-fcb
 2508 12000   A67 Paraguay             GUARANI, MBYA                  ..        GUN  ...  ...    88-AAI-fg
  136     4m  A67 Paraguay             GUARANI, PARAGUAYAN            gn   grn  GUG  GUA  gua!   88-AAI-fa
 6320 0       A56 Guatemala            GUATEMALAN SIGN LANGUAGE       ..        GSM  ...  ...    .........
  597 317500  A75 Cote d'Ivoire        GUERE, CENTRAL                 ..        GXX  ...  ...    95-ABA-r
 6324 0       A75 Guinea               GUINEAN SIGN LANGUAGE          ..   ...  GUS  ...  ...    .........
   24    44m  A35 India                GUJARATI                       gu   guj  GJR  GUJ         59-AAF-h
  347 840000  A35 India                GUJARI                         ..        GJU  ...  ...    59-AAF-go
  855 163271  A76 Chad                 GULAY \ Gulai                  ..        GVL  ...  ...    03-AAA-eo
  977 125000  A52 USA                  Gullah \SEA ISLAND CREOLE (en) ..   ...  GUL  ...         52-ABB-aa
  983 123000  A74 Ethiopia             GUMUZ                          ..        GUK  GMZ         05-LAA-a
  468 500000  A75 Benin                GUN-GBE                        ..        GUW  ...  ...    96-MAA-gc
  348 827764  A74 Ethiopia             GURAGE, EAST \ Silte           ..   ...  GRE  SIG         12-ACC-bc
 9999 -       A74 Ethiopia             GURAGE, NORTH \ Sodo           ..   ...  GRU  SOG         12-ACE-aa
  364 798202  A74 Ethiopia             GURAGE, WEST \ Chaha           ..   ...  GUY  CHG         12-ACC-bbd
  585 332100  A75 Cote d'Ivoire        GURO \ Golo                    ..        GOA  ...  ...    00-DCA-aa
  246     1m  A74 Kenya                GUSII                          ..   ...  GUZ  ...         99-AUK-aa
 1455 50000   A65 French Guiana        GUYANAIS CREOLE (fr)           ..   ...  FRE  ...         51-AAC-cd
  392 700000  A65 Guyana               GUYANESE CREOLE (en)           ..   ...  GYN  ...         52-ABB-av
  656 275608  A74 Uganda               GWERE                          ..        GWR  ...  ...    99-AUS-ew

--------------------------------------------------------------------------------------------------
Explanation of columns:
--------------------------------------------------------------------------------------------------
Ref#                    Internal database ID (not intended for publication)

Users                   Approximate number of speakers

Area                    Draft code for areas (covering several countries,
                        expanded from United Nations Statistical Office
                        area divisions: see informative Annex C)

Note:                   The columns Ref#, Users and Area are
                        included for general information only, as
                        several of the language names may be
                        unfamiliar to ISO/TC37/SC2/WG1 members.
                        They may be omitted from this table in
                        future drafts.

Countries of use        A country or countries associated with use or
                        origin of that language. Where more than one
                        country is associated with its use, other
                        country names will be added where space
                        permits, or "etc" will be added to indicate
                        use in adjacent countries.

Language name           Word(s) identifying the language

--------------------------------------------------------------------------------------------------
ISO codes:
--------------------------------------------------------------------------------------------------
I-2                     ISO 639-1 (2-letter codes)
I-3T                    ISO 639-2 (3-letter codes - Terminology use)
I-3B                    ISO 639-2 (3-letter codes - Bibliographic use)

--------------------------------------------------------------------------------------------------
Non-ISO systems documented:
--------------------------------------------------------------------------------------------------
SIL                     Ethnologue (SIL - Summer Institute of Linguistics)
OT                      OpenType specification (Microsoft, Adobe, et al.)

--------------------------------------------------------------------------------------------------
Linguascale:
--------------------------------------------------------------------------------------------------

Scale of linguistic relationships from the Linguasphere Register of
the World's Languages and Speech Communities (a device for grouping
languages by relationship, and NOT a system of reference codes).

The first digit denotes either one of 5 phylosectors, each
representing a major linguistic family (odd digits), or one of 5
geosectors, each representing a continental area of reference for
languages outside these major families (even digits):

        0 Africa;               1 Afro-Asian;
        2 Australasia;          3 Austronesian;
        4 Eurasia;              5 Indo-European;
        6 North-America;        7 Sino-Indian;
        8 South-America;        9 Transafrican.

Second digits indicate zones of reference within each phylosector or
geosector.

Three upper case letters indicate layers of increasingly close
relationship (set; chain; net).

One, two or three lower case letters indicate layers of immediate
(very close) relationship (outer language; inner language; dialect)
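
The structure described above can be illustrated with a short Python
sketch. It is not part of the Linguasphere Register or of this draft:
the function name parse_linguascale, the field names, and the reading
that each letter position corresponds to exactly one layer are
assumptions made for illustration. Entries shown in the table as
"........." or as a range marked with "~" are simply not parsed.

```python
import re

# Sector names as listed in this section
# (even digits: geosectors, odd digits: phylosectors).
SECTORS = {
    "0": "Africa", "1": "Afro-Asian",
    "2": "Australasia", "3": "Austronesian",
    "4": "Eurasia", "5": "Indo-European",
    "6": "North-America", "7": "Sino-Indian",
    "8": "South-America", "9": "Transafrican",
}

# Two digits, three upper-case letters, then zero to three lower-case
# letters (the table also contains truncated keys such as "49-DAA-").
KEY = re.compile(r"([0-9])([0-9])-([A-Z]{3})-([a-z]{0,3})")

UPPER_LAYERS = ("set", "chain", "net")
LOWER_LAYERS = ("outer language", "inner language", "dialect")


def parse_linguascale(key):
    """Split a Linguascale key into its layers; return None if malformed."""
    match = KEY.fullmatch(key.strip())
    if match is None:
        return None
    sector, zone, upper, lower = match.groups()
    return {
        "sector": SECTORS[sector],
        "kind": "phylosector" if int(sector) % 2 else "geosector",
        "zone": sector + zone,
        # Assumption: one letter per layer, in the order given above.
        "upper": dict(zip(UPPER_LAYERS, upper)),
        "lower": dict(zip(LOWER_LAYERS, lower)),
    }


if __name__ == "__main__":
    for sample in ("56-AAA-ac", "88-AAI-fcb", "49-DAA-", "........."):
        print(sample, parse_linguascale(sample))
```

Returning None for unparsable entries, rather than raising an error,
keeps the sketch usable on whole columns of the table, where blank
and placeholder entries are common.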


--------------------------------------------------------------------------------------------------
Symbols used:
--------------------------------------------------------------------------------------------------
In Name column:

\                       Separates two forms of a language name.
                        The character is shown in order to mark
                        which language names have more than one
                        form. Cross references will be generated
                        from names which incorporate that character.

~                       Term to be avoided (sometimes derogatory term
                        used by those outside the language group).

=                       Cross reference to a full entry.
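
As an illustration only, the generation of cross references from
names containing "\" might be sketched in Python as follows. The
function name cross_references and the output layout
("other form = first form") are assumptions; this draft does not
specify how the generated references will be laid out.

```python
def cross_references(name):
    """Return hypothetical cross-reference strings for a language name.

    The first form is treated as the full entry, and every further
    form is pointed at it with "=". Names without a backslash yield
    no cross references.
    """
    forms = [part.strip() for part in name.split("\\")]
    if len(forms) < 2:
        return []
    return [f"{other} = {forms[0]}" for other in forms[1:]]


if __name__ == "__main__":
    for sample in ("GURO \\ Golo", "GURAGE, EAST \\ Silte", "GUSII"):
        print(sample, "->", cross_references(sample))
```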

In Code column:

!                       If users mix 3-letter ISO codes with
                        3-letter codes from other systems, the code
                        so marked should _NOT_ be used for this
                        language, as it causes a clash (one code
                        with two meanings, or one meaning with two
                        codes).
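
The first kind of clash (one code with two meanings) can be checked
mechanically. The Python sketch below is an illustration, not a
procedure defined by this draft: the function name find_clashes is
hypothetical, the sample entries are copied from the table in this
section, and the assignment of each code to the ISO 639-2 or SIL
column is read off the table layout and is therefore an assumption.
Codes are compared without regard to case.

```python
def find_clashes(iso_codes, other_codes):
    """Return codes that name different languages in the two systems.

    Both arguments map a 3-letter code to a language name. The result
    maps each clashing code (in lower case) to the pair
    (ISO name, other name).
    """
    iso = {code.lower(): name for code, name in iso_codes.items()}
    clashes = {}
    for code, name in other_codes.items():
        key = code.lower()
        if key in iso and iso[key].lower() != name.lower():
            clashes[key] = (iso[key], name)
    return clashes


# Sample entries taken from the table above (column assignment assumed).
ISO_639_2 = {
    "grb": "Grebo [languages]",
    "ell": "GREEK",
    "gor": "GORONTALO",
}
SIL = {
    "GRB": "GREBO, E JE",
    "GRK": "GREEK",
    "GRL": "GORONTALO",
}

if __name__ == "__main__":
    print(find_clashes(ISO_639_2, SIL))
```

With these sample entries the only clash reported is "grb", which is
the code marked "!" in the GREBO, E JE row.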

In Linguascale column:

~                       Indicates a range of entries in the
                        Linguasphere register.


------------------------------------------------------------
Annex A (normative) Procedures for the Registration Authority
for Language codes part 3 [to be added]

------------------------------------------------------------
Annex B (informative) Bibliography [to be added]

--------------------------------------------------------------------------------------------------
Annex C (informative) Area codes used
(expanded from UN Statistical Office areas)

A10 INTERNATIONAL (usage, and any subdivision, currently unspecified)

A20 EUROPE
  A22 Northern Europe     A23 Western Europe      A24 Southwest Europe
  A25 Southeast Europe    A26 Central Europe      A27 Eastern Europe
  A28 North Eurasia[1]    A28 Caucasus/Anatolia

A30 ASIA
  A33 Central Eurasia[1]  A34 West Asia           A35 South Asia
  A36 Southeast Asia      A37 East Asia

A40 PACIFIC
  A43 Australia           A44 New Zealand         A45 Papua New Guinea
  A46 Melanesia           A47 Micronesia          A48 Polynesia

A50 NORTHERN AMERICA
  A52 North America       A53 Caribbean           A56 Central America (N)
                                                  A57 Central America (S)

A60 SOUTHERN AMERICA
  A64 South America (NW)  A65 South America (NE)  A66 Luso-America
  A67 South America (S)

A70 AFRICA
  A72 North Africa        A74 East Africa         A75 West Africa
  A76 Central Africa      A77 Southern Africa

A80 MARITIME TERRITORIES (in line with UN/ECLAC code and UN/LOCODE use)
  A81 South Atlantic/Antarctic territories
  A83 Indian Ocean/Southern Ocean Territories

Fuller documentation on area codes will be available via
ISO/IEC JTC1/SC22/WG20.
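
In the list above, each subdivision shares its first digit with the
heading it appears under (A25 under A20, A75 under A70). Assuming
that this pattern holds for all area codes, which this draft does not
state explicitly, the heading for a given code could be looked up as
in the following Python sketch. The function name parent_area is
hypothetical.

```python
# Top-level area headings as listed in Annex C.
TOP_LEVEL = {
    "A10": "INTERNATIONAL",
    "A20": "EUROPE",
    "A30": "ASIA",
    "A40": "PACIFIC",
    "A50": "NORTHERN AMERICA",
    "A60": "SOUTHERN AMERICA",
    "A70": "AFRICA",
    "A80": "MARITIME TERRITORIES",
}


def parent_area(code):
    """Return (parent code, heading) for an area code such as "A25".

    Assumption: the first digit after "A" identifies the heading.
    The heading is None if the parent code is not listed in Annex C.
    """
    digits = code[1:]
    if len(code) != 3 or code[0] != "A" or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an area code: {code!r}")
    parent = code[:2] + "0"
    return parent, TOP_LEVEL.get(parent)


if __name__ == "__main__":
    for sample in ("A25", "A35", "A75"):
        print(sample, parent_area(sample))
```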


--
John Clews,
Keytempo Information Management,
8 Avenue Rd, Harrogate, HG2 7PG
Email: Scripts@sesame.demon.co.uk
tel: +44 1423 888 432

Committee Chair of  ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37: Terminology