------------------------------------------------------------
ISO/TC37/SC2 N 241
ISO/TC37/SC2/WG1 N 74 (R)
Title: Draft technical report: Language codes part 3: Guide to the alpha-numeric coding of the world's languages
Date: 2001-07-27
Source: John Clews (UK)
Status: Personal contribution
Action: For information
Distribution: ISO/TC37/SC2
------------------------------------------------------------
Draft technical report: Language codes part 3: Guide to the alpha-numeric coding of the world's languages

This document could (a) serve as the basis of a technical report documenting the different language codes now in widespread use, and (b), if ISO/TC37/SC2 deems it appropriate, be developed further into a list of single language codes that could extend the current repertoire of language codes in the existing parts of ISO 639. Direction for further development will be taken from ISO/TC37/SC2.

It starts from the premise that individual codes from various existing coding systems are already being used together, and outlines problems to avoid when doing this.

General notes:
(1) ISO 639-1 and ISO 639-2 form parts 1 and 2 in relation to this part 3.
(2) The structure of this draft technical report currently mirrors the structure of ISO 639-1.
(3) As an HTML file, this document is best viewed at a medium or lower resolution, and the table in section 5 at a small or smaller resolution, to avoid text disappearing at the right-hand margin.
------------------------------------------------------------
1 Scope
2 Normative references
3 Terms and definitions
4 Comparison of language codes
4.1 Structure of the language code table in Language codes part 3
4.2 Maintenance of the language code table in Language codes part 3
4.3 Combining language codes with other codes
5 Table of language names and language codes
Annex A (normative) Procedures for the Registration Authority for Language codes part 3 [to be added]
Annex B (informative) Bibliography [to be added]
Annex C (informative) Area codes used (expanded from United Nations Statistical Office areas)
------------------------------------------------------------
Foreword [similar in all ISO standards]

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) together form a system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with specific technical fields. Other governmental and non-governmental international organizations in liaison with ISO and IEC also take part in the work.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

This draft technical report was prepared by Working Group 1 "Language codes" of Subcommittee 2 "Layout of vocabularies" of Technical Committee ISO/TC 37 "Terminology (principles and coordination)".

Draft International Standards and technical reports are circulated to national bodies for approval before their acceptance as International Standards. They are approved in accordance with procedures requiring at least 75 % approval by the national bodies voting.
------------------------------------------------------------
Introduction

This draft technical report has been prepared taking into account the aims and needs expressed in document ISO/TC37/SC2/WG1 N69: Coding systems, prepared on 2001-01-31 in Norway by Håvard Hjulstad (convenor of ISO/TC37/SC2/WG1). This introduction discusses these aims and needs further.

(a) Relationship between different parts of ISO 639

Language codes part 3 provides additional information and extends the capability and functionality of ISO 639-1 (2-letter codes, intended for use in terminology) and ISO 639-2 (3-letter codes, intended for use in bibliography). To achieve this, Language codes part 3 combines ISO 639-1 and ISO 639-2 with information from the Linguasphere register of the world's languages and speech communities, and additionally provides information used in other coding systems, covering a wider range of languages than ISO 639-1 and ISO 639-2 provide at present.

Language codes part 3 does not replace ISO 639-1 and ISO 639-2. Their codes are preserved intact, and are documented together for use with implementations such as RFC 3066 "Language Tags" (for Internet use, superseding RFC 1766 "Language Tags"). Language codes part 3 is useful in this regard because it provides a single list of language codes for use with RFC 3066, avoiding the need to consult ISO 639-1 and ISO 639-2 separately.

(b) Background to work in ISO/TC37/SC2 and ISO/TC46/SC4

ISO 639-1 was devised primarily for use in terminology, lexicography and linguistics, and contains simple 2-letter codes for various languages. ISO 639-2 was devised primarily for use in library systems. It provides codes for (i) all languages contained in ISO 639-1; (ii) other languages not contained in ISO 639-1; (iii) older languages not covered in ISO 639-1; and in addition (iv) generic language groups.
However, many users besides terminologists and librarians now use language codes, particularly in ICT systems, and there is a need for a generic language coding system which provides codes for many more languages. If ISO does not develop an extended coding system for more generic use, users will devise (and already are devising) their own to meet immediate needs, with the result that such extensions prevent information interchange on a larger scale.

ISO 639-1 is limited to 2-letter codes, so codes can be provided for at most 26 x 26 = 676 languages. Many linguists and others (including some international organizations) have in fact used other codes, particularly those developed by the Summer Institute of Linguistics (SIL) in its Ethnologue publications. It has also been agreed that no more codes should be added to ISO 639-1 after an agreed point, to avoid clashes between ISO 639-1 and ISO 639-2 in specifications such as RFC 3066 (Language Tags).

ISO 639-2 could provide codes for up to 26 x 26 x 26 = 17,576 languages. However, ISO 639-2 also sets its own limits: unless a substantial body of literature is documented (50 different items in 5 bibliographic agencies), codes are not allocated, even for languages with official status within countries or regions of countries. This can adversely affect some aspects of computer development for certain languages. The rule is rigorously applied, as set out in the standard, and adverse effects for generic uses (particularly in ICT systems) are already in evidence.

(c) More generic needs

In ICT applications, there has been a tendency to use codes from ISO 639-1 and ISO 639-2 where possible, though at times there has been some frustration that fewer codes were available than ICT users required. Some ICT users have also used their own coding systems, notably the OpenType (OT) specification used in font and rendering technologies, and others have used the SIL codes.
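When ICT applications build RFC 3066 language tags from ISO 639 codes, RFC 3066 sets a precedence between the two parts: a language's ISO 639-1 code must be used if it has one, and otherwise its ISO 639-2/T (terminology) code is used rather than the /B (bibliographic) code where the two differ. The following is a minimal Python sketch of that rule applied to the I-2 and I-3T columns of the table in section 5; the function name `primary_subtag` and the convention of passing `None` for an empty cell are illustrative assumptions, not part of RFC 3066 or ISO 639.

```python
from typing import Optional


def primary_subtag(i2: Optional[str], i3t: Optional[str]) -> Optional[str]:
    """Choose an RFC 3066 primary language subtag from ISO 639 table columns.

    i2  -- the I-2 (ISO 639-1) cell, or None if the cell is empty
    i3t -- the I-3T (ISO 639-2/T) cell, or None if the cell is empty
    The I-3B column is deliberately not consulted: where no ISO 639-1 code
    exists and /B and /T differ, RFC 3066 requires the /T code.
    """
    if i2:
        # Rule 1: an ISO 639-1 code takes precedence over any 3-letter code.
        return i2.lower()
    if i3t:
        # Rule 2: otherwise use the ISO 639-2 terminology code.
        return i3t.lower()
    # Neither column has a code: no ISO 639-based subtag is available.
    return None


# Rows from the table in section 5 (I-2, I-3T):
print(primary_subtag("el", "ell"))  # GREEK -> el
print(primary_subtag(None, "gay"))  # GAYO -> gay
print(primary_subtag(None, None))   # GERA (SIL code only) -> None
```

For GREEK the table also lists the bibliographic code gre, but the tag is el; a language such as GERA, which has only an SIL code, gets no ISO-based subtag under this rule.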
(d) Variant codes

Again because of the relatively small number of codes allocated, variant codes were developed in various countries, including the UK, Sweden and Germany, and in some cases these have caused clashes in bibliographic information interchange.

(e) Combining codes

Some users also tend to combine language codes with other codes (such as script codes or country codes), and ISO/TC37 may be able to provide guidance on best practice in combining such codes, in order to avoid clashes arising from different approaches.

(f) The way forward

As ISO/TC37 originated the first language coding standard, and its members are significantly involved in various aspects of ICT development, its members are in a good position to develop a further standard for more generic needs. It is intended that ISO/TC37/SC2/WG1 should be in contact with other user communities represented in ISO, particularly with SCs of ISO/IEC JTC1, in developing this work.

There are also user communities not always directly involved with ISO who could make use of extended language codes. For instance, early investigations in the UK make it clear that various national and international government agencies in Europe and North America need a large range of codes for statistical purposes.

There is urgency in this because:
(i) language codes are now in very widespread use in information systems, and are used in very large numbers in information interchange;
(ii) there are several different international, national and de facto standards, each of which includes codes that clash with codes in other coding systems;
(iii) the international standards involved (ISO 639 and ISO 639-2) limit the number of codes that can be allocated, and users are developing their own extensions, incompatible with each other and with any part of ISO 639.
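A clash of the kind described in (ii) can be detected mechanically once codes from different systems are compared in a single case, since the ISO codes are written in lower case and the de facto SIL and OT codes in upper case. The Python sketch below assumes each table row is held as a mapping from column name to code, with empty cells omitted and the ! markers removed; the sample rows are copied (with shortened names) from the table in section 5, and `find_clashes` is a hypothetical helper, not part of any standard. It detects only one kind of clash: the same code carrying two meanings.

```python
from collections import defaultdict

# Sample rows copied from the table in section 5 (names shortened).
# Each row maps a column (I-3T, I-3B, SIL, OT) to its code; empty cells are omitted.
ROWS = {
    "GALICIAN": {"I-3T": "glg", "I-3B": "gag", "SIL": "GLN", "OT": "GAL"},
    "GAGAUZ": {"SIL": "GAG", "OT": "GAG"},
    "GERMAN, STANDARD": {"I-3T": "deu", "I-3B": "ger", "SIL": "GER", "OT": "DEU"},
    "GREEK": {"I-3T": "ell", "I-3B": "gre", "SIL": "GRK", "OT": "ELL"},
    "GURAGE, EAST": {"SIL": "GRE", "OT": "SIG"},
    "Grebo [languages]": {"I-3T": "grb"},
    "GREBO, E JE": {"SIL": "GRB"},
}


def find_clashes(rows):
    """Return {code: [(column, language), ...]} for codes used with more than one meaning."""
    users = defaultdict(set)
    for language, codes in rows.items():
        for column, code in codes.items():
            # Compare case-insensitively: ISO codes are lower case, SIL/OT upper case.
            users[code.lower()].add((column, language))
    clashes = {}
    for code, pairs in users.items():
        languages = {language for _, language in pairs}
        if len(languages) > 1:  # one code, two or more meanings
            clashes[code] = sorted(pairs)
    return clashes


for code, pairs in sorted(find_clashes(ROWS).items()):
    print(code, pairs)
```

On these sample rows the sketch reports gag (the library variant for Galician against the SIL and OT codes for Gagauz), grb and gre; ger is not reported, because ISO 639-2/B and SIL both use it for German. The converse kind of clash, one language with two codes (for example gla and GLS for Scots Gaelic), is not detected by this sketch.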
Unless the provenance of the coding system used is always documented with each information interchange, which is not really feasible, the use of wrong codes, and hence erroneous data, is extremely likely. Most of the alternative coding systems use 3-letter codes, which makes it difficult to be sure that users are interchanging the same codes with the same meanings: while some code elements are the same in each system, many are not, and no documentation is available that covers all of this.

This draft technical report documents alternative practices, but does not limit their use; it also aims to provide guidance on optimum ways to use language codes to avoid problems.

------------------------------------------------------------
1 Scope

Language codes part 3 lists the language codes used in ISO 639-1 and ISO 639-2, and also provides information on additional language codes used in other coding systems, in a detailed table. It is planned to provide information on which language codes from other coding systems are safe to use in addition to codes from ISO 639-1 and ISO 639-2, together with guidelines on avoiding problems.

There is the potential to develop a further full standard (a notional ISO 639-3) which would provide a much-extended list of language codes, compared with what is currently available, to meet user needs. However, the initial aim is to provide documentation, and that is the principal aim of this draft technical report.

The structure of Language codes part 3 mirrors that of ISO 639-1.

------------------------------------------------------------
2 Normative references

The following standards contain provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the editions indicated were valid.
All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to apply the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards.

ISO/DIS 1087-1:1997, Terminology work - Vocabulary - Part 1: Theory and application.
ISO 3166-1:1997, Codes for the representation of names of countries and their subdivisions - Part 1: Country codes.
ISO 3166-2:1998, Codes for the representation of names of countries and their subdivisions - Part 2: Country subdivision code.
ISO 3166-3:1999, Codes for the representation of names of countries and their subdivisions - Part 3: Code for formerly used names of countries.
ISO/DIS 5127-1:1996, Information and documentation - Vocabulary - Basic and framework terms.

------------------------------------------------------------
3 Terms and definitions

For the purposes of this part of ISO 639 the following definitions apply:

3.1 Coding system
data transformed or represented in different forms according to a pre-established set of rules (ISO/DIS 5127-1:1996)

3.2 Language code
2-letter or 3-letter code representing a language. Both existing ISO codes and other codes and scales are documented here.

NOTE To save space in tables, the following conventions are used:
I-2 2-letter codes from ISO 639 and ISO 639-1, and new codes assigned by the ISO 639 Maintenance Agency;
I-3T 3-letter (terminology) codes from ISO 639-2, and new codes assigned by the ISO 639-2 Maintenance Agency;
I-3B 3-letter bibliographic codes from ISO 639-2, and national variants of these codes used in libraries;
OT 3-letter OpenType language tags, developed by Adobe and Microsoft, widely used in the IT industry;
SIL 3-letter codes from the Ethnologue, published by the Summer Institute of Linguistics (SIL).

3.3 Language name
Word(s) identifying the language

NOTE Given the various names in use, conventions are used to simplify cross-references and alternative names. See the key to section 5 for details.

3.4 Script code
Code element from ISO/DIS 15924: Codes for the representation of names of scripts. Script codes may be used in later versions of the table in this technical report; currently, codes from ISO/DIS 15924 are not used in the table.

3.5 Linguascale
Classification system enabling languages to be grouped by proximity of relationship, documented in the Linguasphere Register of the World's Languages and Speech Communities (see Bibliography).

NOTE This is NOT a standard or normative code, but is used to enable sorting of the table so that related languages appear close together. It will allow more detailed comparison of the various generic codes included in ISO 639-2, and will also assist decisions as to whether linguistic entities may be regarded as languages, dialects, or more generic language groupings.

3.6 Geographic codes

(a) Country code
Code from ISO 3166: Codes for the representation of names of countries. A country code may be used in later versions of the table in this technical report to identify the place(s) where a language is used.

NOTE For ease of use, the present table uses country names rather than country codes. This also avoids confusion between the specification of a language in a particular country and the locale IDs used in programming language environments, both of which often consist of a language code combined with a country code.

(b) Region code (within countries)
Code from ISO 3166-2. Part 2 of ISO 3166 lists subdivisions of countries.
This may be used in later versions of the table in this technical report to indicate regions within countries where languages are used.

(c) Area code (incorporating several countries: see Annex C)
3-character alphanumeric code (A followed by two digits) used to identify areas of the world, based on groupings used by the United Nations Statistical Department. Documentation is on the website of the United Nations Statistical Department. The coding system for area codes is being developed elsewhere in ISO, and will be documented separately. Area codes may be used in some versions of the table in this technical report to indicate areas of the world where languages are used across several countries. Use of this code is not intended to be normative within this standard.

(d) Location codes (LOCODES)
3-letter codes developed by the United Nations and in widespread international use, to be used alongside country codes from ISO 3166. Documentation is on the website of the United Nations Economic Commission for Europe. Location codes may be used in later versions of the table in this technical report to pinpoint centres within specific countries where individual languages are used.

------------------------------------------------------------
4 Comparison of language codes

4.1 Structure of the language code table in Language codes part 3

The language codes in the comparative list below are listed to the right of the language names, and consist only of the following 26 letters of the Latin alphabet in lower case: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z. They do not use digits, punctuation, diacritical marks or modified characters as part of the codes. Note, however, that the de facto codes documented are usually shown in upper case, and are also shown in upper case in this document.

Punctuation marks may nevertheless be added after some codes, although these are only indicative. In particular, the ! symbol (EXCLAMATION MARK) is used after some non-standard codes to indicate that the code in question should not be used for the language indicated, as that code represents a different language in one or more parts of ISO 639.

Individual language codes are intended to be mnemonic where possible, based on letters in either the indigenous form of the language name or its English or French form. Where possible, any expressed preference from the language communities concerned has also been taken into account. It is intended that, once they form part of an International Standard, individual language codes will not be changed.

A single language identifier is normally provided for a language even if the language is written in more than one script. A separate standard may be developed for designating information concerning the script or writing system of a language.

4.2 Maintenance of the language code table in Language codes part 3

For Language codes part 3, it is proposed to set up a Maintenance Agency. It is intended to mount the tables from this technical report on the World Wide Web in a form which allows users to order them in various ways, and enquiries and comments will be invited where additional or different information needs to be added. Communication with the Maintenance Agency should principally be by email (although postal and fax communications will not be discouraged where electronic communications are unavailable). Each application or proposal should be accompanied by the recommendation and support of an authority (standards organization, governmental body, linguistic institution, or cultural organization, at international, national or local level).
The Maintenance Agency should take into account:
- the number of speakers in the language community;
- recognized status of the language in one or more countries;
- the support for the application by one or more official bodies;
- whether the language community concerned considers it a separate language entity, or part of another language entity;
- variant uses or orthographies compared with related language communities.

The registration procedure will be laid down in Annex A.

For the Maintenance Agency for Language codes part 3, the UK proposes the Linguasphere Observatory, Hebron, SA34 0XT, Wales, United Kingdom, +44 1994.419.300 (fax); +44 1994.419.660 (tel); Web: www.linguasphere.org. The UK Linguasphere Observatory is part of the international Linguasphere Research Network.

NOTE The Registration Authority for ISO 639-1 is the International Information Centre for Terminology (Infoterm), Simmeringer Hauptstraße 24, A-1110 Vienna, Austria. The Registration Authority for ISO 639-2 is the Library of Congress, Washington, D.C., 20540 USA (c/o Network Development and MARC Standards Office).

4.3 Combining language codes with other codes

ISO 639-1, section 4.3, gives examples of the use of language codes, in particular their combination with other codes (such as country codes or script codes). Language codes part 3 does not intend to add information of this nature, although such information could appear in other parts of ISO 639. Other standards, particularly those developed by ISO/IEC JTC1/SC22/WG20 (Programming Languages - Internationalization), specify how such code combinations are used in locales, etc., and other ISO technical committees may also have special conventions that should be taken into account in drawing up any recommendations of this nature.

------------------------------------------------------------
5 Table of language names and language codes

Language names and language codes, and accompanying information, are listed in the following table.
Currently only a sequence of entries whose language names begin with G is listed, as an example, and this is only a partial "G-list", containing a "top-slice" of languages used by over 100,000 people. Languages with fewer users, and language names beginning with all other letters, could be documented in further work.

This table documents (a) codes used in different parts of ISO 639 and (b) two de facto codes (SIL and OT) which are used internationally (see the key below), as well as the Linguascale from the Linguasphere register of the world's languages and speech communities. The Linguascale is a variable-length classification scale and NOT a language code, although it is intended to assist in the use of language codes.

The table provides a subset of a larger list compiled by SIL, which also includes population estimates and countries of use, to which details have been added of matches, or mismatches, between the different language coding systems in current use. This subset is currently composed of (a) all languages with names beginning with G which (b) have around 100,000 speakers or more, with some additional languages also included.

It is planned to scale up the current list to include details of all of the approximately 7000 languages listed in the SIL source, regardless of size, followed by a further analysis of entries in the Linguasphere register (approximately 20,000 entries) to identify further languages to be added.

NOTE The table below is intended to be maintained on the World Wide Web, to be openly available, and to allow users to download the data and order it by language name, country name, and other criteria.
Initially, this whole document will be available at (at least) the following URL:
http://www.sesame.demon.co.uk/codes/part3.htm
Tables derived from it, expanded to cover larger repertoires and ordered by different columns, will be accessible from:
http://www.sesame.demon.co.uk/codes/

NB: If viewed as an HTML file, the table below is best viewed at the smallest or a smaller text size, in a full-screen view, in order to accommodate all columns on the screen. Alternatively, view it at a larger size and use the bottom scrollbar (or the left and right cursor arrows) to show a subset of the columns at a time.

------------------------------------------------------------------------------------------------------------
Ref#  Users   Area Countries of use      Language name                   I-2 I-3T SIL   OT   I-3B  Linguascale
------------------------------------------------------------------------------------------------------------
617   300000  A75  Ghana                 GA (GA-ADANGME-KROBO)           ..  gaa  GAC   GAD  ...   96-LAA-a
88888 --      A75  Nigeria               Gaandu                          ..  ...  ...   ...  ...   18-HBA-aa
2633  10000   A34  Iran                  GABRI                           ..  ...  GBZ   ...  ...   58-AAC-di
1008  114307  A35  India                 GADDI (Gaddi-chamba)            ..  ...  GBK   ...  ...   59-AAF-ei
= Gadhwali = Garhwali
= Gaelic SIGN LANGUAGE = Irish SL
9999  -       A22  Ireland               Gaelic, Irish Traditional       ..  ...  ...   IRT  ...   .........
669   260000  A22  Ireland               GAELIC, IRISH \ Gaeilge         ga  gle  GLI   IRI  iri!  50-AAA-ad~ai
9999  -       A22  United Kingdom        GAELIC, MANX                    gv  glv  MJD   MNX  max!! 50-AAA-aj
1134  104000  A22  United Kingdom        GAELIC, SCOTS                   gd  gla  GLS   GAE  gae!  50-AAA-aa~ac
801   198000  A27  Moldova               GAGAUZ                          ..  ...  GAG   GAG  ...   44-AAB-ab
586   331000  A25  Turkey                GAGAUZ \ TURKISH \ Balkan       ..  ...  BGX   ...  ...   .........
151   4m      A24  Spain                 GALICIAN \ Gallegan             gl  glg  GLN   GAL  gag!  51-AAA-ab
= Gallegan = Galician
736   222000  A35  India                 GAMIT \ Gamati                  ..  ...  GBL   ...  ...   59-AAF-kd
279   1m      A74  Ethiopia              GAMO-GOFA-DAWRO                 ..  ...  GMO   ...  ...   16-BAF-b
174   3m      A74  Uganda                GANDA \ Luganda                 lg  lug  LAP   LUG  ...   99-AUS-er
215   2m      A35  India                 GARHWALI \ Gadhwali             ..  ...  GBM   GAW  ...   59-AAF-c
807   190000  A56  Honduras              GARIFUNA                        ..  ...  CAB   ...  ...   82-ABA-ba
405   650000  A35  India                 GARO                            ..  ...  GRT   GRO  ...   72-ACA-a
963   128000  A74  Kenya                 GARREH-AJURAN                   ..  ...  GGH   ...  ...   14-GAG-cb
679   254800  A23  France                GASCON \ Gascou                 ..  ...  GSC   ...  ...   50-AAA-f
= Gascou = Gascon
820   180000  A36  Indonesia             GAYO                            ..  gay  GYO   ...  ...   31-MCA-a
397   700000  A75  Nigeria               GBAGYI                          ..  ...  GBR   ...  ...   99-BAC-b
628   300000  A75  Nigeria               GBARI                           ..  ...  GBY   ...  ...   98-BAC-a
2287  16000   A74  Sudan                 GBAYA                           ..  ...  KRS   ...  ...   93-AAA-a
99999 -       A76  Central African Rep.  Gbaya [languages]               ..  gba  ...   ...  ...   93-AAA-
661   267000  A76  Central African Rep.  GBAYA, NORTHWEST                ..  ...  GYA   ...  ...   93-AAA-a
823   177000  A76  Central African Rep.  GBAYA, SOUTHWEST                ..  ...  MDO   ...  ...   93-AAA-a
825   176000  A76  Central African Rep.  GBAYA-BOSSANGOA                 ..  ...  GBP   ...  ...   93-AAA-a
1747  32500   A76  Central African Rep.  GBAYA-BOZOUM                    ..  ...  GBQ   ...  ...   93-AAA-a
3124  5500    A76  Central African Rep.  GBAYI                           ..  ...  GYG   ...  ...   93-ABB-ad
408   637082  A74  Ethiopia              GEDEO                           ..  ...  DRS   ...  ...   14-CAD-aa
589   327000  A75  Togo                  GEN-GBE                         ..  ...  GEJ   ...  ...   96-MAA-d
147   4m      A29  Georgia               GEORGIAN                        ka  kat  GEO   KAT  geo   42-CAB-b
9999  -       A29  Georgia               Georgian - Asomtavruli          ..  ...  ...   ...  ...   .........
9999  -       A29  Georgia               Georgian - Mkhedruli            ..  ...  ...   ...  ...   .........
9999  -       A29  Georgia               Georgian - Nuskhuri             ..  ...  ...   ...  ...   .........
9999  -       A29  Georgia               Georgian Khutsuri               ..  ...  ...   KGE  ...   .........
786   200000  A75  Nigeria               GERA                            ..  ...  GEW   ...  ...   19-BBE-aa
6308  0       A23  Germany               GERMAN SIGN LANGUAGE            ..  ...  GSG   ...  ...   .........
219b  2m      A23  Germany               GERMAN, LOW (PLATTDEUTSCH)      ..  ...  GEP   ...  gml!  52-ACB-c
530   400000  A52  Canada...(Mennonite)  German, Low (PLAUTDIETSCH)      ..  ...  GRN   ...  ...   52-ACB-hd
6311  0       A23  Germany               German, Low (SAXON, LOW)        ..  nds  SXN   ...  gel!  52-ACB-cb
9999  0       A23  Germany               German, Middle High             ..  gmh  ...   ...  ...   52-ACB-
9999  -       A23  Germany               German, Middle Low              ..  ...  ...   ...  gml!  52-ACB-ca
= German, N = German (Plattdeutsch)
9999  0       A23  Germany               German, Old High                ..  goh  ...   ...  ...   52-ACB-
9999  -       A23  Germany               German, Old Low                 ..  ...  ...   ...  gol!  52-ACB-
1111  100000  A52  USA                   GERMAN, PENNSYLVANIA            ..  ...  PDC   ...  ...   52-ACB-he
10    98m     A23  Germany               GERMAN, STANDARD                de  deu  GER   DEU  ger   52-ACB-dm
9999  -       A24  Italy                 Germanic (Other)                ..  gem  ...   ...  ...   52
6317  0       A75  Ghana                 GHANAIAN SIGN LANGUAGE          ..  ...  GSE   ...  ...   .........
666   260000  A76  Cameroon              GHOMALA                         ..  ...  BBJ   ...  ...   99-AGE-e
128   5m      A74  Kenya                 GIKUYU                          ki  kik  KIU   KIK  ...   99-AUM-aa
167   3m      A34  Iran                  GILAKI                          ..  ...  GLK   ...  ...   58-AAK-eb
1076  100000  A35  India                 GIRASIA, ADIWASI                ..  ...  GAS   ...  ...   59-AAF-jc
412   623000  A74  Kenya                 GIRYAMA                         ..  ...  NYF   ...  ...   99-AUS-la
733   225000  A77  Mozambique            GITONGA                         ..  ...  TOH   ...  ...   99-AUT-ca
787   200000  A75  Nigeria               GOEMAI                          ..  ...  ANK   ...  ...   19-FDB-aa
273   1m      A74  Tanzania              GOGO                            ..  ...  GOG   ...  ...   99-AUS-op
1096  100000  A75  Nigeria               GOKANA                          ..  ...  GKN   ...  ...   98-JBB-aa
8888  -       A75  Senegal               Gola                            ..  ...  ...   ...  ...   90-IBA-ab
8888  -       A75  Nigeria               Gola                            ..  ...  ...   ...  ...   92-BAA-eb
1024  107300  A75  Liberia               GOLA                            ..  ...  GOL   ...  ...   94-BBA-a
99999 -       A35  India                 Gondi [languages]               ..  gon  ...   ...  ...   49-DAA-
382   736000  A35  India                 GONDI, NORTHERN                 ..  ...  GON   GON  ...   49-DAA-aa
419   600000  A35  India                 GONDI, SOUTHERN                 ..  ...  GGO   ...  ...   .........
687   250000  A75  Ghana                 GONJA                           ..  ...  DUM   ...  ...   96-FDB-a
337   900000  A36  Indonesia             GORONTALO                       ..  gor  GRL   ...  ...   31-NJA-a
9999  0       A27  Ukraine; Bulgaria     Gothic \ Crimean                ..  got  ...   ...  ...   52-ADA-a
445   559500  A75  Burkina Faso          GOURMANCHEMA fula               ..  ...  GUX   ...  ...   90-BAA-al
99999 -       A75  Liberia               Grebo [languages]               ..  grb  ...   ...  ...   95-ABA-
1971  23700   A75  Liberia               GREBO, BARCLAYVILLE             ..  ...  GRY   ...  ...   95-ABA-
1522  47800   A75  Liberia               GREBO, E JE                     ..  ...  GRB!  ...  ...   95-ABA-
2251  16800   A75  Liberia               GREBO, FOPO-BUA                 ..  ...  GEF   ...  ...   95-ABA-
1392  56300   A75  Liberia               GREBO, GBOLOO                   ..  ...  GEC   ...  ...   95-ABA-
1867  28700   A75  Liberia               GREBO, GLEBO                    ..  ...  GEU   ...  ...   95-ABA-
6490  0       A75  Liberia               GREBO, GLOBO                    ..  ...  GRV   ...  ...   95-ABA-
6491  0       A75  Liberia               GREBO, JABO                     ..  ...  GRJ   ...  ...   95-ABA-
2163  19900   A75  Liberia               GREBO, NORTHEASTERN             ..  ...  GRP   ...  ...   95-ABA-
1788  30100   A75  Cote d'Ivoire         GREBO, SEASIDE                  ..  ...  GRF   ...  ...   95-ABA-ld
74    12m     A25  Greece                GREEK                           el  ell  GRK   ELL  gre   56-AAA-ac
1578  42600   A25  Greece                GREEK SIGN LANGUAGE             ..  ...  GSS   ...  ...   .........
9999  0       A25  Greece                Greek, Ancient (to 1453)        ..  grc  GKO   ...  ...   56-AAA-aa
---74 -       A25  Greece                Greek, Polytonic                ..  ...  ...   PGR  ...   56-AAA-
593   320000  A25  Greece                Greek, PONTIC                   ..  ...  PNT   ...  ...   56-AAA-aj
3170  5000    A67  Bolivia               GUARANI, BOLIVIAN, WESTERN      ..  ...  GNW   ...  ...   88-AAI-fcb
2508  12000   A67  Paraguay              GUARANI, MBYA                   ..  ...  GUN   ...  ...   88-AAI-fg
136   4m      A67  Paraguay              GUARANI, PARAGUAYAN             gn  grn  GUG   GUA  gua!  88-AAI-fa
6320  0       A56  Guatemala             GUATEMALAN SIGN LANGUAGE        ..  ...  GSM   ...  ...   .........
597   317500  A75  Cote d'Ivoire         GUERE, CENTRAL                  ..  ...  GXX   ...  ...   95-ABA-r
6324  0       A75  Guinea                GUINEAN SIGN LANGUAGE           ..  ...  GUS   ...  ...   .........
24    44m     A35  India                 GUJARATI                        gu  guj  GJR   GUJ  ...   59-AAF-h
347   840000  A35  India                 GUJARI                          ..  ...  GJU   ...  ...   59-AAF-go
855   163271  A76  Chad                  GULAY \ Gulai                   ..  ...  GVL   ...  ...   03-AAA-eo
977   125000  A52  USA                   Gullah \SEA ISLAND CREOLE (en)  ..  ...  GUL   ...  ...   52-ABB-aa
983   123000  A74  Ethiopia              GUMUZ                           ..  ...  GUK   GMZ  ...   05-LAA-a
468   500000  A75  Benin                 GUN-GBE                         ..  ...  GUW   ...  ...   96-MAA-gc
348   827764  A74  Ethiopia              GURAGE, EAST \ Silte            ..  ...  GRE   SIG  ...   12-ACC-bc
9999  -       A74  Ethiopia              GURAGE, NORTH \ Sodo            ..  ...  GRU   SOG  ...   12-ACE-aa
364   798202  A74  Ethiopia              GURAGE, WEST \ Chaha            ..  ...  GUY   CHG  ...   12-ACC-bbd
585   332100  A75  Cote d'Ivoire         GURO \ Golo                     ..  ...  GOA   ...  ...   00-DCA-aa
246   1m      A74  Kenya                 GUSII                           ..  ...  GUZ   ...  ...   99-AUK-aa
1455  50000   A65  French Guiana         GUYANAIS CREOLE (fr)            ..  ...  FRE   ...  ...   51-AAC-cd
392   700000  A65  Guyana                GUYANESE CREOLE (en)            ..  ...  GYN   ...  ...   52-ABB-av
656   275608  A74  Uganda                GWERE                           ..  ...  GWR   ...  ...   99-AUS-ew
--------------------------------------------------------------------------------------------------
Explanation of columns:
--------------------------------------------------------------------------------------------------
Ref#               Internal database ID (not intended for publication)
Users              Approximate number of speakers (m = million)
Area               Draft code for areas covering several countries, expanded from United Nations Statistical Office area divisions (see informative Annex C)

Note: The columns Ref#, Users and Area are included for general information only, as several of the language names may be unfamiliar to ISO/TC37/SC2/WG1 members. They may well not appear in future drafts of this table.

Countries of use   A country or countries associated with the use or origin of the language. Where more than one country is associated with its use, other country names will be added where space permits, or "etc" will be added to indicate use in adjacent countries.
Language name      Word(s) identifying the language

Empty cells are shown as .. (2-letter column), ... (3-letter columns) or ......... (Linguascale).
--------------------------------------------------------------------------------------------------
ISO codes:
--------------------------------------------------------------------------------------------------
I-2    ISO 639-1 (2-letter codes)
I-3T   ISO 639-2 (3-letter codes - Terminology use)
I-3B   ISO 639-2 (3-letter codes - Bibliographic use)
--------------------------------------------------------------------------------------------------
Non-ISO systems documented:
--------------------------------------------------------------------------------------------------
SIL    Ethnologue (SIL - Summer Institute of Linguistics)
OT     OpenType specification (Microsoft, Adobe, et al.)
--------------------------------------------------------------------------------------------------
Linguascale:
--------------------------------------------------------------------------------------------------
Scale of linguistic relationships from the Linguasphere Register of the World's Languages and Speech Communities (a device for grouping languages by relationship and NOT a system of reference codes).

The first digit denotes one of 5 phylosectors, each representing a major linguistic family (odd digits), or one of 5 geosectors, each representing a continental area of reference for languages outside these major families (even digits):
0 Africa; 1 Afro-Asian; 2 Australasia; 3 Austronesian; 4 Eurasia; 5 Indo-European; 6 North-America; 7 Sino-Indian; 8 South-America; 9 Transafrican.
The second digit indicates a zone of reference within each phylosector or geosector.
Three upper-case letters indicate layers of increasingly close relationship (set; chain; net).
One, two or three lower-case letters indicate layers of immediate (very close) relationship (outer language; inner language; dialect).
--------------------------------------------------------------------------------------------------
Symbols used:
--------------------------------------------------------------------------------------------------
In Name column:
\   The language name can appear in two forms; the backslash shows which language names have this feature. References will be generated from names which incorporate this character.
~   Term to be avoided (sometimes a derogatory term used by those outside the language group).
=   Cross-reference to a full entry.
In Code column:
!   If users mix 3-letter non-ISO codes with ISO codes, codes marked in this way should NOT be used for this language, as doing so causes a clash (one code with two meanings, or one meaning with two codes).
In Linguascale column:
~   Indicates a range of entries in the Linguasphere register.
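Because the Linguascale has a fixed layered structure, its entries can be split into their layers mechanically, for example to group or sort table rows by relationship. The Python sketch below is an illustration only: the function `parse_linguascale`, its field names, the acceptance of a trailing hyphen with no lower-case letters (as in 93-AAA-), and the reading of the text after "~" as the end of a range are assumptions drawn from the entries in the table, not a published Linguascale specification.

```python
import re
from typing import NamedTuple, Optional

# Sector names for the first digit, as listed in the key above.
SECTORS = {
    0: "Africa", 1: "Afro-Asian", 2: "Australasia", 3: "Austronesian",
    4: "Eurasia", 5: "Indo-European", 6: "North-America",
    7: "Sino-Indian", 8: "South-America", 9: "Transafrican",
}

# Two digits, then optionally -XXX (set, chain, net) and -x, -xx or -xxx.
# A trailing hyphen with no lower-case letters (e.g. "93-AAA-") is accepted.
PATTERN = re.compile(r"(\d)(\d)(?:-([A-Z]{3})(?:-([a-z]{0,3}))?)?")


class Linguascale(NamedTuple):
    sector: str               # phylosector or geosector name
    is_phylosector: bool      # odd first digit = major linguistic family
    zone: int                 # second digit
    upper: str                # set, chain, net letters ("" if absent)
    lower: str                # outer language, inner language, dialect ("" if absent)
    range_end: Optional[str]  # text after "~", if the entry is a range


def parse_linguascale(value: str) -> Linguascale:
    """Split a Linguascale entry such as '50-AAA-ad~ai' into its layers."""
    start, sep, end = value.partition("~")
    match = PATTERN.fullmatch(start)
    if match is None:
        raise ValueError(f"not a Linguascale reference: {value!r}")
    digit = int(match.group(1))
    return Linguascale(
        sector=SECTORS[digit],
        is_phylosector=digit % 2 == 1,
        zone=int(match.group(2)),
        upper=match.group(3) or "",
        lower=match.group(4) or "",
        range_end=end if sep else None,
    )


print(parse_linguascale("50-AAA-ad~ai"))  # GAELIC, IRISH
print(parse_linguascale("42-CAB-b"))      # GEORGIAN
```

Placeholder cells such as "........." are rejected with a ValueError rather than parsed, so rows without a Linguascale entry have to be handled separately when sorting.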
------------------------------------------------------------
Annex A (normative)
Procedures for the Registration Authority for Language codes part 3
[to be added]
------------------------------------------------------------
Annex B (informative)
Bibliography
[to be added]
--------------------------------------------------------------------------------------------------
Annex C (informative)
Area codes used (expanded from UN Statistical Office areas)

A10 INTERNATIONAL (usage, and any subdivision, currently unspecified)
A20 EUROPE
A22 Northern Europe
A23 Western Europe
A24 Southwest Europe
A25 Southeast Europe
A26 Central Europe
A27 Eastern Europe
A28 North Eurasia[1]
A29 Caucasus/Anatolia
A30 ASIA
A33 Central Eurasia[1]
A34 West Asia
A35 South Asia
A36 Southeast Asia
A37 East Asia
A40 PACIFIC
A43 Australia
A44 New Zealand
A45 Papua New Guinea
A46 Melanesia
A47 Micronesia
A48 Polynesia
A50 NORTHERN AMERICA
A52 North America
A53 Caribbean
A56 Central America (N)
A57 Central America (S)
A60 SOUTHERN AMERICA
A64 South America (NW)
A65 South America (NE)
A66 Luso-America
A67 South America (S)
A70 AFRICA
A72 North Africa
A74 East Africa
A75 West Africa
A76 Central Africa
A77 Southern Africa
A80 MARITIME TERRITORIES (in line with UN/ECLAC code and UN/LOCODE use)
A81 South Atlantic/Antarctic territories
A83 Indian Ocean/Southern Ocean Territories

Fuller documentation on area codes will be available via ISO/IEC JTC1/SC22/WG20.

--
John Clews, Keytempo Information Management, 8 Avenue Rd, Harrogate, HG2 7PG
Email: Scripts@sesame.demon.co.uk
tel: +44 1423 888 432
Committee Chair of ISO/TC46/SC2: Conversion of Written Languages; Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization; Committee Member of ISO/TC37: Terminology