LINGUIST List 11.379

Tue Feb 22 2000

Qs: Update & Request for ISO 639 Lang Candidates

Editor for this issue: Karen Milligan <>

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.


  1. John Clews, Potential future candidates for new ISO 639 codes (larger languages)

Message 1: Potential future candidates for new ISO 639 codes (larger languages)

Date: Tue, 22 Feb 2000 10:47:00 +0000
From: John Clews <>
Subject: Potential future candidates for new ISO 639 codes (larger languages)


Potential future candidates for new ISO 639 codes (larger languages)

Thank you for the correspondence on the Linguist List (and on other
language-related lists) on codes that might be useful for inclusion
in ISO 639. The correspondence has certainly helped me to
prioritise the most urgent languages to be added.

ISO 639 tends to provide codes only for the larger languages,
although it still needs to provide codes for several larger languages
(by number of speakers). As a rough guide, I am aiming to ensure that
codes will be provided in due course for most distinct languages
where there are a million or more speakers.
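
For anyone who wants to apply that rough guide mechanically, here is a minimal sketch in Python. It assumes rows laid out like those in the list below (speaker count, country, language name, 3-letter SIL code, separated by runs of two or more spaces). The names Row, parse_row and meets_threshold are hypothetical and are not part of ISO 639 or of any SIL tool.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    speakers: int
    country: str
    language: str
    sil_code: str


# Columns are assumed to be separated by two or more spaces; single spaces
# may occur inside a country or language name. A leading "+" is tolerated.
ROW = re.compile(
    r"^\s*\+?\s*([\d,]+)\s{2,}(.+?)\s{2,}(.+?)\s{2,}([A-Z]{3})\s*$"
)


def parse_row(line: str) -> Optional[Row]:
    """Return a Row for a table line, or None for any other kind of line."""
    m = ROW.match(line)
    if m is None:
        return None
    speakers, country, language, code = m.groups()
    return Row(int(speakers.replace(",", "")), country.strip(),
               language.strip(), code)


def meets_threshold(row: Row, minimum: int = 1_000_000) -> bool:
    """Rough guide only: a million or more speakers."""
    return row.speakers >= minimum


if __name__ == "__main__":
    sample = [
        "       1,487,000      China           KHAMS                  KHG",
        "          50,000      Papua New Guinea        TOK PISIN       PDG",
        ">>>>    Dialects assumed? Or different languages?",
    ]
    for line in sample:
        row = parse_row(line)
        if row is not None:
            print(row.language, row.sil_code, meets_threshold(row))
```

The threshold is only a first filter: a language below it (such as one with national status) may still be a reasonable candidate, so the result should be read alongside the comments requested below.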

In passing, ISO 639 tends to leave the allocation of codes for
lesser-used languages to organizations such as the Summer Institute
of Linguistics (SIL) in its Ethnologue codes.

In fact while in Washington at the recent ISO 639 Joint Advisory
Committee (17-18 February 2000) I had some very useful preliminary
discussions with Peter Constable of SIL, about how these 3-letter
codes might interface with each other, although considerably more
work needs to be put into that. However, given linguists' frequent
use of SIL codes, this may be a useful exercise in due course.

I plan to send the Linguist List a report of the meeting of the
ISO 639 Joint Advisory Committee later. Codes for a few additional
languages were added at this meeting: the main discussions were on
clarifying some procedural issues, which should allow for much more
rapid addition of codes in the future.

Meanwhile, I would be glad if any of you could comment on the list
below: the Foundation for Endangered Languages plans to submit an
application for new codes for some of the larger languages below to
be added to ISO 639-2, based on the following list. There's no
suggestion that these languages are endangered: just that it would be
useful to provide ISO 639 3-letter codes for at least some of them,
and some more information on these languages would be helpful to
present to the ISO 639 Joint Advisory Committee.

The comments that would be particularly useful are:
(a) which of these have some official status in any part of the
    countries concerned:
(b) where there is a significant body of documents in this language, in
    i.   manuscripts
    ii.  academic linguistic transcriptions
    iii. printed materials
    iv.  sound recordings
(c) which of the languages listed below could be regarded as dialects
    of another language, or closely related to another language.
    There is no suggestion that dialects are of less value: it will
    just help in presenting the application to the ISO 639-2
    Maintenance Agency.

In addition the marginal symbol
>>>>    marks other specific queries that I have below.

ISO 639-2 already provides 3-letter codes for many of the larger
languages of the world: I have not repeated those in this current
posting. Thus this current posting is NOT a query on "what languages
are missing" - more of a request for further information on the
languages that are listed.

This list runs broadly from East through West, from China through
Europe. The addition of further major languages of the Americas is
not proposed here, as ISO 639 covers most larger languages of the
Americas fairly well already, although I plan to look at this area
again at a later time.

Please embed your comments with the reply, and send this to
<> rather than to the list.

Again I plan to post a summary to the Linguist List in due course.

John Clews

21 February 2000

- ----------------------------------------------------------
East Asia
- ----------------------------------------------------------

       1,487,000      China           KHAMS                  KHG
       1,480,750      China           DONG, SOUTHERN         KMC

>>>>    ISO 639: no codes for Khams and Dong (it is assumed that
        these are non-Han languages).

>>>>    NB: it will be useful to consult the official lists of
        around 55 national minorities, to check which, if any,
        non-Han languages with official status are omitted from
        ISO 639.

>>>>    What scripts are used for KHAMS and DONG? Latin script?

- ----------------------------------------------------------
Southeast Asia and Oceania
- ----------------------------------------------------------

       1,190,000      Viet Nam        TAY               THO

ISO 639-2 provides only for Tai (other); not for Tay (Tai Tho)

>>>>    How closely related is Tai Tho to other Tai languages?

       2,083,000      Myanmar         ARAKANESE         MHV

ISO 639-2 provides for Karen and Shan; nothing for Arakanese

>>>>    How close is Arakanese to other languages?

- ----------------------------------------------------------

       3,000,000      Indonesia       BANJAR                BJN
                                      Also known as
                                      BANJAR MALAY

       2,700,000      Indonesia       BETAWI                BEW
                                      Also known as
                                      JAKARTA MALAY

>>>>    After checking with Southeast Asian librarians at the
        British Library, it is apparent that these are significantly
        different from Malay. It is not clear whether the situation of
        the Malay languages is similar to that of the Sami languages.

        In Sami, there is now a code for "Northern Sami" (the
        Sami language with the largest population, and the largest
        publishing statistics) and "Sami (other)." The addition of
        further specific Sami languages may be reviewed again later.

        There may be a case for providing a code for "Malay languages
        (other)" as well as particular Malay languages.

       2,000,000      Indonesia       BATAK TOBA      BBC
       1,200,000      Indonesia       BATAK DAIRI     BTD

>>>>    Note: there are various languages called BATAK in Sumatra:
        BATAK DAIRI (which has 1,200,000 speakers), BATAK KARO,
        BATAK MANDAILING, BATAK SIMALUNGUN and BATAK TOBA (which has
        2,000,000 speakers).

>>>>    NB: note also the different language BATAK in the
        Philippines, with the SIL code BTK, which is assumed to be
        the "Batak" language encoded in ISO 639-2.

       1,500,000      Indonesia       LAMPUNG               LJP
       1,000,000      Indonesia       REJANG                REJ

ISO 639: No codes for LAMPUNG or REJANG. These too are spoken in
Sumatra, Indonesia.

>>>>    Dialects assumed? Or different languages?

       1,000,000      Philippines             MAGINDANAON     MDH

>>>>    In passing, ISO 639 provides for most other large languages
        of the Philippines.

          50,000      Papua New Guinea        TOK PISIN       PDG

>>>>    ISO 639: prefer to add special code for Tok Pisin? This has
        national status in Papua New Guinea. Currently only "cpe"
        (Creoles & Pidgins, English) is available. However, Bislama
        (which can also be described as an English-based creole
        language) does have a separate code.

>>>>    In passing, ISO 639-2 provides codes for most other larger
        languages of Oceania.

- ----------------------------------------------------------
South Asia
- ----------------------------------------------------------


ISO 639 does not list several of the following under these names:

      13,000,000      India           HARYANVI        BGC
       6,000,000      India           KANAUJI         BJJ
       3,500,000      India           PARSI           PRP
       2,730,120      India           LAMBADI         LMN
       2,246,105      India           KHANDESI        KHN
       2,095,280      India           DOGRI-KANGRI    DOJ
       2,081,756      India           GARHWALI        GBM
       2,013,000      India           KUMAUNI         KFY
       1,921,000      India           BAGRI           BGQ
       1,861,965      India           SADRI           SCK
       1,856,000      India           TULU            TCY
       1,600,000      India           BHILI           BHB
       1,544,000      India           WAGDI           WBR
       1,473,000      India           MUNDARI         MUW
       1,295,000      India           NIMADI          NOE
       1,050,000      India           MALVI           MUP
       1,026,000      India           HO              HOC
           3,000      India           BROKSKAT        BKK
                      (Brokskat is an Indo-Aryan (Dardic) language)

>>>>    Some dialects assumed in above list?

- ----------------------------------------------------------
For Indian languages, Peter Claus (California State University,
Hayward) also suggests

 - Kodagu (Coorgi) which has a relatively small (but established)
   literature with a number of scholars working on it.

 - Badaga, which has oral texts transliterated by scholars, and

 - Toda, Kota, and Kuruba languages, along the border of Karnataka
   and Tamil Nadu.

- ----------------------------------------------------------
       5,100,000      Bangladesh      SYLHETTI        SYL

>>>>    Widely used in the United Kingdom Bangladeshi community.
        Sylheti Nagri script was used in the past in Bengal.

- ----------------------------------------------------------
      15,015,000      Pakistan        SARAIKI (Siraiki)      SKR
       2,210,000      Pakistan        BRAHUI                 BRH
       1,875,000      Pakistan        HINDKO, NORTHERN       HNO
         625,000      Pakistan        HINDKO, SOUTHERN       HIN

>>>>    Some dialects assumed?

- ----------------------------------------------------------
Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are in SIL:

         333,640     Pakistan        BALTI                   BFT

         320,000     Pakistan        SHINA                   SCL
         222,800     Pakistan        KHOWAR                  KHW
         220,000     Pakistan        KOHISTANI, INDUS        MVY
         200,000     Pakistan        SHINA, KOHISTANI        PLK
         108,000     Afghanistan     PASHAYI, SOUTHWEST      PSH
                -    Afghanistan     PASHAYI, NORTHEAST      AEE
                -    Afghanistan     PASHAYI, NORTHWEST      GLH
                -    Afghanistan     PASHAYI, SOUTHEAST      DRA
           60,000    Pakistan        TORWALI                 TRW
            5,000    Pakistan        DAMELI                  DML
            2,900    Pakistan        KALASHA                 KLS
                                     (Indo-Aryan (Dardic))

           29,000    Pakistan        WAKHI                   WBL
            5,000    Pakistan        YIDGHA                  YDG

              500    Pakistan        DOMAAKI                 DMK

           55,000    Pakistan        BURUSHASKI              BSK

            9,500    Afghanistan     GAWAR-BATI              GWT
            5,000    Afghanistan     GRANGALI                NLI

            2,000    Afghanistan     WOTAPURI-KATARQALAI     WSV
            1,000    Afghanistan     SHUMASHTI               SMS
                -    Afghanistan     TIRAHI                  TRA
                                     (Indo-Aryan (Dardic))

            4,000    Tajikistan      YAZGULYAM               YAH

        4,280,000    Iran            LURI                    LRI
        3,265,000    Iran            MAZANDERANI             MZN
        3,265,000    Iran            GILAKI                  GLK
        1,500,000    Iran            QASHQAI                 QSQ

Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are apparently not in SIL:

                                Gojri        Indo-Aryan
                                Kanyawali    Indo-Aryan (Dardic)
                                Palula       Indo-Aryan (Dardic)
                                Sawi         Indo-Aryan (Dardic)

                                Ishkashmi    Iranian
                                Zebaki       Iranian

- ----------------------------------------------------------
Northern Africa (including the Horn of Africa)
- ----------------------------------------------------------

       3,500,000      Morocco    TAMAZIGHT, CENTRAL ATLAS   TZM

ISO 639 codes Tamashek; check differences from Tamazight and other
languages with similar names (see below and Ethnologue entries)

       3,500,000      Morocco    TACHELHIT       SHI
       2,000,000      Morocco    TARIFIT         RIF
       2,511,000      Mauritania HASSANIYYA      MEY

       1,400,000      Algeria    CHAOUIA         SHY
       1,148,000      Sudan      BEDAWI          BEI

       1,236,637      Ethiopia   GAMO-GOFA-DAWRO GMO
       1,231,673      Ethiopia   WOLAYTTA        WBC

- ----------------------------------------------------------
West Africa (including North-West Africa)
- ----------------------------------------------------------

         600,000      Mali            DOGON                  DOG
         500,000      Mali            SENOUFO, MAMARA        MYK
         361,700      Mali            BOMU                   BMQ
         100,000      Mali            BOSO, SOROGAMA         BZE

         270,000      Mali            TAMASHEQ, KIDAL        TAQ

ISO 639 codes Tamashek; check differences from Tamazight (see above)

+      1,168,500      Mali            FULFULDE, MAASINA      FUL
+      7,611,000      Nigeria         FULFULDE, NIGERIAN     FUV
+        450,000      Niger           FULFULDE,
                                         CENTRAL-EAST NIGER  FUQ

>>>>    ISO 639 codes are "ful" & "ff" - Fulah (Fulfulde/Fulani assumed)
>>>>    Relationship of Fulfulde languages etc. needs clarification.

         640,000      Niger           TAMAJAQ, TAWALLAMMAT   TTQ

>>>>    ISO 639 codes Tamashek; check differences from Tamajaq (see above)

       2,151,000      Niger           ZARMA   DJE

       2,520,000      Burkina Faso    JULA    DYU

       1,500,000      Nigeria         IBIBIO  IBB
       1,000,000      Nigeria         EDO     EDO
       1,000,000      Nigeria         EBIRA   IGB
       1,000,000      Nigeria         ANAANG  ANW

       2,921,300      Senegal         PULAAR           FUC
         313,000      Senegal         JOLA-FOGNY       DYO

       2,900,000      Guinea          FUUTA JALON      FUF

       2,130,000      Cote d'Ivoire   BAOULE           BCI
       1,020,000      Cote d'Ivoire   DAN              DAF

- ----------------------------------------------------------
Eastern and Central Africa
- ----------------------------------------------------------

       2,458,000      Kenya           KALENJIN         KLN
       1,582,000      Kenya           GUSII            GUZ
       1,305,000      Kenya           MERU             MER

       1,300,000      Tanzania        GOGO             GOG
       1,260,000      Tanzania        MAKONDE          KDE
       1,200,000      Tanzania        HAYA             HAY
       1,050,000      Tanzania        NYAKYUSA-NGONDE  NYY

       1,391,442      Uganda          CHIGA            CHG
       1,370,845      Uganda          SOGA             SOG
       1,217,000      Uganda          TESO             TEO

- ----------------------------------------------------------
Central and Southern Africa
- ----------------------------------------------------------

       4,200,000      Congo Dem Rep   KITUBA           KTU
       1,156,800      Congo           MUNUKUTUBA       MKW

       1,004,000      Congo Dem Rep   CHOKWE           CJK
       1,000,000      Congo Dem Rep   SONGE            SOP

>>>>    In passing, no relationship to Tsonga, already in ISO 639

       2,850,000      Mozambique      LOMWE             NGL
       2,500,000      Mozambique      MAKHUWA           VMW
       1,160,000      Mozambique      MAKHUWA-MEETTO    MAK
       1,100,000      Mozambique      SENA              SEH

John Clews

7 February 2000 (updated/corrected 21 February 2000).

John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061;

Committee Chair of  ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
 Technologies: European Localization Requirements
Committee Member of TS/1: Terminology (UK national member body of
 ISO/TC37: Terminology)
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets