First draft of proposed XML TC for Unicode 3.0
Date: Tue, 7 Sep 1999 17:44:16 -0400 (EDT)
From: John Cowan <cowan@locke.ccil.org>
To: xml-dev@ic.ac.uk
Subject: First draft of proposed XML TC for Unicode 3.0 (unofficial)

This is version 0.1 of a proposed technical corrigendum to XML 1.0 to incorporate the new characters of Unicode 3.0 into the allowable sets used in XML Names. It presumes that XML should not remain limited to an obsolete version of the Unicode and ISO 10646 standards.

The new scripts handled are: Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic, Syriac, Thaana, Unified Canadian Aboriginal Syllabics, Yi.

These lists of new characters were constructed by using the current Unicode 3.0 data file from the Unicode Consortium and applying the rules given in Appendix B to it. This version of the proposal does not yet incorporate information from the Unicode 3.0 properties list. (Unicode 3.0 is technically still in beta, but the character list has been frozen for months now.)

New BaseChars (BNF rule 85):

    [#x01F6-#x01F9] /* new Latin letters */ | [#x0218-#x021F] | [#x0222-#x0233]
    | [#x02A9-#x02AD] /* new IPA Latin letters */
    | #x03D7 /* new Greek letters */ | #x03DB | #x03DD | #x03DF | #x03E1
    | #x0400 /* new Cyrillic letters */ | #x040D | #x0450 | #x045D
    | [#x048C-#x048F] | [#x04EC-#x04ED]
    | [#x06B8-#x06B9] /* new Arabic letters */ | #x06BF | #x06CF | [#x06FA-#x06FC]
    | #x0710 /* new Syriac script */ | [#x0712-#x072C]
    | [#x0780-#x07A5] /* new Thaana script */
    | #x0950 /* OM letters */ | #x0AD0
    | [#x0D85-#x0D96] /* new Sinhala script */ | [#x0D9A-#x0DB1] | [#x0DB3-#x0DBB]
    | #x0DBD | [#x0DC0-#x0DC6]
    | #x0E2F /* new Thai characters */ | #x0EAF
    | #x0F00 /* Tibetan OM */
    | #x0F6A /* new Tibetan letters */
    | [#x1000-#x1021] /* new Myanmar script */ | [#x1023-#x1027] | [#x1029-#x102A]
    | [#x1050-#x1055]
    | #x1101 /* Hangul jamo that are no longer compatibility characters */
    | #x1104 | #x1108 | #x110A | #x110D | [#x1113-#x113B] | #x113D | #x113F
    | [#x1141-#x114B] | #x114D | #x114F | [#x1151-#x1153] | [#x1156-#x1158]
    | #x1162 | #x1164 | #x1166 | #x1168 | [#x116A-#x116C] | [#x116F-#x1171]
    | #x1174 | [#x1176-#x119D] | [#x119F-#x11A2] | [#x11A9-#x11AA]
    | [#x11AC-#x11AD] | [#x11B0-#x11B6] | #x11B9 | #x11BB | [#x11C3-#x11EA]
    | [#x11EC-#x11EF] | [#x11F1-#x11F8]
    | [#x1200-#x1206] /* new Ethiopic script */ | [#x1208-#x1246] | #x1248
    | [#x124A-#x124D] | [#x1250-#x1256] | #x1258 | [#x125A-#x125D]
    | [#x1260-#x1286] | #x1288 | [#x128A-#x128D] | [#x1290-#x12AE] | #x12B0
    | [#x12B2-#x12B5] | [#x12B8-#x12BE] | #x12C0 | [#x12C2-#x12C5]
    | [#x12C8-#x12CE] | [#x12D0-#x12D6] | [#x12D8-#x12EE] | [#x12F0-#x130E]
    | #x1310 | [#x1312-#x1315] | [#x1318-#x131E] | [#x1320-#x1346]
    | [#x1348-#x135A]
    | [#x13A0-#x13F4] /* new Cherokee script */
    | [#x1401-#x166C] /* new Canadian Syllabics script */ | [#x166F-#x1676]
    | [#x1681-#x169A] /* new Ogham script */
    | [#x16A0-#x16EA] /* new Runic script */
    | [#x1780-#x17B3] /* new Khmer script */
    | [#x1820-#x1842] /* new Mongolian script */ | [#x1844-#x1877] | [#x1880-#x18A8]
    | #x3006 /* Ideographic closing mark */
    | [#x31A0-#x31B7] /* new Bopomofo letters */
    | [#xA000-#xA48C] /* new Yi script */

IMHO none of these are controversial except perhaps the Hangul jamo. Formerly, some Hangul jamo had compatibility decompositions into sequences of other Hangul jamo. These decompositions have been removed from the Unicode Standard (actually in 2.1), so the jamo should now be allowed in XML names in accordance with the rules in Appendix B.

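As a rough illustration of how the Appendix B tests can be applied mechanically to records in the UnicodeData.txt layout, here is a minimal Python sketch. The function name classify and the sample records are assumptions made for this example; the records are hand-abbreviated to their first six fields and should be checked against the real data file. The sketch covers only the general-category, compatibility-area and decomposition tests, and it leaves out the special cases that Appendix B lists individually.

```python
# Sketch of the main Appendix B tests from XML 1.0, applied to one
# semicolon-separated record in the UnicodeData.txt field layout:
#   field 0 = code point (hex), field 2 = general category,
#   field 5 = decomposition ("<tag> ..." marks a font/compatibility one).
# The per-character special cases of Appendix B are deliberately omitted.

NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_ONLY_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def classify(record):
    """Return "name-start", "name", or None for a single data record."""
    fields = record.strip().split(";")
    code = int(fields[0], 16)
    category = fields[2]
    decomposition = fields[5] if len(fields) > 5 else ""

    # Characters in the compatibility area are not allowed in names.
    if 0xF900 < code < 0xFFFE:
        return None
    # Characters with a font or compatibility decomposition are excluded.
    if decomposition.startswith("<"):
        return None
    if category in NAME_START_CATEGORIES:
        return "name-start"
    if category in NAME_ONLY_CATEGORIES:
        return "name"
    return None


if __name__ == "__main__":
    # Hypothetical, abbreviated sample records (first six fields only).
    samples = [
        "13A0;CHEROKEE LETTER A;Lo;0;L;",
        "1040;MYANMAR DIGIT ZERO;Nd;0;L;",
        "03D0;GREEK BETA SYMBOL;Ll;0;L;<compat> 03B2",
        "20AC;EURO SIGN;Sc;0;ET;",
    ]
    for line in samples:
        print(line.split(";")[0], classify(line))
```

Under these tests a letter stops being excluded as soon as its decomposition field is empty, which is the situation described above for the Hangul jamo.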
New Ideographics (BNF rule 86):

    [#x3400-#x4DB5] /* CJK Ideograph Extension A */

New CombiningChars (BNF rule 87):

    [#x0346-#x034E] /* new IPA combining characters */ | #x0362
    | [#x0488-#x0489] /* new Cyrillic combining characters */
    | [#x0653-#x0655] /* new Arabic combining characters */
    | #x0711 /* combining characters for new Syriac script */ | [#x0730-#x074A]
    | [#x07A6-#x07B0] /* combining characters for new Thaana script */
    | [#x0D82-#x0D83] /* combining characters for new Sinhala script */
    | #x0DCA | [#x0DCF-#x0DD4] | #x0DD6 | [#x0DD8-#x0DDF] | [#x0DF2-#x0DF3]
    | #x0F96 /* new Tibetan subjoined letters */ | [#x0FAE-#x0FB0] | #x0FB8
    | [#x0FBA-#x0FBC]
    | #x0FC6 /* new Tibetan combining character */
    | [#x102C-#x1032] /* combining characters for new Myanmar script */
    | [#x1036-#x1039] | [#x1056-#x1059]
    | [#x17B4-#x17D3] /* combining characters for new Khmer script */
    | #x18A9 /* combining character for new Mongolian script */
    | [#x20E2-#x20E3] /* new general combining characters */

IMHO none of these are controversial except perhaps the #x20E2 and #x20E3, which are primarily intended for use with symbol characters, and therefore should perhaps be excluded as #x20DD-#x20E0 are.

New Digits (BNF rule 88):

    [#x1040-#x1049] /* digits for new Myanmar script */
    | [#x1369-#x1371] /* digits for new Ethiopic script */
    | [#x17E0-#x17E9] /* digits for new Khmer script */
    | [#x1810-#x1819] /* digits for new Mongolian script */

IMHO none of these will be controversial.

New Extenders (BNF rule 89):

    #x02EE /* Modifier letter double apostrophe */
    | #x1843 /* Modifier letter for new Mongolian script */

IMHO none of these will be controversial.

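For anyone who wants to experiment with these lists, the character-class notation used above is easy to turn into code point ranges. The following Python sketch is only an illustration: parse_char_class, contains and NEW_DIGITS are names invented here, and the parser handles just the two forms that appear in this message (a single #xNNNN and a bracketed [#xNNNN-#xNNNN] range), with /* ... */ comments stripped first.

```python
import re

# Matches the /* ... */ annotations used in the lists above.
_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Matches either a bracketed range or a single code point.
_ITEM = re.compile(
    r"\[#x([0-9A-Fa-f]+)-#x([0-9A-Fa-f]+)\]|#x([0-9A-Fa-f]+)"
)


def parse_char_class(text):
    """Turn a '|'-separated character class into (low, high) tuples."""
    ranges = []
    for match in _ITEM.finditer(_COMMENT.sub(" ", text)):
        if match.group(1) is not None:
            low, high = int(match.group(1), 16), int(match.group(2), 16)
        else:
            low = high = int(match.group(3), 16)
        ranges.append((low, high))
    return ranges


def contains(ranges, char):
    """Report whether a single character falls inside any parsed range."""
    code = ord(char)
    return any(low <= code <= high for low, high in ranges)


# The proposed Digits additions (BNF rule 88), copied from this message.
NEW_DIGITS = parse_char_class("""
    [#x1040-#x1049] /* digits for new Myanmar script */
    | [#x1369-#x1371] /* digits for new Ethiopic script */
    | [#x17E0-#x17E9] /* digits for new Khmer script */
    | [#x1810-#x1819] /* digits for new Mongolian script */
""")

if __name__ == "__main__":
    print(contains(NEW_DIGITS, "\u1040"))  # Myanmar digit zero
    print(contains(NEW_DIGITS, "5"))       # ASCII digit, not a new addition
```

A linear scan is enough at this scale; a sorted list with bisection would be the obvious refinement if the full XML 1.0 tables were loaded as well.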
In addition, the following characters no longer pass the tests given in Appendix B for valid name or name-start characters, but should remain legal in XML names for backward compatibility, and therefore should be explicitly enumerated in the corrigendum:

    03D0;GREEK BETA SYMBOL
    03D1;GREEK THETA SYMBOL
    03D2;GREEK UPSILON WITH HOOK SYMBOL
    03D5;GREEK PHI SYMBOL
    03D6;GREEK PI SYMBOL
    03F0;GREEK KAPPA SYMBOL
    03F1;GREEK RHO SYMBOL
    03F2;GREEK LUNATE SIGMA SYMBOL
    0675;ARABIC LETTER HIGH HAMZA ALEF
    0676;ARABIC LETTER HIGH HAMZA WAW
    0677;ARABIC LETTER U WITH HAMZA ABOVE
    0678;ARABIC LETTER HIGH HAMZA YEH
    0E33;THAI CHARACTER SARA AM
    0EB3;LAO VOWEL SIGN AM
    0F77;TIBETAN VOWEL SIGN VOCALIC RR
    0F79;TIBETAN VOWEL SIGN VOCALIC LL
    1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING
    212E;ESTIMATED SYMBOL

###
John Cowan cowan@ccil.org
I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1

Prepared by Robin Cover for The SGML/XML Web Page archive.