[Archive copy mirrored from: http://www.ornl.gov/sgml/wg8/9573ent/ENTITIES.HTM, descriptive text only]

Proposed sample collections of entities and glyphs for potential inclusion in ISO 9573, covering: Ugaritic, Old Persian, Glagolitic, Croatian, Buginese, Cherokee, Gothic Uncials. Developed by Anders Berglund and others.

SGML Public Entity Sets, Proposals

What is an SGML Entity and a Public Entity Set?

SGML, ISO/IEC 8879:1986, contains a mechanism for referring to characters, syllables and symbols that are not found on normal keyboards or that are difficult to store and transmit unambiguously. This is achieved by defining so-called SDATA entities: a name is given to a character, syllable or symbol, on the assumption that a system processing the SGML data will be able to understand the reference, either by its name or by the so-called replacement text. To refer to an entity in an SGML file, the name is prefixed by "&" and followed by ";". For example, "&alpha;" refers to the Greek alpha. ISO has published a number of collections of entities, the Public Entity Sets, and work is in progress to add a large number of entity sets for non-Latin languages.
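
The reference syntax described above can be illustrated with a minimal Python sketch. It is not an SGML parser: it assumes a reference is always closed by ";" and that names use only letters, digits, "." and "-". The table `ENTITY_TO_CHAR` and the function `resolve_entities` are hypothetical names introduced for this illustration, and the mapping to Unicode characters is an assumption about what a processing system might do with a reference.

```python
import re

# Hypothetical lookup table: entity name -> character the system chooses to
# render. A real system would derive this from the public entity sets.
ENTITY_TO_CHAR = {"alpha": "\u03b1", "beta": "\u03b2"}

# Simplified reference syntax: "&", a name starting with a letter, then ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def resolve_entities(text, table=ENTITY_TO_CHAR):
    """Replace known entity references; leave unknown ones untouched."""
    def substitute(match):
        name = match.group(1)
        # Fall back to the original reference text when the name is unknown.
        return table.get(name, match.group(0))

    return ENTITY_REF.sub(substitute, text)


if __name__ == "__main__":
    print(resolve_entities("The letter &alpha; precedes &beta;."))
```

Unknown references are deliberately left in place, so that a reviewer can still see which names the system failed to understand.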

For the purposes of reviewing and commenting on the sets, the name and comment are the only relevant parts. The published entity sets also refer to the corresponding characters in ISO 10646, where present, as well as to entries in the International Glyph Registry, for which AFII is the registrar.

What is the Repertoire in a Set?

A large number of the entities represent characters. Where presentation forms exist and it is desirable to be able to refer easily to a particular form, entities have been created for those forms as well. For example, up to five entities have been defined for each Arabic letter: one for the character itself, and four for use when one of the four presentation forms must be specified.

For entity sets representing scripts of scholarly interest, additional entities are included to enable the recording of variations that are important for research purposes. In such cases there is normally a "nominal" entity representing a character or syllable, which can be used to record texts where variations are not important. In addition, there are entities for each significant variation of a character or syllable, which may be used in studies where the variations are important to record. Thus, for example, if a character has two distinct presentation forms, there would normally be three entities for it.
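
The counting rule above (one nominal entity plus one entity per recorded variation) can be sketched in Python as follows. The function `entities_for`, the example names and the suffix-based naming are all hypothetical; the proposals' actual naming guidelines are not reproduced here.

```python
def entities_for(nominal, variant_suffixes):
    """Return one nominal entity name plus one name per variation.

    Appending a suffix to the nominal name is an assumption made only
    for this illustration, not the naming scheme of the proposals.
    """
    return [nominal] + [nominal + suffix for suffix in variant_suffixes]


if __name__ == "__main__":
    # A character with two distinct presentation forms: three entities.
    print(entities_for("xka", ["1", "2"]))
    # A letter with four presentation forms: five entities.
    print(len(entities_for("xalef", ["iso", "ini", "med", "fin"])))
```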

Guidelines for the different parts of an entity definition

  1. Entity name - in practice this is what is "standardized". The guidelines for the name are:
  2. Replacement text - in the canonical form, just "[" || entity name || "]"
  3. Comment in entity declaration - a meaningful description of the letter/syllable (so that someone could look at a font and pick out the right glyph; knowledge of the subject may be assumed). For political reasons this comment is selected from (in order of preference):
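
As a rough illustration of how the three parts fit together, the Python sketch below assembles an SDATA entity declaration from a name and a comment, using the canonical replacement text "[" || entity name || "]". The function `sdata_declaration` is a hypothetical helper, the spacing of the output is an assumption, and the example name and comment are illustrative only, not taken from the proposals.

```python
def sdata_declaration(name, comment):
    """Build an SGML SDATA entity declaration in the canonical form.

    The replacement text is "[" + name + "]". The comment is placed
    inside the declaration between "--" delimiters, so it must not
    itself contain "--".
    """
    if "--" in comment:
        raise ValueError('comment must not contain "--"')
    replacement = "[" + name + "]"
    return f'<!ENTITY {name} SDATA "{replacement}" -- {comment} -->'


if __name__ == "__main__":
    # Illustrative name and comment only.
    print(sdata_declaration("alpha", "small alpha, Greek"))
```

Since only the name and comment matter for review, generating the replacement text mechanically keeps it consistent with the name.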


Be warned that the Web page for each proposal contains a number of GIF images showing a typical glyph for each entity, so display may be slow.

The proposed entity sets will shortly also be available as a zip file containing a scanned TIFF image of the proposal.

Please send comments on the proposals to Anders Berglund, bcatf@ibm.net.


Ugaritic

Ugaritic Proposal, HTML

Ugaritic Proposal, zipfile

Old Persian

Old Persian Proposal, HTML

Old Persian Proposal, zipfile

Glagolitic, Croatian

Glagolitic, Croatian Proposal, HTML

Glagolitic, Croatian Proposal, zipfile


Buginese

Buginese Proposal, HTML

Buginese Proposal, zipfile


Cherokee

Cherokee Proposal, HTML

Cherokee Proposal, zipfile

Gothic Uncials

Gothic Uncials Proposal, HTML

Gothic Uncials Proposal, zipfile

Copyright BC&TF, 1997.