[This local archive copy is mirrored from the canonical site: http://www.vhg.org.uk/dtd/vhgdtd.html; links may not have complete integrity, so use the canonical document at this URL if possible.]


Annotated VHG DTD (V1.0alpha)

This represents the current VHG DTD, with formatted comments but otherwise essentially unchanged. It should be regarded as the master copy of the DTD, from which the plain DTD can be produced by editing this source. Do NOT edit this file with an HTML editor!

To appreciate the DTD fully, you should be familiar with XML, but much can be learned just by reading the comments. This DTD has deliberately simple content models: the order of elements is usually unimportant, and most components are optional and repeatable. Similarly, there are relatively few attributes.

Everything in this font is comment. XML elements use HTML Heading2. The actual DTD is rendered in fixed font (<PRE>) and can be extracted automatically by editing. Examples use fixed font italic. Within comments, elements are rendered in normal fixed font.

Copyright Virtual HyperGlossary, 1998
This DTD is valid for XML V1.0 documents
The DTD uses XML-LINK syntax of 1998-03; it may change slightly in the final release.

Revision history
19980427: Created first public release (V1.0alpha) (LW/PMR)
19980430: Changes to V1.0alpha before release (PMR)

19980510: Changes
19980513: Additions and changes
19980528: Minor typos corrected

This DTD supports the creation and validation of XML-based glossaries, thesauri and other terminological resources. It uses a small subset of the data categories of ISO FDIS 12620 ('Computer Applications in Terminology - Data Categories'), which are used as XML ELEMENTs. All dataCategorys are identified by their name and their *Notation number* (e.g. A.1.2.3).


The language of a glossary and its entries is very important, and the xml:lang attribute can be added to VHG and termEntry. At present we expect that most glossaries will be monolingual and will be translated into corresponding monolingual glossaries (e.g. pps-en.xml and pps-fr.xml). Equivalences will be provided by an external XML-LINK (Xlink) database of links between equivalent termEntrys. Note that the xml:lang attribute applies not only to an element but to all its descendants, so a single xml:lang attribute on VHG applies to every element in the glossary. However, if entries are likely to be removed or added, it may be valuable to add xml:lang explicitly to each termEntry.
<!ENTITY % xml:lang 'xml:lang CDATA #IMPLIED'>
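Because xml:lang is inherited, software has to look upwards from an element to find the language in force. The following minimal Python sketch assumes the standard-library ElementTree parser, which reports xml:lang under the reserved XML namespace URI. The function name effective_lang and the sample glossary are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the xml:lang in force for target: its own value or the nearest ancestor's."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]
        node = parents.get(node)
    return None  # no language declared anywhere on the path to the root


doc = ET.fromstring(
    '<VHG xml:lang="en">'
    '<termEntry id="t1"><term>World Cup</term></termEntry>'
    '<termEntry id="t2" xml:lang="fr"><term>Coupe du Monde</term></termEntry>'
    '</VHG>'
)
entries = {e.get("id"): e for e in doc.iter("termEntry")}
print(effective_lang(doc, entries["t1"]))  # en (inherited from VHG)
print(effective_lang(doc, entries["t2"]))  # fr (set on the termEntry itself)
```

A termEntry that carries its own xml:lang keeps its language correctly even when it is moved to a glossary with a different default. This is why per-entry attributes are suggested above.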


Several glossary components may be defined according to conventions and a convention attribute is provided for the purpose. It is accompanied by an optional href attribute for identifying the convention, either within the glossary or externally. No semantics are defined for this linking, which may use either the value of the URL or the contents of a referenced resource.
<!ENTITY % convention 'convention CDATA #IMPLIED  href  CDATA  #IMPLIED'>


A glossary is composed of a VHG element with termEntry children and administrative information (e.g. date of glossary creation, identity of curators, etc.). No constraints are put on the contents of admin. A VHG may also contain link children, which may reference a central link database (e.g. of multilingual equivalents). This database might itself be a VHG containing only link elements.
<!ELEMENT VHG (admin | termEntry | simpleLink | extendedLink)*>
A VHG may have a descriptive title
<!ATTLIST VHG title CDATA #IMPLIED>
A VHG may have a language qualifier
<!ATTLIST VHG %xml:lang;>
A VHG may have a convention
<!ATTLIST VHG %convention;>


A term is represented by a termEntry with an optional term (e.g. to support a thesaurus). If present, the term must be the first child of the termEntry. There are a number of optional and optional-repeatable children, almost all of which are defined in the data categories of ISO FDIS 12620 (e.g. abbreviation). For data categories consisting of more than one word, the element name is constructed in camelCase (e.g. "part of speech" in 12620 becomes partOfSpeech). The component children can be in any order and repeated any number of times, although some (e.g. partOfSpeech) might not make sense if repeated. Finally, a termEntry can contain other termEntrys to set up a hierarchical concept system. An optional dataCategory child determines whether the hierarchy is generic (i.e. the children are narrower terms), partitive (the children are parts of the parent termEntry) or some more specialised relation (see below).

A termEntry that has other termEntry children may have one of several conceptual relations to its children (A.6) or describe a conceptual structure (A.7). In general terms, 'partitive' corresponds to 'hasA' or containment and 'generic' corresponds to 'isA' or inheritance; 'sequential', 'temporal' and 'spatial' are more specialised.
To extend the power of VHG to identify the relation of a termEntry to its children, we suggest using dataCategorys taken from A.6 and A.7, for example:

<dataCategory type="conceptRelation">partitive</dataCategory>

<!ELEMENT termEntry (term?,
                     (entryID |
                     abbreviation |
                     acronym |
                     admin |
                     dataCategory |
                     definition |
                     example |
                     note |
                     partOfSpeech |
                     see |
                     seeAlso |
                     synonym |
                     indexHeading |
                     sortKey |
                     searchTerm |
                     simpleLink |
                     termEntry)*)>
A termEntry *may* have an attribute with name id and type ID. This can be used for indexing, searching, etc. and must be unique within the XML document.
<!ATTLIST termEntry id ID  #IMPLIED>
A termEntry may have a descriptive title, which may or may not be the term or the entryID.
<!ATTLIST termEntry   title CDATA #IMPLIED>
A termEntry may have a language qualifier
<!ATTLIST termEntry   %xml:lang;>
A termEntry may have a convention
<!ATTLIST termEntry %convention;>
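Software can walk the hierarchy of nested termEntrys roughly as shown below, reading the relation from a dataCategory of type "conceptRelation" as suggested above. This is a Python sketch using ElementTree. The car/wheel glossary and the function name relations are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<VHG>
  <termEntry id="car">
    <term>car</term>
    <dataCategory type="conceptRelation">partitive</dataCategory>
    <termEntry id="wheel"><term>wheel</term></termEntry>
    <termEntry id="engine"><term>engine</term></termEntry>
  </termEntry>
</VHG>
"""


def relations(entry):
    """Yield (parent term, relation, child term) for each nested termEntry."""
    # The term is optional, so fall back to the id attribute.
    parent_term = entry.findtext("term", default=entry.get("id"))
    relation = None
    for dc in entry.findall("dataCategory"):
        if dc.get("type") == "conceptRelation":
            relation = (dc.text or "").strip()
    for child in entry.findall("termEntry"):
        yield parent_term, relation, child.findtext("term", default=child.get("id"))
        yield from relations(child)  # recurse into deeper levels


root = ET.fromstring(SAMPLE)
for top in root.findall("termEntry"):
    for parent, rel, child in relations(top):
        print(f"{parent} --{rel or 'unspecified'}--> {child}")
```

A relation of None means the curator did not declare one. The sketch reports such relations as "unspecified" rather than guessing between generic and partitive.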

general attributes

Some elements (e.g. term, abbreviation) have values that may be case-sensitive. The glossary curator may wish to give the searching software hints about whether the case of such an ELEMENT must be retained. Example:
<term>Computer Assisted Tomography</term>
<acronym caseSensitive="yes">CAT</acronym>
<!ENTITY % caseSensitive 'caseSensitive (yes | no) "no"'>
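The following minimal Python sketch shows how searching software might honour the caseSensitive hint when comparing a glossary string with a word found in a document. The function name matches is hypothetical. The sketch applies the DTD default ("no") itself, because ElementTree does not fetch external DTDs and so never sees the declared default.

```python
import xml.etree.ElementTree as ET


def matches(element, word):
    """True if word matches the element's text, honouring its caseSensitive hint."""
    # The DTD default ("no") is only supplied if the parser reads the DTD,
    # so fall back to it explicitly here.
    case_sensitive = element.get("caseSensitive", "no") == "yes"
    text = (element.text or "").strip()
    if case_sensitive:
        return text == word
    return text.casefold() == word.casefold()


acronym = ET.fromstring('<acronym caseSensitive="yes">CAT</acronym>')
term = ET.fromstring("<term>Computer Assisted Tomography</term>")
print(matches(acronym, "cat"))                        # False: case must be retained
print(matches(acronym, "CAT"))                        # True
print(matches(term, "computer assisted tomography"))  # True: default is "no"
```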
Some elements may be valuable for automatic indexing and markup. In some cases (e.g. when they are short or correspond to common words) the glossary curator may wish to hint that they should not be used in this way. Example:
<abbreviation automaticMarkup="no">A</abbreviation>
<!ENTITY % automaticMarkup 'automaticMarkup (yes | no) "yes"'>
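Markup software could collect candidate strings from a termEntry and skip those the curator has flagged, as in this Python sketch. The set of child elements consulted (MARKUP_SOURCES) and the sample entry are assumptions chosen for illustration; the DTD does not fix them.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of which children supply strings for automatic markup.
MARKUP_SOURCES = ("term", "abbreviation", "acronym", "synonym", "indexHeading")


def markup_strings(entry):
    """Collect strings from a termEntry that may be used for automatic markup."""
    strings = []
    for child in entry:
        if child.tag not in MARKUP_SOURCES:
            continue
        # The DTD default is "yes"; only an explicit "no" suppresses the string.
        if child.get("automaticMarkup", "yes") == "no":
            continue
        if child.text and child.text.strip():
            strings.append(child.text.strip())
    return strings


entry = ET.fromstring(
    "<termEntry>"
    "<term>ampere</term>"
    '<abbreviation automaticMarkup="no">A</abbreviation>'
    "</termEntry>"
)
print(markup_strings(entry))  # ['ampere']
```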


A term contains the character string corresponding to the 'linguistic expression' (A.1). (see example for termEntry). Note that the term is the character string; there is no quantity "term-name".
<!ELEMENT term          (#PCDATA)>
<!ATTLIST term   %caseSensitive;>
<!ATTLIST term   %automaticMarkup;>
A term may have a convention
<!ATTLIST term %convention;>


A termEntry may have a unique identifier (A.10.15). This identifier may alternatively be provided by the id attribute of the termEntry if the system supports XML's ID/IDREF facility. For entryID it is recommended (and for ID/IDREF required) that IDs are case-sensitive; in addition, two IDs should not differ only by case.
<!ELEMENT entryID (#PCDATA)>
An entryID may have a convention
<!ATTLIST entryID %convention;>
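A curator could check the recommendation that two IDs should not differ only by case with a small script like the one below. This Python sketch takes a plain list of identifier strings. How they are gathered (id attributes, entryID contents or both) is left open, and the function name case_collisions is hypothetical.

```python
from collections import defaultdict


def case_collisions(ids):
    """Group identifiers that differ only by case."""
    groups = defaultdict(set)
    for ident in ids:
        groups[ident.casefold()].add(ident)
    # Only groups with more than one distinct spelling are collisions.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


print(case_collisions(["DNA", "RNA", "dna", "lead1"]))  # [['DNA', 'dna']]
```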


An abbreviated form of a term resulting from the omission of some of its letters (A.
<!ELEMENT abbreviation   (#PCDATA)>
<!ATTLIST abbreviation   %caseSensitive;>
<!ATTLIST abbreviation   %automaticMarkup;>


An abbreviated form of a term made up of letters from the full term [...] (A.
<!ELEMENT acronym   (#PCDATA)>
<!ATTLIST acronym   %caseSensitive;>
<!ATTLIST acronym   %automaticMarkup;>


Administrative information relating either to a complete glossary (VHG) or to an individual termEntry (A.10). We recommend the use of data categories in (A.10) wherever possible using the dataCategory element. Alternatively, well-known XML DTDs or similar systems (such as RDF) can be used to provide (meta)data such as names, addresses, dates, etc. These components can be differentiated through the use of XML's namespace mechanism. Example:
<dataCategory type="originationDate">19980424</dataCategory>
<RDF:assertion> RDF metadata goes here </RDF:assertion>
<!ELEMENT admin          ANY>


The generic mechanism for any data category not provided as an element. Wherever possible, the dataCategory type should be a preferred or admitted name (6.1) taken from ISO FDIS 12620, since software is likely to provide explicit support for these data categories.
<!ELEMENT dataCategory   (#PCDATA)>
The type of the data category (mandatory)
<!ATTLIST dataCategory type CDATA #REQUIRED>
A dataCategory may have a convention
<!ATTLIST dataCategory %convention;>


The definition of the term, normally in prose. However hypertext (HTML) or other well-known XML systems such as MathML may be used. (A.5.1).
<!ELEMENT definition     ANY>
A definition may have a convention
<!ATTLIST definition %convention;>


Descriptive material that provides a sample of the use of the term (A.5.4). Normally in prose, but hypertext (HTML) or XML from other well-known DTDs may be used as the content.
<!ELEMENT example       ANY>


Supplemental information (A.8). Normally in prose, but hypertext (HTML) or XML from other well-known DTDs may be used. Since the content will not normally be machine-understandable, other data categories should be used in preference to note where available.
<!ELEMENT note          ANY>


A category assigned to a word based on its grammatical and semantic properties. (A.2.2.1). No controlled vocabulary is required, but it is recommended that words are not abbreviated (e.g. noun, not n).
<!ELEMENT partOfSpeech   (#PCDATA)>


A pointer field used in a terminology collection that does not itself contain information but points to the location(s) where information can be found (A.10.18.1). The mandatory href attribute normally addresses the id attribute of the referenced termEntry. Example:
<termEntry id="lead1" title="lead (metal)">
  <definition>A very dense element</definition>
  <example>The joke went down like a lead balloon</example>
</termEntry>
<termEntry id="lead_balloon" title="lead balloon">
  <term>balloon, lead</term>
  <see href="#id(lead1)">lead</see>
</termEntry>
<termEntry id="lead2" title="lead (guide)">
  ...
</termEntry>
<!ENTITY  % see.type  'href CDATA #REQUIRED' >
<!ELEMENT see           (#PCDATA)>
<!ATTLIST see   %see.type;>
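Software following a see (or seeAlso, which shares see.type) pointer must map the href back to an element. The following Python sketch handles only the same-document form "#id(...)" used in the examples here. Other Xlink addressing forms are outside its scope, and the names ID_REF and resolve are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the same-document form used in the examples, e.g. "#id(lead1)".
ID_REF = re.compile(r"^#id\(([^)]+)\)$")


def resolve(root, href):
    """Return the element whose id attribute is named by href, or None."""
    m = ID_REF.match(href)
    if m is None:
        return None  # other addressing forms are outside this sketch
    target_id = m.group(1)
    for el in root.iter():
        if el.get("id") == target_id:
            return el
    return None


glossary = ET.fromstring(
    "<VHG>"
    '<termEntry id="lead1" title="lead (metal)">'
    "<definition>A very dense element</definition></termEntry>"
    '<termEntry id="lead_balloon" title="lead balloon">'
    '<term>balloon, lead</term><see href="#id(lead1)">lead</see></termEntry>'
    "</VHG>"
)
for see in glossary.iter("see"):
    target = resolve(glossary, see.get("href"))
    print(see.text, "->", target.get("title") if target is not None else "unresolved")
```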


A pointer field used in a terminology collection that contains information pointing to one or more other location(s) where related information can be found (A.10.18.2). See the see element for the syntax of the pointer. Example:
<termEntry id="DNA">
  <dataCategory type="fullForm">deoxyribonucleic acid</dataCategory>
  <definition>The primary genetic material which is transcribed to RNA</definition>
</termEntry>
<termEntry id="RNA">
  <dataCategory type="fullForm">ribonucleic acid</dataCategory>
  <seeAlso href="#id(DNA)">DNA</seeAlso>
</termEntry>
<!ELEMENT seeAlso           (#PCDATA)>
<!ATTLIST seeAlso   %see.type;>


Any term that represents the same concept as, or a very similar concept to, the term (A.2.1.2).
<!ELEMENT synonym       (#PCDATA)>
<!ATTLIST synonym   %caseSensitive;>
<!ATTLIST synonym   %automaticMarkup;>

simpleLinks and extendedLinks/locatorLinks

Hyperlinks supporting the XML-link (XLL, XLink) specification. Xlink is in final draft at present (19980430) and is unlikely to change significantly. (NOTE: 19980510: Eve Maler posted on XML-DEV that an xlink:form syntax might be considered for the final draft.)

The VHG's use of links is meant to follow the Xlink specification and philosophy as closely as possible. Since this is still evolving, be prepared for change, though we expect it to be small. If you are unused to full hypermedia (i.e. links with 'more than one end' or 'multidirectional' links), you may have to think carefully about what this implies. The spec is honest but rather short on real examples, and the VHG examples may help. A major problem with HTML links is that there is no central registry (database) of links, so if the resource at either end of a link is relocated, the link usually breaks. If a central registry is notified automatically of any relocation, the integrity of link structures can be maintained.

The VHG provides a somewhat abstract structure for the links since we do not wish to constrain implementations. At present we envisage:

Note that in the Xlink spec extendedLink should not contain simpleLinks (nor, we suspect, nested extendedLinks). The contentSpec of all links is left as ANY in case non-textual material (images, maths, chemistry, etc.) is appropriate.

ISO 12620 (A.10.21.1) defines URL as a hyperlink pointing to a unique address whose value conforms to RFC 1738 or RFC 1808. Note that 'URL' is the common usage at present (1998), but XML expects the concept 'URI' to become increasingly important. All addressing should conform to the XML and XML:LINK (Xlink) specs; thus for addressing within a glossary the Xlink fragment connector ('#') will be used.

A VHG may contain one or more links which can be used for referencing related glossaries or a database of extended XLinks. A set of multilingual translations of a glossary might be held as separate glossaries with equivalences held in a central link database. Example:

file: sport-en.xml
<VHG xml:lang="en" title="sport">
  <link xml:link="simple" role="equivalences" href="equivalences.xml" title="equivalences"/>
  <termEntry id="term1">
    <term>World Cup</term>
  </termEntry>
</VHG>

file: sport-fr.xml
<VHG xml:lang="fr" title="sport">
  <link xml:link="simple" role="equivalences" href="equivalences.xml" title="equivalences"/>
  <termEntry id="term53">
    <term>Coupe du Monde</term>
  </termEntry>
</VHG>

file: equivalences.xml
<VHG title="multilingual terminology for sport">
  <link xml:link="extended" title="World" role="languageEquivalent">
    <link xml:link="locator" xml:lang="en" href="sport-en.xml#id(term1)"/>
    <link xml:link="locator" xml:lang="fr" href="sport-fr.xml#id(term53)"/>
  </link>
</VHG>
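A translation tool might load the central link database into a lookup table from each entry to its equivalents in other languages. The following Python sketch does this for the equivalences.xml example. It assumes ElementTree, which places the xml:link and xml:lang attributes in the reserved XML namespace. The function name equivalence_table is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

EQUIVALENCES = """
<VHG title="multilingual terminology for sport">
  <link xml:link="extended" title="World" role="languageEquivalent">
    <link xml:link="locator" xml:lang="en" href="sport-en.xml#id(term1)"/>
    <link xml:link="locator" xml:lang="fr" href="sport-fr.xml#id(term53)"/>
  </link>
</VHG>
"""


def equivalence_table(root):
    """Map each locator href to a {lang: href} dict of all ends of its extended link."""
    table = {}
    for ext in root.iter("link"):
        # Nested locator links are also visited here but skipped by this test.
        if ext.get(XML_NS + "link") != "extended" or ext.get("role") != "languageEquivalent":
            continue
        ends = {loc.get(XML_NS + "lang"): loc.get("href")
                for loc in ext.findall("link")
                if loc.get(XML_NS + "link") == "locator"}
        for href in ends.values():
            table[href] = ends
    return table


table = equivalence_table(ET.fromstring(EQUIVALENCES))
print(table["sport-en.xml#id(term1)"]["fr"])  # sport-fr.xml#id(term53)
```

Keeping the equivalences outside the monolingual glossaries means neither sport-en.xml nor sport-fr.xml needs editing when a new language is added.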

Note that a link can refer to any entry (or XML-referenceable part of an entry) within a glossary. For example, to reference all synonyms of 'football' we might write:
<link show="new"

link also supports collections of links, using xml:link="extended". Thus an extended link could look like:
<link xml:link="extended" title="synonyms of football">
<link xml:link="locator" href="#id(football)DESCENDANT(1,synonym)"/>
<link xml:link="locator" href="#id(football)DESCENDANT(2,synonym)"/>
<link xml:link="locator"

The attributes of link (including the parameter entity representation) are taken directly from the Xlink spec. Note that there is, so far, little experience of Xlink, and the current strategy may change.

The following parameter entities (PEs) are taken directly from the Xlink spec. Not all of them will necessarily be used.

<!ENTITY % locator.att    
    "href          CDATA                #REQUIRED">
<!ENTITY % link-semantics.att
    "inline        (true|false)         'true'
     role          CDATA                #IMPLIED">
<!ENTITY % simple-link-semantics.att
    "inline        (true|false)         'true'">
<!ENTITY % remote-resource-semantics.att
    "role          CDATA                #IMPLIED
     title         CDATA                #IMPLIED
     show          (embed|replace|new)  #IMPLIED
     actuate       (auto|user)          #IMPLIED
     behavior      CDATA                #IMPLIED">
<!ENTITY % local-resource-semantics.att
    "content-role  CDATA                #IMPLIED
     content-title CDATA                #IMPLIED">
A link can have a language specified by the xml:lang attribute. The semantics of this are unconstrained in this DTD but are most likely to refer to the target of the link rather than the link itself.


A link pointing to a single target (resource) essentially analogous to HTML's <A HREF="target"> syntax. Its content is deliberately left general (normally an 'inline resource' such as #PCDATA).
<!ELEMENT simpleLink          ANY>
<!ATTLIST simpleLink
      xlink:form   CDATA           #FIXED   "simple"


A container for locatorLink children forming a multi-ended link. Its content is deliberately left general but will always contain locatorLinks. It may contain other components (normally 'inline resources' such as #PCDATA) but should not contain simpleLinks (i.e. (locatorLink | ANY)* if that were allowed and made sense).
<!ELEMENT extendedLink          ANY>
<!ATTLIST extendedLink
      xlink:form     CDATA        #FIXED   "extended"


A link pointing to a single target (resource) as part of a multi-component extendedLink. Its content is deliberately left general but is normally empty.
<!ELEMENT locatorLink          ANY>

Index and markup

For automatic markup and indexing, variants of the term may be required. The use of these is under user control, but the author/curator can give hints as to how these should be used. The following are included in 1.0alpha:


(cf. A.9.5). A term which can be used for indexing an entry, whether or not the indexHeading also occurs as a term, as another data category, or within the text of (say) the definition. An indexHeading may or may not be displayed by the software and may or may not be used for markup. [Note: ISO FDIS uses the data categories "Index Heading" and "Index Word". We feel the latter could be misunderstood by non-terminologists as requiring a single word to the exclusion of phrases.]

Examples of categories that could be used as index headings are:

<!ELEMENT indexHeading         (#PCDATA)>
<!ATTLIST indexHeading     %caseSensitive;>
<!ATTLIST indexHeading     %automaticMarkup;>


(A.10.6.2). From ISO FDIS 12620: 'A character string used for comparisons in sorting and merging operations. Example: 2,2-Dihydropyran is sorted according to "Dihydropyran", not according to "2,2"'.
<!ELEMENT sortKey         (#PCDATA)>
<!ATTLIST sortKey   %caseSensitive;>
<!ATTLIST sortKey     %automaticMarkup;>
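The 2,2-Dihydropyran example can be reproduced with a short Python sketch that sorts termEntrys by their sortKey and falls back to the term when no sortKey is present. The fallback order and case-folding are assumptions for illustration; the DTD does not prescribe them.

```python
import xml.etree.ElementTree as ET


def sort_value(entry):
    """Prefer an explicit sortKey, then fall back to the term (hypothetical fallback order)."""
    text = entry.findtext("sortKey") or entry.findtext("term") or ""
    return text.strip().casefold()


glossary = ET.fromstring(
    "<VHG>"
    "<termEntry><term>Pyridine</term></termEntry>"
    "<termEntry><term>2,2-Dihydropyran</term><sortKey>Dihydropyran</sortKey></termEntry>"
    "<termEntry><term>Benzene</term></termEntry>"
    "</VHG>"
)
ordered = sorted(glossary.findall("termEntry"), key=sort_value)
print([e.findtext("term") for e in ordered])
# ['Benzene', '2,2-Dihydropyran', 'Pyridine']
```

Without the sortKey, "2,2-Dihydropyran" would sort first, because digits precede letters.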


(A.10.6.3). From ISO FDIS 12620: 'A term entered in a term entry for purposes of retrieval.'
<!ELEMENT searchTerm         (#PCDATA)>
<!ATTLIST searchTerm   %caseSensitive;>
<!ATTLIST searchTerm     %automaticMarkup;>
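A retrieval system might build its index from searchTerm together with the other term variants, keeping case only where caseSensitive="yes". The following Python sketch shows one such design. The choice of children in SEARCHABLE, the function names and the sample entry are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical choice of children whose strings should retrieve the entry.
SEARCHABLE = ("term", "synonym", "abbreviation", "acronym", "searchTerm")


def build_index(root):
    """Map each searchable string to the ids of the termEntrys it should retrieve."""
    index = defaultdict(set)
    for entry in root.iter("termEntry"):
        for child in entry:
            if child.tag not in SEARCHABLE or not child.text:
                continue
            text = child.text.strip()
            # Keep case only where the curator asked for it (DTD default "no").
            if child.get("caseSensitive", "no") != "yes":
                text = text.casefold()
            index[text].add(entry.get("id"))
    return index


def search(index, query):
    """Look the query up as typed (for case-sensitive keys) and case-folded."""
    return index.get(query, set()) | index.get(query.casefold(), set())


glossary = ET.fromstring(
    "<VHG>"
    '<termEntry id="tomography">'
    "<term>Computer Assisted Tomography</term>"
    '<acronym caseSensitive="yes">CAT</acronym>'
    "<searchTerm>CT scan</searchTerm>"
    "</termEntry>"
    "</VHG>"
)
index = build_index(glossary)
print(search(index, "ct scan"))  # {'tomography'}: found via searchTerm
print(search(index, "cat"))      # set(): the acronym is case-sensitive
```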

Peter Murray-Rust and Lesley West, 1998