[This local archive copy mirrored from the canonical site: http://www.vhg.org.uk/dtd/vhgdtd.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Annotated VHG DTD (V1.0alpha)
This represents the current VHG DTD, with formatted comments but otherwise
essentially unchanged. It should be seen as the master copy, from which the DTD
itself can be made by editing the source as follows:
Do NOT edit this file with an HTML editor!
adding <!-- at the top of the file
adding --> at the bottom of the file
globally replacing <PRE>< by <PRE>--><
globally replacing ></PRE> by ><!--</PRE>
globally replacing &nbsp; by a space character
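The manual steps above amount to a plain string transformation. The following is a minimal Python sketch of it, not a tool distributed with the DTD. The function name extract_dtd is hypothetical, and the sketch assumes the character replaced in the last step is the HTML entity &nbsp;.

```python
def extract_dtd(html_source: str) -> str:
    """Hypothetical helper: apply the manual editing steps so that only the
    DTD inside <PRE> blocks remains outside XML comments."""
    text = html_source
    # Close the comment just before each block of DTD declarations...
    text = text.replace("<PRE><", "<PRE>--><")
    # ...and reopen it just after the block ends.
    text = text.replace("></PRE>", "><!--</PRE>")
    # Assumption: the character-level step replaces the &nbsp; entity.
    text = text.replace("&nbsp;", " ")
    # Wrap the whole file in one comment so the HTML prose is ignored.
    return "<!--" + text + "-->"
```

Applied to `<P>Note</P>` followed by `<PRE><!ELEMENT a (#PCDATA)></PRE>`, everything except the `<!ELEMENT ...>` declaration ends up inside a comment.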
To appreciate the DTD fully, you should be familiar with XML, but you can
learn a lot from the comments. This DTD has deliberately simple content
models. The order of elements is usually unimportant, and most components
are optional and repeatable. Similarly, there are relatively few attributes.
Everything in this font is comment. XML elements use HTML Heading2.
The actual DTD is rendered in fixed font (<PRE>) and can be automatically
obtained by editing. Examples use fixed font italic. Within
comments, elements are rendered by normal fixed font.
DTD for VHG(TM)
Copyright Virtual HyperGlossary, 1998
This DTD is valid for XML V1.0 documents
The DTD uses XML-LINK syntax of 1998-03; it may change slightly in
the final release.
19980427: Created first public release (V1.0alpha) (LW/PMR)
19980430: Changes to V1.0alpha before release (PMR)
contentSpec and attributes to reflect XLink spec
link support for VHG and termEntry (PMR)
see and seeAlso to use href syntax
link syntax to use simpleLink, extendedLink and LocatorLink
indexTerm back to indexHeading (for consistency with the ISO standard)
added convention attribute
Minor typos changed
Released V1.0 alpha for public comment
This DTD supports the creation and validation of XML-based glossaries,
thesauri and other terminological resources. It uses a small subset of the
data categories of ISO FDIS 12620 ('Computer Applications in Terminology - Data Categories'),
which are used as XML ELEMENTs. All dataCategorys are identified
by their name and the *Notation number* (e.g. A.1.2.3).
The language of a glossary and its entries is very important, and the
xml:lang attribute can be added to VHG
and termEntry. At present we expect that most glossaries
will be monolingual, and will be translated into corresponding monolingual
glossaries (e.g. pps-en.xml and pps-fr.xml). Equivalences will be provided
by an external XML-LINK (Xlink) database of links between equivalent termEntrys.
Note that the xml:lang attribute applies not only to an
element but also to all its descendants that do not set their own. Therefore a single xml:lang
attribute on VHG applies to all elements in the glossary.
However if entries are likely to be removed or added it may be valuable
to add xml:lang specifically to each termEntry.
<!ENTITY % xml:lang 'xml:lang CDATA #IMPLIED'>
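Since xml:lang is inherited, software needs to look upwards to find the language in force for any element. Here is a minimal sketch using Python's standard xml.etree.ElementTree. ElementTree reports xml:lang under the XML namespace URI and keeps no parent pointers, so the sketch builds a parent map. The function name effective_lang is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(root, target):
    """Return the xml:lang in scope for target: its own value, else that of
    the nearest ancestor that sets one, else None."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parents.get(node)
    return None
```

With xml:lang="en" on VHG and xml:lang="fr" on one termEntry, a term inside that entry resolves to "fr" and every other term to "en".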
Several glossary components may be defined according to conventions
and a convention attribute is provided for the purpose.
It is accompanied by an optional href attribute for identifying
the convention, either within the glossary or externally. No semantics
are defined for this linking, which may use either the value of the
URL or the contents of a referenced resource.
<!ENTITY % convention 'convention CDATA #IMPLIED href CDATA #IMPLIED'>
A glossary is composed of a VHG element with termEntry
children and administrative information (e.g. date of glossary creation,
identity of curators, etc.). No constraints are put on the contents of admin.
A VHG may also contain link children which may reference
a central link database (e.g. of multilingual equivalents). This database
itself might be a VHG with only link elements.
<!ELEMENT VHG (admin | termEntry | simpleLink | extendedLink)*>
A VHG may have a descriptive title
<!ATTLIST VHG title CDATA #IMPLIED>
A VHG may have a language qualifier
<!ATTLIST VHG %xml:lang;>
A VHG may have a convention
<!ATTLIST VHG %convention;>
A term is represented by a termEntry with an optional term
(e.g. to support a thesaurus). If present, this must be the first child
of any termEntry. There are a number of optional and optional-repeatable
children. Almost all of these are defined in the data categories of ISO
FDIS 12620 (e.g. abbreviation). Note that for dataCategories
consisting of more than one word the elementName is constructed in camelCase
(e.g. "part of speech" in 12620 is converted to partOfSpeech).
The component children can be in any order and repeated any number of times
(although some might not make sense if repeated (e.g. partOfSpeech)).
Finally a termEntry can contain other termEntrys
to set up a hierarchical concept system. An optional dataCategory
child determines whether the hierarchy is generic (i.e. the children
are narrower terms) or partitive (the children are parts
of the parent termEntry) or some more specialised relation.
A term that has other termEntry children may have
one of several conceptual relations to its children (A.6) or describe a
conceptual Structure (A.7). In general terms 'partitive' corresponds to
'hasA' or containment; 'generic' corresponds to 'isA' or inheritance. 'sequential',
'temporal' and 'spatial' are more specialised.
To extend the power of VHG to identify the relation of a termEntry
to its children, we suggest that dataCategorys taken from
A.6 and A.7 are used.
<!ELEMENT termEntry (
A termEntry *may* have an attribute with name id
and type ID. This can be used for indexing, searching, etc. and must be
unique within the XML document.
<!ATTLIST termEntry id ID #IMPLIED>
A termEntry may have a descriptive title, which may
or may not be the term or the entryID.
<!ATTLIST termEntry title CDATA #IMPLIED>
A termEntry may have a language qualifier
<!ATTLIST termEntry %xml:lang;>
A termEntry may have a convention
<!ATTLIST termEntry %convention;>
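Because termEntrys can nest, a concept hierarchy can be read off recursively. Here is a minimal Python sketch using the standard library. It labels each entry by its term, or by its title when no term is present. It does not interpret the optional dataCategory that names the relation (generic, partitive, etc.). The function name concept_pairs is hypothetical.

```python
import xml.etree.ElementTree as ET

def concept_pairs(entry, depth=0):
    """Yield (depth, parent label, child label) for every termEntry nested
    below entry, in document order."""
    def label(e):
        # Prefer the term child; fall back to the title attribute.
        term = e.find("term")
        return term.text if term is not None else e.get("title")
    for child in entry.findall("termEntry"):
        yield depth, label(entry), label(child)
        # Recurse into grandchildren, one level deeper.
        yield from concept_pairs(child, depth + 1)
```

For a "vehicle" entry containing "car" (which itself contains "estate car") and a titled "bicycle" entry, the sketch yields the pairs vehicle/car, car/estate car and vehicle/bicycle.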
Some elements (e.g. term, abbreviation) have values
that may be case-sensitive. The glossary curator may wish to give
the searching software hints about whether the case of such an ELEMENT
must be retained. Example:
<term>Computer Assisted Tomography</term>
<!ENTITY % caseSensitive 'caseSensitive (yes | no) "no"'>
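Here is a minimal Python sketch of how searching software might honour the caseSensitive hint. A non-validating parser such as ElementTree does not apply DTD defaults, so the sketch supplies the default "no" itself. The function name matches is hypothetical.

```python
import xml.etree.ElementTree as ET

def matches(element, query):
    """Compare query with the text of element. Comparison is exact only when
    caseSensitive="yes"; otherwise (including the DTD default "no") it
    ignores case."""
    text = (element.text or "").strip()
    if element.get("caseSensitive", "no") == "yes":
        return text == query
    return text.casefold() == query.casefold()
```

An acronym marked caseSensitive="yes" with content CAT then matches "CAT" but not the common word "cat".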
Some elements may be valuable for automatic indexing and markup. In
some cases (e.g. when they are short or correspond to common words)
the glossary curator may wish to hint that they should not be used
in this way. Example:
<!ENTITY % automaticMarkup 'automaticMarkup (yes | no) "yes"'>
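Here is a minimal Python sketch of collecting markup candidates while respecting the hint. The element list below is taken from the elements that declare %automaticMarkup; in this DTD. As before, the default "yes" is applied in code because ElementTree does not read the DTD. The function name markup_candidates is hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that declare the automaticMarkup attribute in this DTD.
MARKUP_ELEMENTS = ("term", "abbreviation", "acronym", "synonym",
                   "indexHeading", "sortKey", "searchTerm")

def markup_candidates(vhg):
    """Return (string, entry id) pairs usable for automatic markup,
    skipping elements marked automaticMarkup="no" (the default is "yes")."""
    found = []
    for entry in vhg.iter("termEntry"):
        for child in entry:  # direct children only
            if child.tag in MARKUP_ELEMENTS and child.get("automaticMarkup", "yes") != "no":
                found.append(((child.text or "").strip(), entry.get("id")))
    return found
```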
A term contains the character string corresponding to
the 'linguistic expression' (A.1). (see example for termEntry).
Note that the term is the character string; there is no quantity
<!ELEMENT term (#PCDATA)>
<!ATTLIST term %caseSensitive;>
<!ATTLIST term %automaticMarkup;>
A term may have a convention
<!ATTLIST term %convention;>
A termEntry may have a unique identifier (A.10.15).
This identifier may alternatively be provided by the id
attribute of the termEntry if the system supports XML's
ID/IDREF facility. It is recommended for entryID (and required
for ID/IDREF) that IDs are case-sensitive and that two IDs should not differ
simply by case.
<!ELEMENT entryID (#PCDATA)>
An entryID may have a convention
<!ATTLIST entryID %convention;>
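The recommendation that identifiers should not differ only by case can be checked mechanically. Here is a minimal Python sketch. It assumes that id attributes and entryID contents share one identifier space, as the text above suggests. The function name case_clashes is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def case_clashes(vhg):
    """Return groups of identifiers (id attributes and entryID text) that
    differ from each other only by case."""
    groups = defaultdict(set)
    for entry in vhg.iter("termEntry"):
        for ident in (entry.get("id"), entry.findtext("entryID")):
            if ident:  # skip missing or empty identifiers
                groups[ident.casefold()].add(ident)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Entries with ids "DNA" and "dna" are reported together. An entry whose id and entryID are identical is not reported.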
An abbreviated form of a term resulting from the omission of some of
its letters (A.188.8.131.52)
<!ELEMENT abbreviation (#PCDATA)>
<!ATTLIST abbreviation %caseSensitive;>
<!ATTLIST abbreviation %automaticMarkup;>
An abbreviated form of a term made up of letters from the full term
<!ELEMENT acronym (#PCDATA)>
<!ATTLIST acronym %caseSensitive;>
<!ATTLIST acronym %automaticMarkup;>
Administrative information relating either to a complete glossary (VHG)
or to an individual termEntry (A.10). We recommend the
use of data categories in (A.10) wherever possible using the dataCategory
element. Alternatively, well-known XML DTDs or similar systems (such as
RDF) can be used to provide (meta)data such as names, addresses, dates,
etc. These components can be differentiated through the use of XML namespaces, for example:
<RDF:assertion> RDF metadata goes here </RDF:assertion>
<!ELEMENT admin ANY>
The generic mechanism for any data category not provided as an element.
The dataCategory type (preferred or admitted (6.1)) should
be taken from ISO FDIS 12620 wherever possible as software is likely to
provide explicit support for the use of these data categories.
<!ELEMENT dataCategory (#PCDATA)>
The type of the data category (mandatory)
<!ATTLIST dataCategory type CDATA #REQUIRED>
A dataCategory may have a convention
<!ATTLIST dataCategory %convention;>
The definition of the term, normally in prose. However hypertext (HTML)
or other well-known XML systems such as MathML may be used. (A.5.1).
<!ELEMENT definition ANY>
A definition may have a convention
<!ATTLIST definition %convention;>
Descriptive material that provides a sample of the use of the term (A.5.4)
Normally in prose, but hypertext (HTML) or XML from other well-known DTDs
may be used as the content.
<!ELEMENT example ANY>
Supplemental information (A.8). Normally in prose, but hypertext (HTML)
or XML from other well-known DTDs may be used. Since the content will not
normally be machine-understandable, other data categories should be used
in preference to note if available.
<!ELEMENT note ANY>
A category assigned to a word based on its grammatical and semantic
properties. (A.2.2.1). No controlled vocabulary is required, but it is
recommended that words are not abbreviated (e.g. noun, not n).
<!ELEMENT partOfSpeech (#PCDATA)>
A pointer field, used in a terminology collection where an entry does not itself
contain the information, pointing to the location(s) where that information can be found
(A.10.18.1). The mandatory href attribute normally takes the value of the
referenced id attribute. Example:
<termEntry id="lead1" title="lead (metal)">
<definition>A very dense element</definition>
<example>The joke went down like a lead balloon</example>
<termEntry id="lead_balloon" title="lead balloon">
<termEntry id="lead2" title="lead (guide)">
<!ENTITY % see.type 'href CDATA #REQUIRED' >
<!ELEMENT see (#PCDATA)>
<!ATTLIST see %see.type;>
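Here is a minimal Python sketch of resolving see and seeAlso pointers within one glossary. It assumes pointers use the fragment form "#someId". Anything else, such as a reference into another document, is left unresolved (None). The function name resolve_pointers is hypothetical.

```python
import xml.etree.ElementTree as ET

def resolve_pointers(vhg):
    """Return (tag, href, target termEntry or None) for every see/seeAlso."""
    # Index termEntrys by their id attribute.
    by_id = {e.get("id"): e for e in vhg.iter("termEntry") if e.get("id")}
    resolved = []
    for pointer in vhg.iter():
        if pointer.tag in ("see", "seeAlso"):
            href = pointer.get("href", "")
            # Only same-document fragments ("#id") are resolved here.
            target = by_id.get(href[1:]) if href.startswith("#") else None
            resolved.append((pointer.tag, href, target))
    return resolved
```

A dangling pointer (an id that no termEntry carries) also yields None. An editor could use this to flag broken cross-references.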
A pointer field used in a terminology collection that contains information
pointing to one or more other locations where related information can be
found (A.10.18.2). See the see element for the syntax of the pointer. Example:
<dataCategory type="fullForm">deoxy ribonucleic acid</dataCategory>
<definition>The primary genetic material which is
transcribed to RNA</definition>
<dataCategory type="fullForm">ribonucleic acid</dataCategory>
<!ELEMENT seeAlso (#PCDATA)>
<!ATTLIST seeAlso %see.type;>
Any term that represents the same concept as the term, or a very similar one.
<!ELEMENT synonym (#PCDATA)>
<!ATTLIST synonym %caseSensitive;>
<!ATTLIST synonym %automaticMarkup;>
simpleLinks and extendedLinks/locatorLinks
Hyperlinks supporting the XML-link (XLL, XLink) specification. Xlink
is in final draft at present (19980430) and is unlikely to change significantly.
(NOTE: 19980510: Eve Maler posted on XML-DEV that a syntax xlink:form
might be considered for the final draft).
The VHG's use of links is meant to follow the Xlink specification
and philosophy as closely as possible. Since this is still evolving, be
prepared for change, but we expect changes to be small. If you are unfamiliar
with full hypermedia (i.e. links with 'more than one end' or 'multidirectional'),
you may have to think carefully about what this implies. The spec is honest,
but short on real examples, and the VHG examples may help. A major problem
with HTML links is that there is no central registry (database) of links.
So if the resource at either end of a link is relocated the link usually
breaks. If there is a central registry that is notified of any relocation
automatically, it is possible to keep the integrity of link structures.
The VHG provides a somewhat abstract structure for the links since
we do not wish to constrain implementations. At present we envisage:
Note that in the Xlink spec extendedLink should not
contain simpleLinks (nor, we suspect, nested extendedLinks).
The contentSpec of all links is left as ANY in case non-textual material
(images, maths, chemistry, etc.) is appropriate.
simpleLinks between items in a glossary (entailment). We would
expect software to monitor their targets and update links at regular
intervals (e.g. after an editing session).
simpleLinks to external glossaries outside the curator's control.
These are essentially the same as hyperlinks between documents in HTML
and have no specific control.
extendedLinks to multiple termEntrys. This
is a way of linking entries into concepts. Thus an extendedLink
could use child locatorLinks to reference all entries
in a given concept, independently of the normal XML tree-structure.
a central database (possibly in a central VHG document) of extendedLinks.
A bi-directional link would be an extendedLink with 2 locatorLink
children. This is a very powerful mechanism and works to link entries in
the same or different glossaries.
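The multidirectional nature of an extendedLink can be made concrete by listing the traversals it allows. Here is a minimal Python sketch. It assumes each locatorLink carries an href attribute, as in the locator.att parameter entity below. The file names and fragment ids are hypothetical, and so is the function name traversals.

```python
import xml.etree.ElementTree as ET
from itertools import permutations

def traversals(extended_link):
    """List every (from, to) pair an extendedLink makes traversable: each
    locatorLink end can be reached from every other end."""
    ends = [loc.get("href") for loc in extended_link.findall("locatorLink")]
    # Ordered pairs, so each connection appears in both directions.
    return list(permutations(ends, 2))
```

With two locatorLink children this gives exactly the two directions of a bi-directional link. With n children it gives n(n-1) traversals.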
ISO 12620 (A.10.21.1) defines URL as a hyperlink pointing to a unique
address whose value conforms to RFC 1738 or RFC 1808. Note that the common
usage is 'URL' at present (1998) but XML expects the concept 'URI' to become
increasingly important. All addressing should conform to the XML and XML:LINK
(Xlink) specs; thus for addressing within a glossary the Xlink fragment
connector ('#') will be used.
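Resolving an address then means joining it to the glossary's own URL and splitting off the fragment. Here is a minimal Python sketch using urllib.parse from the standard library. The base URL in the usage below and the function name resolve_address are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

def resolve_address(base_url, href):
    """Resolve href against the glossary's URL and split the result into
    (document URL, fragment). A bare "#id" stays within the glossary."""
    absolute = urljoin(base_url, href)
    document, fragment = urldefrag(absolute)
    return document, fragment
```

With base "http://www.example.org/sport-en.xml", "#football" resolves to that same document with fragment "football". "equivalences.xml" resolves to a sibling document with no fragment.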
A VHG may contain one or more links which
can be used for referencing related glossaries or a database of extended
XLinks. A set of multilingual translations of a glossary might be held
as separate glossaries with equivalences held in a central link database.
<VHG xml:lang="en" title="sport">
<link xml:link="simple" role="equivalences"
<VHG xml:lang="fr" title="sport">
<link xml:link="simple" role="equivalences" href="equivalences.xml"
<term>Coupe du Monde</term>
<VHG title="multilingual terminology for sport">
<link xml:link="extended" title="World" role="languageEquivalent">
<link xml:link="locator" xml:lang="en"
<link xml:link="locator" xml:lang="fr"
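Here is a minimal Python sketch of looking up a translation in such a central link database. It uses this DTD's element names (extendedLink, locatorLink) rather than the generic link of the example above. It assumes that each locatorLink carries href and xml:lang. The file names, fragment ids and the function name equivalents are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def equivalents(linkbase, href, lang):
    """Return the hrefs, in language lang, of locatorLinks that share an
    extendedLink with a locatorLink pointing at href."""
    found = []
    for link in linkbase.iter("extendedLink"):
        locators = link.findall("locatorLink")
        if any(loc.get("href") == href for loc in locators):
            found.extend(loc.get("href") for loc in locators
                         if loc.get(XML_LANG) == lang and loc.get("href") != href)
    return found
```

Because the link is not owned by either glossary, the same lookup works from English to French and from French to English.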
Note that a link can refer to any entry (or XML-referenceable
part of an entry) within a glossary. For example, to reference all synonyms
of 'football' we might write:
A link can also contain other links, creating an extended link (xml:link="extended").
Thus an extended link could look like:
<link xml:link="extended" title="synonyms of football">
<link xml:link="locator" href="#id(football)DESCENDANT(1,synonym)"/>
<link xml:link="locator" href="#id(football)DESCENDANT(2,synonym)"/>
</link>
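Under our reading of the 1998 XPointer draft, DESCENDANT(n,synonym) selects the nth synonym element below the starting element, in document order. Here is a minimal Python sketch of that selection for positive n only. It is not an XPointer implementation, and the function name descendant is hypothetical.

```python
import xml.etree.ElementTree as ET

def descendant(root, entry_id, n, tag):
    """Approximate id(entry_id)DESCENDANT(n,tag): the nth (1-based) element
    named tag below the element whose id is entry_id, in document order."""
    start = next((e for e in root.iter() if e.get("id") == entry_id), None)
    if start is None:
        return None
    # iter() walks in document order and includes start itself, so drop it.
    matches = [e for e in start.iter(tag) if e is not start]
    return matches[n - 1] if 0 < n <= len(matches) else None
```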
The attributes of link (including the parameter entity
representation) are taken directly from the Xlink spec. Note that there
is, so far, little experience of Xlink and the current strategy may change.
The following parameter entities (PEs) are taken directly from the
Xlink spec. Not all will necessarily be used
<!ENTITY % locator.att
"href CDATA #REQUIRED">
<!ENTITY % link-semantics.att
"inline (true|false) 'true'
role CDATA #IMPLIED">
<!ENTITY % simple-link-semantics.att
"inline (true|false) 'true'">
<!ENTITY % remote-resource-semantics.att
"role CDATA #IMPLIED
title CDATA #IMPLIED
show (embed|replace|new) #IMPLIED
actuate (auto|user) #IMPLIED
behavior CDATA #IMPLIED">
<!ENTITY % local-resource-semantics.att
"content-role CDATA #IMPLIED
content-title CDATA #IMPLIED">
A link can have a language specified by the xml:lang
attribute. The semantics of this are unconstrained in this DTD but are
most likely to refer to the target of the link rather than the link itself.
A link pointing to a single target (resource), essentially analogous
to HTML's <A HREF="target"> syntax. Its content is deliberately left general (normally an 'inline resource' such as #PCDATA).
<!ELEMENT simpleLink ANY>
xlink:form CDATA #FIXED "simple"
A container for locatorLink children forming a multi-ended link. Its
content is deliberately left general but will always contain locatorLinks.
It may contain other components (normally 'inline resources' such as
#PCDATA) but should not contain simpleLinks (i.e. (locatorLink | ANY)*
if that were allowed and made sense).
<!ELEMENT extendedLink ANY>
xlink:form CDATA #FIXED "extended"
A link pointing to a single target (resource) as part of a multi-component
extendedLink. Its content is deliberately left general but
is normally empty.
<!ELEMENT locatorLink ANY>
xlink:form CDATA #FIXED "locator"
Index and markup
For automatic markup and indexing, variants of the term may be required.
The use of these is under user control, but the author/curator can give hints as
to how these should be used. The following are included in 1.0alpha:
(cf. A.9.5). A term which can be used for indexing an entry, whether
or not the indexHeading also occurs as a term, as another
data category, or within the text of (say) the definition. An indexHeading
may or may not be displayed by the software and may or may not be used
for markup. [Note: ISO FDIS uses the data categories "Index Heading" and
"Index Word". We feel the latter could be misunderstood by non-terminologists
as requiring a single word to the exclusion of phrases.]
Examples of categories that could be used as index headings are:
<!ELEMENT indexHeading (#PCDATA)>
<!ATTLIST indexHeading %caseSensitive;>
<!ATTLIST indexHeading %automaticMarkup;>
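Here is a minimal Python sketch of building a back-of-book style index from indexHeading elements. Headings are case-folded unless marked caseSensitive="yes", and that default is applied in code. The function name build_index is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(vhg):
    """Map each indexHeading to the ids of the termEntrys that carry it."""
    index = defaultdict(list)
    for entry in vhg.iter("termEntry"):
        for heading in entry.findall("indexHeading"):
            key = (heading.text or "").strip()
            # Merge case variants unless the curator asked to keep case.
            if heading.get("caseSensitive", "no") != "yes":
                key = key.casefold()
            index[key].append(entry.get("id"))
    return dict(index)
```

The heading need not appear anywhere else in the entry, which matches the description above.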
(A.10.6.2). From ISO FDIS 12620: 'A character string used for comparisons
in sorting and merging
operations. Example: 2,2-Dihydropyran is sorted according to "Dihydropyran",
not according to "2,2"'.
<!ELEMENT sortKey (#PCDATA)>
<!ATTLIST sortKey %caseSensitive;>
<!ATTLIST sortKey %automaticMarkup;>
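Here is a minimal Python sketch of ordering entries by sortKey and falling back to the term when no sortKey is given. The case-insensitive comparison is our assumption rather than a rule of the DTD. The function name sorted_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

def sorted_entries(vhg):
    """Order termEntrys by sortKey, else by term, ignoring case."""
    def key(entry):
        text = entry.findtext("sortKey") or entry.findtext("term") or ""
        return text.strip().casefold()
    return sorted(vhg.iter("termEntry"), key=key)
```

Here 2,2-Dihydropyran files between benzene and furan. Without its sortKey it would sort first, because "2" precedes letters.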
(A.10.6.3). From ISO FDIS 12620: 'A term entered in a term entry for
purposes of retrieval.'
<!ELEMENT searchTerm (#PCDATA)>
<!ATTLIST searchTerm %caseSensitive;>
<!ATTLIST searchTerm %automaticMarkup;>
Peter Murray-Rust and Lesley West, 1998