[This local archive copy is mirrored from the canonical site: http://www.vhg.org.uk/dtd/components.html ; links may not have complete integrity, so use the canonical document at this URL if possible.]
The VHG relies heavily on the emerging ISO FDIS 12620 standard for data categories in terminology, which describes about 300 categories used by terminologists. Terminology requires great precision in the use of words and phrases, and for industrial-strength applications you should take care to use words consistently with FDIS 12620. In the current discussion we use only a small subset, and you should not treat the guidance here as the 'right' way to build a hyperglossary.
At the heart of a hyperglossary is the term, contained in a termEntry. Examples of terms are:
Curators will often wish to add structure to their VHGs. For example, gas in the scientific sense will often be linked to other terms, such as latent heat, vapour pressure, critical temperature and many more. To systematise this, the curator of a VHG might create a parent termEntry such as vaporisation. For melting phenomena she might create fusion. To unite both of these she might create an even higher-level termEntry, phase change. This gives a hierarchical structure:
phase change
  vaporisation
    boiling point
    latent heat of vaporisation
    entropy of vaporisation
  fusion
    melting point
    latent heat of fusion
    entropy of fusion
Creating such classifications requires a great deal of work: technical, organisational and usually political. Ontologies are highly personal, and there are frequently battles over classifications, taxonomies and related approaches. They are often dynamic, and may contain poorly developed terms. For example, another curator might create this classification:
states of matter
  gas
    condensation temperature
  liquid
    boiling point
    freezing point
  solid
    melting point
Neither of these is 'better' than the other; most scientists would find each approach useful in some respects and flawed in others. Note that there is some commonality between them: boiling point probably represents the same concept in both.
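The commonality can be made concrete with a short Python sketch. It assumes each classification is held as a plain {parent: [children]} dictionary (a hypothetical representation, not part of the VHG), with boiling point under liquid and melting point under solid as in the second outline, and lists the terms found in both together with their parent in each. Equal strings only suggest a shared concept; a curator still has to confirm it.

```python
# Each classification as a hypothetical {parent: [children]} mapping,
# transcribed by hand from the two outlines above.
PHASE_CHANGE = {
    "phase change": ["vaporisation", "fusion"],
    "vaporisation": ["boiling point", "latent heat of vaporisation",
                     "entropy of vaporisation"],
    "fusion": ["melting point", "latent heat of fusion", "entropy of fusion"],
}
STATES_OF_MATTER = {
    "states of matter": ["gas", "liquid", "solid"],
    "gas": ["condensation temperature"],
    "liquid": ["boiling point", "freezing point"],
    "solid": ["melting point"],
}


def parent_of(tree):
    """Map every child term to its parent term."""
    return {child: parent for parent, children in tree.items() for child in children}


def shared_terms(a, b):
    """Return {term: (parent in a, parent in b)} for terms found in both trees.

    Matching is by exact string only: equal strings are a hint, not proof,
    that the two curators mean the same concept.
    """
    pa, pb = parent_of(a), parent_of(b)
    return {t: (pa[t], pb[t]) for t in sorted(pa.keys() & pb.keys())}


for term, (p1, p2) in shared_terms(PHASE_CHANGE, STATES_OF_MATTER).items():
    print(f"{term}: under {p1!r} in one, {p2!r} in the other")
```

Under these assumptions melting point turns up with a different parent in each tree (fusion in one, solid in the other), which is exactly the kind of overlap a single hierarchy cannot hold.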
The structure in the two examples above is hierarchical (tree-structured) and thus maps directly onto XML without the need for special elements. The first glossary might be encoded as:
<termEntry>
  <term>phase change</term>
  <termEntry>
    <term>vaporisation</term>
    <termEntry>
      <term>boiling point</term>
    </termEntry>
    <termEntry>
      <term>latent heat of vaporisation</term>
    </termEntry>
    <termEntry>
      <term>entropy of vaporisation</term>
    </termEntry>
  </termEntry>
  <termEntry>
    <term>fusion</term>
    <termEntry>
      <term>melting point</term>
    </termEntry>
    <termEntry>
      <term>latent heat of fusion</term>
    </termEntry>
    <termEntry>
      <term>entropy of fusion</term>
    </termEntry>
  </termEntry>
</termEntry>
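Because the nesting itself carries the hierarchy, generic XML tooling can recover it without VHG-specific code. A minimal sketch using Python's standard xml.etree.ElementTree module, assuming the fragment above is the whole document: it walks the nested termEntry elements and yields each term with the chain of its ancestors.

```python
import xml.etree.ElementTree as ET

# The phase-change classification above, written compactly.
DOC = """
<termEntry><term>phase change</term>
  <termEntry><term>vaporisation</term>
    <termEntry><term>boiling point</term></termEntry>
    <termEntry><term>latent heat of vaporisation</term></termEntry>
    <termEntry><term>entropy of vaporisation</term></termEntry>
  </termEntry>
  <termEntry><term>fusion</term>
    <termEntry><term>melting point</term></termEntry>
    <termEntry><term>latent heat of fusion</term></termEntry>
    <termEntry><term>entropy of fusion</term></termEntry>
  </termEntry>
</termEntry>
"""


def walk(entry, ancestors=()):
    """Yield (ancestor terms, term) for every termEntry, depth first."""
    term = entry.findtext("term")
    yield ancestors, term
    for child in entry.findall("termEntry"):  # direct children only
        yield from walk(child, ancestors + (term,))


root = ET.fromstring(DOC.strip())
for ancestors, term in walk(root):
    # Indent by depth to reproduce the outline form.
    print("  " * len(ancestors) + term)
```

The ancestor chain is what a hyperglossary browser would need to display a breadcrumb for any term.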
The termEntry for phase change has a child term element and two child termEntry elements. Some structures cannot be represented by simple hierarchies: it would be impossible, for example, to combine the two VHGs above into a single tree, because melting point would need two parents. To show that melting point is associated with both fusion and solid, we introduce links. The VHG has two built-in links, see and seeAlso, but can also have other links and even links to links. see is used when one term provides all the information for another term and duplicating it would be expensive to create and maintain; synonyms and abbreviations are common instances. seeAlso is used to show related terms. Example:
<termEntry id="tol1">
  <term>toluene</term>
  <synonym>methylbenzene</synonym>
  <definition>A colourless flammable aromatic hydrocarbon.</definition>
</termEntry>
<termEntry id="tnt1">
  <term>trinitrotoluene</term>
  <abbreviation>TNT</abbreviation>
  <definition>An explosive formed by the nitration of toluene.</definition>
  <seeAlso href="#id(tol1)">toluene</seeAlso>
</termEntry>
<termEntry id="tnt2">
  <term>TNT</term>
  <see href="#id(tnt1)">trinitrotoluene</see>
</termEntry>
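Software following a see link simply jumps to the entry named by the href. A minimal Python sketch, assuming hrefs always take the "#id(...)" form used in the examples and wrapping the entries in a hypothetical <glossary> root so they form one document; it also guards against see links that point nowhere or loop back on themselves.

```python
import re
import xml.etree.ElementTree as ET

# The three entries above, wrapped in a hypothetical <glossary> root element
# so that they form a single well-formed document (definitions omitted).
DOC = """<glossary>
  <termEntry id="tol1">
    <term>toluene</term>
    <synonym>methylbenzene</synonym>
  </termEntry>
  <termEntry id="tnt1">
    <term>trinitrotoluene</term>
    <abbreviation>TNT</abbreviation>
    <seeAlso href="#id(tol1)">toluene</seeAlso>
  </termEntry>
  <termEntry id="tnt2">
    <term>TNT</term>
    <see href="#id(tnt1)">trinitrotoluene</see>
  </termEntry>
</glossary>"""

# Assumes hrefs take the "#id(...)" form used in the examples.
HREF = re.compile(r"^#id\(([^)]+)\)$")


def index_by_id(root):
    """Map each id attribute to its termEntry element."""
    return {e.get("id"): e for e in root.iter("termEntry") if e.get("id")}


def resolve(entry, entries):
    """Follow see links until reaching an entry that holds the information."""
    visited = set()
    while (see := entry.find("see")) is not None:
        m = HREF.match(see.get("href", ""))
        if m is None or m.group(1) not in entries or m.group(1) in visited:
            raise ValueError(f"cannot follow see link {see.get('href')!r}")
        visited.add(m.group(1))
        entry = entries[m.group(1)]
    return entry


entries = index_by_id(ET.fromstring(DOC))
print(resolve(entries["tnt2"], entries).findtext("term"))  # trinitrotoluene
```

Raising on a dangling or circular link is a design choice for a validator; a browser might prefer to show the entry as-is.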
All the information for TNT is contained in the trinitrotoluene entry, so reproducing it is unnecessary. In fact, since the VHG makes synonyms and abbreviations easy to identify, the TNT entry could even be omitted. The seeAlso link represents entailment (the relationships between terms in a glossary). It not only makes navigation easier but can also be used to create concept maps automatically.
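As a sketch of how seeAlso links could feed a concept map, the Python below collects one edge per seeAlso link and writes the result in Graphviz DOT format, which common graph-layout tools can draw. The <glossary> wrapper, the choice of DOT, and falling back to the link's own text when its id cannot be resolved are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# A minimal two-entry glossary; the <glossary> root is hypothetical.
DOC = """<glossary>
  <termEntry id="tol1"><term>toluene</term></termEntry>
  <termEntry id="tnt1">
    <term>trinitrotoluene</term>
    <seeAlso href="#id(tol1)">toluene</seeAlso>
  </termEntry>
</glossary>"""

HREF = re.compile(r"^#id\(([^)]+)\)$")


def concept_map(root):
    """Return (source term, related term) pairs, one per seeAlso link."""
    term_by_id = {e.get("id"): e.findtext("term") for e in root.iter("termEntry")}
    edges = []
    for entry in root.iter("termEntry"):
        for link in entry.findall("seeAlso"):
            m = HREF.match(link.get("href", ""))
            target = term_by_id.get(m.group(1)) if m else None
            # If the id cannot be resolved, fall back to the link's own text.
            edges.append((entry.findtext("term"), target or link.text))
    return edges


def quote(s):
    """Quote a string as a Graphviz DOT identifier."""
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(edges):
    """Render edges as a Graphviz DOT digraph."""
    body = [f"  {quote(a)} -> {quote(b)};" for a, b in edges]
    return "\n".join(["digraph conceptmap {", *body, "}"])


print(to_dot(concept_map(ET.fromstring(DOC))))
```

For this input the output is a one-edge digraph, trinitrotoluene -> toluene; a larger glossary simply yields more edges.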
So far we have concentrated on the terms and the structure of the glossary; now we describe the subcomponents. The termEntry encapsulates all the information for a term, and we have chosen a small number of the most commonly used FDIS 12620 dataCategories: a survey of many Web sites identified about 15 common ones. In scientific, technical and medical (STM) terminology, synonyms, acronyms and abbreviations are common. partOfSpeech is included to support automatic indexing, as is indexHeading; indexHeading, like searchTerm and sortKey, is mainly for software support. definition provides a definitive statement identifying the term, with optional additional information for understanding it and its use. example is straightforward; note is a catch-all for other explanatory information. These last three (definition, example and note) can contain both textual and non-textual material (e.g. mathematics and chemistry). Strictly speaking, the definition in a glossary should be no longer than is necessary to define the term, but the VHG can also support dictionaries and encyclopedias if required.
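sortKey, for instance, lets software file an entry somewhere other than where its literal spelling would place it. A minimal Python sketch, assuming sortKey is written as a child element of termEntry (its markup is not shown on this page) and that entries without one sort on their term; the entries themselves are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries. The page names sortKey but shows no markup for it,
# so this assumes it is a child element of termEntry like <synonym>.
DOC = """<glossary>
  <termEntry><term>melting point</term></termEntry>
  <termEntry>
    <term>entropy of fusion</term>
    <sortKey>fusion, entropy of</sortKey>
  </termEntry>
  <termEntry><term>fusion</term></termEntry>
</glossary>"""


def sort_key(entry):
    """Prefer an explicit sortKey, fall back to the term; ignore case."""
    key = entry.findtext("sortKey") or entry.findtext("term") or ""
    return key.casefold()


root = ET.fromstring(DOC)
ordered = [e.findtext("term") for e in sorted(root.iter("termEntry"), key=sort_key)]
print(ordered)  # ['fusion', 'entropy of fusion', 'melting point']
```

Without the sortKey, entropy of fusion would sort first, away from the other fusion terms.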
When the VHG does not provide built-in support for a data category, the generic dataCategory element is used, with its type attribute set to the name of a data category from FDIS 12620. Thus to describe the syllabification of a term we might write:
<termEntry id="tol1">
  <term>toluene</term>
  <dataCategory type="syllabification">tol u ene</dataCategory>
</termEntry>
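To software, a built-in element and a generic dataCategory can look the same once each is reduced to a (category, value) pair. A minimal Python sketch of that flattening, assuming that link and nesting elements (see, seeAlso, termEntry) should be skipped; the sample combines the syllabification entry above with the synonym from the earlier toluene entry.

```python
import xml.etree.ElementTree as ET

# The syllabification example above, plus the built-in <synonym> from the
# earlier toluene entry, so that both kinds of category appear together.
DOC = """<termEntry id="tol1">
  <term>toluene</term>
  <synonym>methylbenzene</synonym>
  <dataCategory type="syllabification">tol u ene</dataCategory>
</termEntry>"""

# Children that carry structure or links rather than a data category value.
STRUCTURAL = {"termEntry", "see", "seeAlso"}


def categories(entry):
    """Flatten a termEntry into (category, value) pairs.

    Built-in elements such as <synonym> are named by their tag; the generic
    <dataCategory> element is named by its type attribute.
    """
    pairs = []
    for child in entry:
        if child.tag in STRUCTURAL:
            continue
        name = child.get("type") if child.tag == "dataCategory" else child.tag
        pairs.append((name, (child.text or "").strip()))
    return pairs


print(categories(ET.fromstring(DOC)))
```

The term element itself comes out as a ("term", ...) pair here; a real tool might treat it separately.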
In specialised VHGs (e.g. chemistry), domain-specific dataCategory elements may be required. For example, the structure of a molecule can be described precisely in SMILES notation; since this is not an FDIS 12620 data category, we use the convention attribute (with an identifying link if needed):
<termEntry>
  <term>aspirin</term>
  <dataCategory type="SMILES" convention="SMILES" href="http://www.daylight.com">CC(=O)Oc1ccccc1C(=O)O</dataCategory>
  <dataCategory type="formula">C9H8O4</dataCategory>
</termEntry>
Note that the dataCategory formula is from FDIS 12620, so no convention attribute is required.
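The rule above (FDIS 12620 categories need no convention, other categories do) is easy to check mechanically. A minimal Python sketch, assuming a hard-coded set containing only the two categories this page attributes to FDIS 12620; a real checker would load the standard's full list. The second, convention-less SMILES element is a hypothetical counter-example.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only: the two categories this page attributes to
# FDIS 12620. A real checker would load the standard's full inventory.
KNOWN_12620 = {"formula", "syllabification"}

DOC = """<termEntry>
  <term>aspirin</term>
  <dataCategory type="SMILES" convention="SMILES"
    href="http://www.daylight.com">CC(=O)Oc1ccccc1C(=O)O</dataCategory>
  <dataCategory type="formula">C9H8O4</dataCategory>
</termEntry>"""


def missing_convention(entry, known=KNOWN_12620):
    """List dataCategory types that are neither known FDIS 12620 categories
    nor qualified by a convention attribute."""
    return [dc.get("type") for dc in entry.iter("dataCategory")
            if dc.get("type") not in known and not dc.get("convention")]


print(missing_convention(ET.fromstring(DOC)))  # []
# Dropping convention from a SMILES element would get it reported:
bad = ET.fromstring('<termEntry><dataCategory type="SMILES">CCO</dataCategory></termEntry>')
print(missing_convention(bad))  # ['SMILES']
```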
PMR/LW 1998-05-28