[This local archive copy mirrored from the canonical site: http://www.vhg.org.uk/dtd/components.html ; links may not have complete integrity, so use the canonical document at this URL if possible.]


Components and Structure of a VHG

Introduction

This section outlines the types of information component that you may find in a hyperglossary. Since the VHG DTD can support a very wide range of terminological objects, you may find that some of these are unnecessary for your purpose, and there is no need to use them. Alternatively, you may find that some of your requirements are not obviously supported; in fact, they can always be met using the VHG tools. Advanced topics will be explained elsewhere.
 

VHG

The top-level information object in a hyperglossary is a VHG element. A VHG element consists of three types of component:

Components

The components of a termEntry are described formally in the DTD, but the current section is a general introduction for non-terminologists. We make some simplifications, and you should always consult the FDIS 12620 reference for the precise meaning and use of a dataCategory.

The VHG relies heavily on the emerging ISO FDIS 12620 standard for data categories in terminology, which describes about 300 categories used by terminologists. Terminology requires great precision in the use of words and phrases and for industrial-strength applications you should be careful to use words in a way that is consistent with FDIS 12620. In the current discussion we use only a small subset, and you should not accept the guidance here as the 'right' way to build a hyperglossary.

At the heart of a hyperglossary is the term, contained in a termEntry. Examples of terms are: gas; fish and chips; democracy; TNT; trinitrotoluol.

Depending on your background and environment you may 'understand' some or all of these. However, none is completely simple. gas is a technical scientific term, but is also widely used in the US and elsewhere for automobile fuel, called petrol in the UK. fish and chips is more than a simple combination of fried fish and fried potatoes, as it carries the concept of special shops, late-night or seaside 'fast food', etc. democracy is a highly variable term with many different uses and meanings, depending on the country, society and politics in which it is used. TNT and trinitrotoluol refer to the same concept: trinitrotoluol is an obsolete scientific term for an explosive, often abbreviated to TNT. It is not, however, an international scientific term such as 1-methyl-2,4,6-trinitrobenzene. trinitrotoluol, 2,4,6-trinitrotoluene and 1-methyl-2,4,6-trinitrobenzene are approximate synonyms, although their usage is different. [Note that we do not capitalise terms except when capitalisation is a formal requirement for the term, as in TNT.]

Curators will often wish to add structure to their VHGs. Thus gas in the scientific sense will often be linked to other terms. These might include latent heat, vapour pressure, critical temperature and many more. To systematise this, the curator of a VHG might create a parent termEntry such as vaporisation. For melting phenomena she might create fusion. To unite both of these she might create an even higher-level termEntry, phase change. This has a hierarchical structure:

phase change
    vaporisation
        boiling point
        latent heat of vaporisation
        entropy of vaporisation
    fusion
        melting point
        latent heat of fusion
        entropy of fusion

The creation of such classifications requires a great deal of work, technical, organisational and usually political. Ontologies are highly personal, and there are frequently battles over classifications, taxonomies and related approaches. Often they are dynamic, and may have poorly developed terms. For example another curator might create a classification:

states of matter
    gas
        condensation temperature
    liquid
        boiling point
        freezing point
    solid
        melting point

Neither of these is 'better' than the other, and most scientists would find that both approaches have some use and some faults. Note that there is some commonality between them: boiling point probably represents the same concept in both.

The structure in the two examples above is hierarchical (tree-structured) and thus maps directly onto XML without the need for special elements. The first glossary might be encoded as:

<termEntry>
    <term>phase change</term>
    <termEntry>
        <term>vaporisation</term>
        <termEntry>
            <term>boiling point</term>
        </termEntry>
        <termEntry>
            <term>latent heat of vaporisation</term>
        </termEntry>
        <termEntry>
            <term>entropy of vaporisation</term>
        </termEntry>
    </termEntry>
    <termEntry>
        <term>fusion</term>
        <termEntry>
            <term>melting point</term>
        </termEntry>
        <termEntry>
            <term>latent heat of fusion</term>
        </termEntry>
        <termEntry>
            <term>entropy of fusion</term>
        </termEntry>
    </termEntry>
</termEntry>
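
For comparison, the second classification (states of matter) can be encoded by the same mapping. The following is only a sketch: it uses nothing beyond the termEntry and term elements already shown, and it has not been checked against the DTD.

```xml
<termEntry>
    <term>states of matter</term>
    <termEntry>
        <term>gas</term>
        <termEntry>
            <term>condensation temperature</term>
        </termEntry>
    </termEntry>
    <termEntry>
        <term>liquid</term>
        <termEntry>
            <term>boiling point</term>
        </termEntry>
        <termEntry>
            <term>freezing point</term>
        </termEntry>
    </termEntry>
    <termEntry>
        <term>solid</term>
        <termEntry>
            <term>melting point</term>
        </termEntry>
    </termEntry>
</termEntry>
```

Each level of indentation in the plain-text outline becomes one level of termEntry nesting, so the depth of an entry in the tree records its place in the classification.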

In the encoding of the first glossary, the termEntry phase change has a child element term and two other child termEntrys.

Some structures cannot be represented by simple hierarchies. Thus it would be impossible to combine the two VHGs above into a single tree. To show that melting point is associated with both fusion and solid we introduce links. The VHG has two built-in links, see and seeAlso, but can also have other links and even links to links. see is used when one term provides all the information for another term and duplication would be expensive to create and maintain; a common use is for synonyms and abbreviations. seeAlso is used to show related terms. Example:

<termEntry id="tol1">
  <term>toluene</term>
  <synonym>methylbenzene</synonym>
  <definition>A colourless flammable aromatic hydrocarbon.</definition>
</termEntry>
<termEntry id="tnt1">
  <term>trinitrotoluene</term>
  <abbreviation>TNT</abbreviation>
  <definition>An explosive formed by the nitration of toluene.</definition>
  <seeAlso href="#id(tol1)">toluene</seeAlso>
</termEntry>
<termEntry id="tnt2">
  <term>TNT</term>
  <see href="#id(tnt1)">trinitrotoluene</see>
</termEntry>

All the information for TNT is contained in trinitrotoluene, so reproducing it is unnecessary. In fact, since the VHG allows the easy identification of synonyms, the TNT entry could even be omitted. The seeAlso link represents entailment (the relationships between terms in a glossary). Not only does it make navigation easier, but we can also use it to create concept maps automatically.
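
The melting point case mentioned earlier can be handled in the same way. In the sketch below, melting point stays in the phase change tree under fusion, and seeAlso links connect it with solid from the states of matter tree. The id values (fus1, mp1, sol1) are hypothetical, and placing seeAlso directly after term is an assumption that should be checked against the DTD.

```xml
<!-- Sketch only: the ids fus1, mp1 and sol1 are invented for illustration -->
<termEntry id="fus1">
    <term>fusion</term>
    <termEntry id="mp1">
        <term>melting point</term>
        <!-- link from the 'phase change' tree into the 'states of matter' tree -->
        <seeAlso href="#id(sol1)">solid</seeAlso>
    </termEntry>
</termEntry>
<termEntry id="sol1">
    <term>solid</term>
    <!-- reverse link, so the relationship can be followed from either entry -->
    <seeAlso href="#id(mp1)">melting point</seeAlso>
</termEntry>
```

Because the relationship is carried by links rather than by nesting, melting point needs to be defined only once, and neither curator's hierarchy has to be rearranged.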

So far we have concentrated on the terms and the structure of the glossary; now we describe the subcomponents. The termEntry encapsulates all the information for a term, and we have chosen a small number of the most commonly used FDIS 12620 dataCategories. From a survey of many Web sites we identified about 15 common dataCategories. In scientific, technical and medical (STM) terminology, synonyms, acronyms and abbreviations are common. partOfSpeech is included to support automatic indexing, as is indexHeading. The latter is mainly for software support, as are searchTerm and sortKey.

definition provides a definitive statement identifying the term, with optional additional information for understanding the term and its use. example is straightforward; note is a catch-all for other explanatory information. definition, example and note can contain textual and non-textual material (e.g. maths and chemistry). Strictly speaking, the definition in a glossary should be kept as small as is necessary to define the term, but the VHG can support dictionaries and encyclopedias if required.
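
As an illustration, a termEntry that uses several of these subcomponents might look like the sketch below. It assumes that each dataCategory named above is available as an element of the same name and in this order, which should be confirmed against the DTD. The id gas1 and the wording of the definition, example and note are invented for illustration.

```xml
<!-- Sketch only: element names and order are assumed from the dataCategory names in the text -->
<termEntry id="gas1">
    <term>gas</term>
    <partOfSpeech>noun</partOfSpeech>
    <sortKey>gas</sortKey>
    <definition>A state of matter in which a substance has no fixed shape or volume and expands to fill its container.</definition>
    <example>Steam is water in the gas state.</example>
    <note>In the US, gas is also widely used for automobile fuel, called petrol in the UK.</note>
</termEntry>
```

The definition is kept short, and the secondary everyday usage is placed in note rather than in the definition itself.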

When the VHG does not provide built-in support, the dataCategory element is used with its type attribute set to a dataCategory from FDIS 12620. Thus to describe the syllabification of a term we might write:

<termEntry id="tol1">
  <term>toluene</term>
  <dataCategory type="syllabification">tol u ene</dataCategory>
</termEntry>

In specialised VHGs (e.g. chemistry) domain-specific dataCategorys may be required. For example, the formula of molecules can be described precisely by the SMILES notation, and we use the convention attribute (with an identifying link if needed):

<termEntry>
  <term>aspirin</term>
  <dataCategory type="SMILES" convention="SMILES" href="http://www.daylight.com">CC(=O)Oc1ccccc1C(=O)O</dataCategory>
  <dataCategory type="formula">C9H8O4</dataCategory>
</termEntry>

Note that the dataCategory formula is from FDIS 12620, so no convention attribute is required.

PMR/LW 1998-05-28