[Mirrored from: http://www.ornl.gov/sgml/wg8/document/1860.htm]

WG8 N 1860

ISO/IEC JTC1/SC18/WG8

Document Processing and Related Communication --

Document Description and Processing Languages

Title:: Information Processing -- SGML Applications -- Topic Navigation Maps
Source:: ISO JTC1/SC18/WG8
Project:: SGML Rapporteur Group
Status of Document:: Committee Draft for approval as part of NP ballot
Requested action:: Approval of NP and Committee Draft
Summary of major points:: A proposal to develop topic navigation map SGML architectural forms, for which user requirements have been defined by SC18/WG8, is being submitted to national bodies for approval as a new work item.
This document explains the role of the proposed architectural forms, and formally defines the main architectural forms required.
Distribution:: National bodies participating in or observing the activities of JTC1 SC18/WG8

Information Processing -- SGML Applications -- Topic Navigation Maps

Scope

This standard provides a mechanism, based on techniques defined in ISO/IEC 10744:1992, for identifying information objects that share a common topic, and for defining the relationships between sets of related topics. It can be used to define:

tables of contents and subject indexes for individual documents, or related sets of documents
glossaries that can be shared by more than one document
the relationship between topics within a thesaurus
the relationships between multilingual thesauri, glossaries, etc.

Related Standards

ISO 8879:1986
ISO/IEC 10744:1992

Definitions

TBD

Purpose of the Topic Navigation Map Module

The purpose of this Topic Navigation Map module is to facilitate the maintainability and usability of topic-based navigational aids for large corpora of documents containing interrelated information. The fundamental idea is to make a distinction between highly concentrated and independent topic maps -- sets of relations between the topics covered in a given corpus -- and the addresses of relevant information within the corpora themselves. Such topic maps can improve the accessibility of information, and they can facilitate and, to some extent, automate the task of providing navigational resources and of imposing editorial consistency and maintainability on them. The design of topic maps allows the groupware-supported production of the data from which navigational aids such as indexes, glossaries, tables of contents, lists and catalogs can be generated. Topic maps can also be used to enhance the navigability of very large information bases.

This Topic Navigation Map module provides a basis for creating and maintaining information that, in effect, classifies the information in documents according to topic, and classifies topics with respect to each other. It is intended to help increase consistency and decrease redundancy not only in navigational aids within documents, but also in navigational aids used with multiple documents, such as master indexes. The discipline that can be imposed by using the Topic Navigation Map module will also assist those who create and/or collect libraries of documents, and who then wish to provide a given collection with a unified, consistent, and minimally redundant topic index.

The Standard Generalized Markup Language (SGML) defined in ISO 8879:1986 allows all kinds of documents to become databases. For this facility to be useful there must be ways to navigate data stores so that parts of documents that are relevant to a particular topic can be found easily and organized rapidly by machine. However, the number and complexity of indexable topics and the relationships between them in all documents greatly exceed the number and complexity of relations normally represented in traditional databases, or, for that matter, in the kinds of indexes normally found in books. In fact, the number of topic relationships that might usefully be represented with respect to any reasonably large collection of documents is, for all practical purposes, limitless. Moreover, even in archived documents, new kinds of topic relationships can be expected to appear from time to time.

Creating and maintaining topic indexes is a difficult and expensive proposition. Creating a topic index is a complex task, like planning and constructing a building, involving myriad assumptions and artistic decisions. Many indexes are indexes in name only: ramshackle affairs, unable to bear the stress of the everyday purposes for which indexes are presumably intended, they are almost useless. All too often, even when an index is well thought out, well constructed, and useful, little thought is given to its maintainability. When the time comes to create an updated or corrected index, the original documentation for the topic architecture of the index is no longer available. Indeed, it may never have existed or have been consciously expressed in any abstract way. Even an index on which enormous maintenance effort is expended can quite easily become a self-inconsistent hodgepodge, especially when the size of the indexing task dictates that it must be a cooperative effort, or when there have been changes in the responsible personnel.

An application-neutral, internationally understandable, rigorous, and yet flexible and open way to represent topical indexes, such as the one set forth in this Topic Navigation Map module, can help to make indexes easier to make, easier to maintain, and easier to use. As new relationships are discovered and included as part of the topic architecture, the architecture changes. Many architects may have to collaborate and contribute, over the years, to an evolving architecture, which at any given time must unambiguously and comprehensibly govern all maintenance activities. Unless those who are adding and/or maintaining anchors have clear guidance, the instantiation of that architecture -- the index itself -- may become unsound and unsafe.

A topic architecture fundamentally consists of topics and the relations that they bear to one another. There is need, therefore, for a way to permit:

any number of topics to be defined by those with knowledge of the subject matter,
any number of categories of topics, with subcategorization to any level,
any number of relations between topics, and
any number of categories of relations between topics, with subcategorization to any level

to be represented, universally interchanged, processed, merged, and used for data navigation. An international standard for representing (among many other things) arbitrary relationships between arbitrary pieces of information, wherever they are in situ, exists in ISO/IEC 10744, which defines the Hypermedia/Time-based Structuring Language known as HyTime. In this Topic Navigation Map module a HyTime-based approach to linking topics with information has been developed, and an architecture is defined that can support applications that provide:

the ability for many experts in a given field of knowledge to share in, and jointly contribute to, the evolution of a common map of topic relationships in each given field of knowledge;
the ability to merge such maps, whenever multiple fields of knowledge must be used simultaneously, in such a way as to maximize the meaningful cross-connections between them; and
the ability to use such maps in a variety of ways for a variety of purposes, such as extracting printed and online indexes and glossaries for particular documents. Extracted indexes are able to reflect the relationships between topics and subtopics represented by maps of topic relationships, and are extractable automatically or semi-automatically from the map of topic relationships as part of a formatting, pre-formatting, and/or authoring process.

Using this Topic Navigation Map module, a particular topic architecture, designed for some document, some set of documents, or even for an entire field of knowledge, can be represented in a topic map. A topic map consists of a set of topics and a set of topic relationships. Topics are defined using CApH.semanticAssignment-form elements, whose roles are defined by the user; topic relationships are expressed using CApH.topicRelation-form elements, which identify specific relations between topics. Categories of topics may be iteratively identified and described by linking suitable topics to other topics belonging to the category.

A topic is created by linking several pieces of information about a subject through a semantic assignment link, which is a HyTime independent link. A topic can be given a definition by dedicating one of the roles named in the anchrole attribute to it: whatever anchor corresponds to the definition role in the anchrole attribute, if any, is considered to be the definition of the topic. This notion of definition is very general: a definition can be any portion of information (no specific internal structure needed) that is pointed to.

Semantic Assignment -- CApH.semanticAssignment

A semantic assignment (CApH.semanticAssignment) is a specialized HyTime independent link (ilink) that associates all the information objects sharing a common semantic. This group of objects is collectively called a topic. The located objects have the common property of being anchors of a semantic assignment element. Therefore, one can distinguish:

the semantic assignment, an SGML element that associates all the related objects as its anchors. Each anchor can itself be an aggregate, as there is no CApH-imposed limit on the nature of the addressing mechanism used to address the anchors. Any anchor can also be shared by different semantic assignments.
the topic itself, which includes the semantic assignment element and all of its anchors. The term topic is to be understood as a subject (as in a subject index) as well as a location, since the Greek word topos, from which it originates, means location. In other words, a topic is a composite object made of several elementary locations about a subject.

Common examples of topics are index and/or glossary entries: an index entry is a set of locations sharing common semantics described by the term that is displayed in the index; index entries are normally displayed in alphabetical order. A glossary entry is a topic that points to an occurrence considered as its definition. CApH enables topics to play the roles of index entry and glossary entry at the same time: one of their occurrence roles is their definition, the others being the equivalent of index entries.

The value of the HyTime anchrole attribute is user-definable and allows the user to distinguish between different roles of occurrence sets. The only constraint imposed by CApH is that the first anchor be the semantic assignment. This is the only way to enable the link to be referred to. All other declared anchors can be aggregate anchors.

When a semantic assignment is instantiated, its anchrole values have to be explicitly defined. The role of each anchor is to specify the nature of the occurrence where the information about a given topic is to be found. These anchor addresses are called "occurrence roles". There is no limit to what can be represented and distinguished as occurrence roles, nor to the number of occurrence roles. The only HyTime limitation is that occurrence roles are fixed in the DTD for a given semantic assignment element type. It is entirely the realm of the application to decide what to do when not all anchors are filled in. (A "null" address could be interpreted as no occurrence, for example.) The purpose of differentiating between different kinds of occurrence roles is to help users distinguish between different kinds of targets and navigate with more precision in a large set of information objects.

A semantic assignment can be used to instantiate as many element types as desired in an actual SGML document type definition (DTD), allowing for a finer distinction; each semantic assignment element type can have a different set of anchors described in the anchrole attribute.
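
As an illustration, a DTD might derive two element types from this form, each with its own set of occurrence roles. The declarations below are only a sketch under stated assumptions: the element type names glossEntry, indexEntry and semanticTitle and the role names Definition and Mention are hypothetical and are not defined by CApH; the attribute lists are abridged; anchrole is declared as CDATA, as in the CApH.topicRelation form, because "#AGG" is not a valid SGML name token; and an SGML declaration with a sufficiently large NAMELEN is assumed.

<!-- Hypothetical glossary entry: one definition plus an aggregate of mentions -->
<!element glossEntry - O (semanticTitle*) >
<!attlist glossEntry
    CApH        NAME        #FIXED CApH.semanticAssignment
    HyTime      NAME        #FIXED ilink
    id          ID          #IMPLIED
    mnemonic    CDATA       #IMPLIED
    anchrole    CDATA       #FIXED "Topic Definition Mention #AGG"
    anchaddr    IDREFS      #REQUIRED
>

<!-- Hypothetical index entry: mentions only, no definition role -->
<!element indexEntry - O (semanticTitle*) >
<!attlist indexEntry
    CApH        NAME        #FIXED CApH.semanticAssignment
    HyTime      NAME        #FIXED ilink
    id          ID          #IMPLIED
    mnemonic    CDATA       #IMPLIED
    anchrole    CDATA       #FIXED "Topic Mention #AGG"
    anchaddr    IDREFS      #REQUIRED
>

<!element semanticTitle - - (#PCDATA) >
<!attlist semanticTitle
    CApH        NAME        #FIXED CApH.semanticTitle
>

Because the roles are fixed per element type, an application can tell from the generic identifier alone which kinds of occurrence a given topic may have.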

Endterm values can be associated, by the user, with any instantiated element to allow the application to display information that enhances the understanding of information to be found at the anchor. Index subentries found in printed indexes do sometimes play such a role of specializing under a given topic. HyTime applications can use the endterm attribute to display this information to users. Used in the context of a CApH application, the endterm values point to information whose purpose is to clarify a semantic title, without adding any extra structural level.

Anchor aggregation may be given special significance in a given derived type as long as the basic meaning of the CApH form remains intact. HyTime's aggregate traversal (aggtrav) attribute may be used with agglink aggregate anchors independently or as an enhancement of the meaning of a derived type of an instance.

There is no requirement that the value of the aggloc attribute of an aggregate anchor of a CApH.semanticAssignment be agglink; it could be aggloc instead. Moreover, there is no requirement that any anchor be an aggregate at all. (In such cases, from a HyTime perspective, the value of the aggtrav attribute is irrelevant.)

The first anchor, called the topic anchor, must identify the CApH.semanticAssignment element itself. Making the semantic assignment one of its own anchors permits users to traverse from some other link (if any) to the semantic assignment, and thence to any one (or part, or all) of the semantic assignment's other anchors. HyTime link traversal is possible only between the anchors of a link; there is no implicit traversability between a link and its anchor addresses unless it is itself one of those anchor addresses (anchaddr).

Specification of the link in an anchaddr attribute value can be defaulted. In HyTime links, if there is one more anchor indicated by the anchrole attribute than are actually specified via the anchaddr attribute, the first anchor is always understood to be the link itself; i.e. the link is the missing anchor.
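
A hypothetical instance may make this defaulting rule concrete. The sketch assumes an element type glossEntry, conforming to CApH.semanticAssignment, whose fixed anchrole value is "Topic Definition Mention #AGG" (three roles); the element type, the role names and all ID values are illustrative only and are not part of CApH. The start-tag supplies only two anchor addresses, so the link element itself is understood to be the first (Topic) anchor:

<glossEntry id="t.ilink" mnemonic="ilink" anchaddr="def.ilink loc.ilink">
<semanticTitle>independent link</semanticTitle>
</glossEntry>

Under the same assumptions, the following start-tag states the same three anchors explicitly:

<glossEntry id="t.ilink" mnemonic="ilink" anchaddr="t.ilink def.ilink loc.ilink">

Here def.ilink and loc.ilink stand for the IDs of elements elsewhere in the document (for example, location address elements) that address the definition and the aggregate of mentions.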

Each of the other anchors (collectively called occurrences) may identify any number of information objects. The full power of HyTime's information addressing facilities can be used to associate semantic definitions with literally any pieces of information, identified by whatever structural, contextual, semantic properties, or other means are convenient.

The CApH-specific mnemonic attribute allows a brief single-token name to be given to the semantic definition.

The CApH-specific semanticUniverses attribute specifies the semantic context(s) in which the definition is valid. The generic identifier of a semantic assignment element implicitly constitutes a universe. Other tokens may be added to the default one as values of the semanticUniverses attribute, as there is no limit to the number of universes attached to any instance of an element.

Depending on the application, the user can choose to constrain the tokens used in the semantic universes within a predefined list, shared by a community of users. The semantic universe can be described as a HyTime-defined parsing context (parsecxt). A CApH-aware application will allow users to filter those objects belonging to one, or several, universes, and discard remaining elements, as if they did not exist, using the omitprop attribute of the parsecxt definition. This feature helps authors and editors of hyperdocuments to create and maintain concurrent universes while giving users access to a known set of universes. The possibility of maintaining a unique hyperdocument while allowing several views on it should considerably enhance its maintainability.

A CApH engine must be able to suppress an element with a CApH-defined semanticUniverses attribute whenever that element is processed in a context in which none of the universes specified by the token list in the value of its semanticUniverses attribute is valid, so that the element can be disqualified. The question of what it means, in any particular case, for information to be disqualified is entirely the realm of the application. In general, though, the purpose of disqualification by semantic universe is to avoid wasting the user's time and attention on irrelevant information. It is the responsibility of the application to inform the CApH engine whenever semantic universes (parsing contexts) become valid or invalid due to changes in user context; this minimizes transmission of unwanted information. (In some applications, a user can say that all universes are always valid, and then see everything. In other applications, universes can be used for separating access levels depending on the degree of classification for different parts of the document, as defined by the hyperdocument editor, and cannot be modified by individual end-users.)

The CApH application is responsible for maintaining a namespace of universes for each mnemonic, and a namespace of mnemonics for each universe. Given a mnemonic, a CApH engine that supports semantic assignments must be able to provide a comprehensive list of all mnemonics declared in semantic assignments in the bounded object set (BOS).
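
The interplay of mnemonics and universes can be sketched with two hypothetical topics that share a mnemonic but belong to different universes. The element type glossEntry (assumed to conform to CApH.semanticAssignment with the fixed anchrole value "Topic Definition"), the universe names and the ID values are assumptions made for illustration only:

<glossEntry id="t.anchor.hytime" mnemonic="anchor"
            semanticUniverses="hytime hypertext"
            anchaddr="def.anchor.hytime">
<semanticTitle>anchor (of a link)</semanticTitle>
</glossEntry>

<glossEntry id="t.anchor.naut" mnemonic="anchor"
            semanticUniverses="nautical"
            anchaddr="def.anchor.naut">
<semanticTitle>anchor (of a ship)</semanticTitle>
</glossEntry>

If the application declares only the universe "hytime" to be valid, the second topic can be disqualified, and the mnemonic "anchor" then leads to a single topic. With both "hytime" and "nautical" valid, the same mnemonic leads to two topics, which is why the mnemonic is not required to be unique.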

<!element CApH.semanticAssignment
                                  -- associates portions of
                                     information sharing
                                     a common semantic.     --
                           - O   (semanticTitle*) >
<!attlist CApH.semanticAssignment

    CApH        NAME        CApH.semanticAssignment

    id          ID          #IMPLIED
                  -- CApH strongly encourages the id of a CApH.semanticAssignment
                     element to be present, in order to use this topic as an
                     anchor for a topic relation link. Since not every topic
                     needs to be an anchor, the id is not required. --

    HyTime      NAME        ilink

    mnemonic      -- the short or key name for the subject matter of
                     this definition; machine-processable identifier. Can be seen 
                     as a "semantically-loaded identifier" 
                     (which may or may not be unique)  --
                CDATA       #IMPLIED

    semanticUniverses
                  -- Defines the semantic universe(s) in which this topic is
                     useful. This attribute is generally used to filter out the
                     non-relevant topics according to a list of universes chosen
                     by the user. --
                CDATA       #IMPLIED

    anchrole    NAMES       #FIXED
                      "Topic OccurrenceRole_1 #AGG... OccurrenceRole_n #AGG"
                  -- The number of anchroles is not specified in the
                     architectural form because it is application-specific.
                     The anchors can generally be aggregates (#AGG), although
                     this is not required by CApH, if some application
                     needs to specify an anchor role in which the address
                     is not the address of an aggregate. --

    anchaddr      -- Anchor addresses. (was "linkends").
                     Constraint: one anchor per anchor role. --
                  -- CApH constraint: "Topic" anchor must be the
                     link itself. --
                IDREFS      #REQUIRED

    extra         -- External access traversal rule --
                  -- Constraint: one per anchor or one for all --
                  -- Lextype(("E"|"I"|"A"|"N"|"P")+) --
                NAMES    #IMPLIED  -- Default: no HyTime traversal --
    intra         -- Internal access traversal rule --
                  -- Constraint: one per anchor or one for all --
                  -- Lextype(("E"|"I"|"A"|"N"|"P")+) --
                NAMES    #IMPLIED  -- Default: no HyTime traversal --

    endterms      -- Link end term information.--
                  -- Constraint: one per anchor or one for all.  --
                IDREFS      #IMPLIED  -- Default: none --

    aggtrav        -- Traversal of agglink anchors: agg or members.
                     Constraint: one per anchor or one for all.
                     lextype(("AGG"|"MEM"|"COR"), (s+,
                     ("AGG"|"MEM"|"COR"))*) --
                NAMES                 "MEM AGG MEM"
>

CApH Semantic Title

The optional CApH.semanticTitle element in the content of a CApH.semanticAssignment element is intended to contain a brief, single-phrase text title for the semantic: one that is normally longer than the value of the mnemonic attribute. Generally, a semantic assignment has one semantic title. However, there are interesting cases in which zero or several semantic titles are useful:

A case where having no semantic title is useful is when a group of information objects has been gathered under the structure of a topic without explicitly giving the topic a title. This corresponds to the common situation of a cross-reference. When two information objects are linked through a cross-reference, there is no way of knowing what common semantics link the two anchors of a link expressed through a relation such as "see also" or "see". It is therefore possible to consider that cross-references are similar to untitled topics. The benefit of adopting a common description is that it encourages upgrades by providing a "hole" to be filled in for the semantic title of a topic. CApH-aware applications can make it easier to track these situations in documents and help organize them, by providing a mechanism to retrofit cross-references as topics.
A case where more than one semantic title may be required is when two equivalent index entries can refer to each other. In this case the semantic titles are used to help users to choose one of a set of options. For example: "Art Museum" and "Museum of Art" can be alternate semantic titles for the same topic, offering users two choices for accessing this topic in an alphabetical list. Since nothing other than the semantic title differs here, this is typically a case where several semantic titles could be associated with the same topic. An alternative would be to create two topics with an "identity" topic relation: it would be left to users to decide whether they want to introduce a difference between these two situations.

<!element CApH.semanticTitle - - ANY
              -- Descriptive phrase title or expansion of the mnemonic
                 of the containing CApH.semanticAssignment-form element -- >
<!attlist CApH.semanticTitle
    CApH        NAME        CApH.semanticTitle
>
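
The two special cases described above might look as follows in a document instance. This is a sketch only: it assumes a hypothetical element type indexEntry conforming to CApH.semanticAssignment with the fixed anchrole value "Topic Mention #AGG", a hypothetical semanticTitle element type conforming to CApH.semanticTitle, and illustrative ID values.

<!-- One topic offered under two alphabetical access points -->
<indexEntry id="t.artmuseum" mnemonic="artmuseum" anchaddr="loc.artmuseum">
<semanticTitle>Art Museum</semanticTitle>
<semanticTitle>Museum of Art</semanticTitle>
</indexEntry>

<!-- A retrofitted cross-reference: a topic whose title is still to be supplied -->
<indexEntry id="t.untitled1" anchaddr="loc.xref1">
</indexEntry>

In the second entry, loc.xref1 stands for an aggregate location that addresses the two cross-referenced objects; a title can be added later without changing the link.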

Architectural Form for Topic Navigation

Hypertexts contain cross-reference links, and links of other kinds, that serve various purposes. Some links have explicit topic implications, and some do not. Some of those that have topic implications may nonetheless not be explicitly intended by their author as an indication of what should be provided to users as an aid to navigation within a topic space.

CApH-conforming links that aid topic navigation are recognized by the values of their CApH attributes, which must be either CApH.topicRelation or CApH.semanticAssignment.

Representation of Relationships between Topics: CApH.topicRelation

Topics may be linked to one another by means of topic relation links (CApH.topicRelations). These links express application-defined relationships, if any, between the topics. Any number of relationships can exist between any two or more topics. Each topic is specified in the anchaddr attribute by means of a unique identifier reference that ultimately resolves to the unique identifier of a CApH.semanticAssignment.

Topics may be linked to one another to create abstract topic maps that might be used as skeletal structures onto which exemplary and/or related instances can subsequently be added.

The exact nature of the relationship represented by a topic relation link element type may be given in a semantic definition element which is linked to all instances of links bearing this generic identifier by means of a semantic assignment link.

The content of a CApH.topicRelation is not specified by the CApH.

The value of the semanticUniverses attribute specifies the name(s) of the universe(s) in which the topic relationship expressed by the link is valid. A CApH engine must be able to warn the application whenever the CApH.topicRelation-form element is processed in a context in which none of the universes specified by the token list is valid, so that the topic relationship can be disqualified.

Universe(s) need not be specified, but they can be specified by defaulting them or fixing them in the DTD, or they can be specified (possibly overriding a default value given in the DTD) in the start-tag of each element instance. If there is no default value and none is specified in the element instance, then the application's behavior with respect to disqualification is not specified by CApH.
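
A derived element type that defaults its universes in the DTD, and an instance that overrides the default, might be sketched as follows. The element type name seeAlso, the universe name "curators" and the ID values are hypothetical, and the attribute list is abridged. The role names Source and Target are placeholders: under the CApH constraint on anchrole they would be mnemonics of suitable semantic definitions.

<!element seeAlso - O EMPTY >
<!attlist seeAlso
    id          ID          #REQUIRED
    CApH        NAME        #FIXED CApH.topicRelation
    HyTime      NAME        #FIXED ilink
    semanticUniverses  CDATA  "#ALL"
    anchrole    CDATA       #FIXED "Relation Source Target"
    anchaddr    IDREFS      #REQUIRED
>

<!-- Default applies: the relation is valid in all universes -->
<seeAlso id="r.museum.gallery" anchaddr="t.artmuseum t.gallery">

<!-- Default overridden in the start-tag: valid only in the "curators" universe -->
<seeAlso id="r.museum.loans" semanticUniverses="curators"
         anchaddr="t.artmuseum t.loans">

Each instance gives two anchor addresses for three roles, so the link element itself is taken as the first (Relation) anchor; t.artmuseum, t.gallery and t.loans stand for the IDs of CApH.semanticAssignment-form elements.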

<!element CApH.topicRelation - O ANY >
<!attlist CApH.topicRelation
    id          ID          #REQUIRED
    CApH        NAME        CApH.topicRelation
    semanticUniverses  CDATA  #IMPLIED            -- Default: not specified --
                  -- Constraints: Use #ALL to mean "valid in all
                     universes" --
    HyTime      NAME        ilink
    anchrole      -- Anchor roles.  Constraint: one per anchor. --
                  -- CApH lextype("Relation", s+, (NAME, (s+, RNI, "AGG")?),
                     (s+, NAME, s+, (RNI, "AGG")?)+). --
                  -- CApH constraint: As NAMEs, use value(s) of the
                     mnemonic attribute(s) of CApH or user
                     instantiated CApH.semanticDefinition element(s) --
                CDATA       #FIXED in-DTD
    anchaddr      -- Anchor addresses --
                  -- Constraint: one anchor per anchor role. If one is omitted,
                     ilink element is first anchor.--
                  -- CApH constraint: IDREFS must resolve to elements
                     conforming to CApH.semanticAssignment,
                     CApH.semanticDefinition, or CApH.topicRelation
                     architectural forms --
                IDREFS       #REQUIRED
    extra         -- External access traversal rule --
                  -- Constraint: one per anchor or one for all --
                  -- Lextype(("E"|"I"|"A"|"N"|"P")+) --
                NAMES    #IMPLIED  -- Default: no HyTime traversal --
    intra         -- Internal access traversal rule --
                  -- Constraint: one per anchor or one for all --
                  -- Lextype(("E"|"I"|"A"|"N"|"P")+) --
                NAMES    #IMPLIED  -- Default: no HyTime traversal --
    endterms      -- Link end term information.--
                  -- Constraint: one per anchor or one for all.--
                IDREFS      #IMPLIED  -- Default: none --
    aggtrav        -- Traversal of agglink anchors: agg or members.
                     Constraint: one per anchor or one for all.
                     lextype(("AGG"|"MEM"|"COR"), (s+,
                     ("AGG"|"MEM"|"COR"))*) --
                NAMES                 agg
>