Canonical XTM - a proposal


Date: 20 Feb 2001 10:25:01 +0100
From: Lars Marius Garshol <larsga@garshol.priv.no>
Reply-To: xtm-wg@yahoogroups.com
To: xtm-wg@yahoogroups.com
Subject: [xtm-wg] Canonical XTM - a proposal


I've now written up a proposal for a Canonical XTM specification,
which is appended here. It is submitted for the consideration of
topicmaps.org, in the hope that it may be useful. It has already been
implemented and is now used internally by Ontopia for testing
purposes.

--Lars M.

  CANONICAL XTM
  A canonical serialization format for XML topic maps
  Version 0.1

  Lars Marius Garshol <larsga@ontopia.net>, with contributions by 
  Geir Ove Grønmo <grove@ontopia.net>.

  (Please note: this text is just a contribution to the XTM process
  and has _no_ official standing.)

  $Id: cxtm-proposal.txt,v 1.1 2001/02/20 09:24:41 larsga Exp $

  PRELIMINARIES
===============

This specification describes a serialization format for XML topic maps
which has the property that all logically equivalent topic maps have
the exact same byte-by-byte representation in this format. This can be
used to test the conformance of XTM processors.

The specification describes the serialization of a topic map into an
output document, but does not concern itself with where that topic map
came from. It is NOT a goal to ensure that the canonical topic map can
be successfully read into an XTM processor, but merely to confirm that
all processing defined by the XTM 1.0 specification has been performed
correctly. 

Before serialization, the topic map must be processed into a consistent
topic map, as defined by XTM 1.0. No string normalization, such as
Unicode canonical decomposition, may be performed when canonicalizing
XTM documents.

The output document must be a canonical XML document, as specified in
<URL: http://www.w3.org/TR/xml-c14n >.  In addition, a line feed
(U+000A) must be inserted after every end tag, and likewise after every
start tag except those of elements that have character content
(baseNameString, variantName, resourceData) or are empty (topicRef,
instanceOf, resourceRef, subjectIndicatorRef).
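A minimal Python sketch of an output writer for this format follows. It assumes the seven listed elements are the ones whose start tags are NOT followed by a line feed (a line feed inside an EMPTY element would give it content), and it implements only the parts of Canonical XML this vocabulary needs: C14N character escaping, empty elements written as start/end tag pairs, and the default namespace declaration placed before ordinary attributes. The class and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical writer for the Canonical XTM output format (a sketch).
# Assumption: these elements get no line feed after their start tag,
# because they hold character data or are declared EMPTY.
NO_NEWLINE_AFTER_START = {
    "baseNameString", "variantName", "resourceData",
    "topicRef", "instanceOf", "resourceRef", "subjectIndicatorRef",
}


def escape_text(s: str) -> str:
    """Escape character data the way Canonical XML does."""
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace("\r", "&#xD;"))


def escape_attr(s: str) -> str:
    """Escape an attribute value the way Canonical XML does."""
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace('"', "&quot;").replace("\t", "&#x9;")
             .replace("\n", "&#xA;").replace("\r", "&#xD;"))


class CanonicalWriter:
    def __init__(self) -> None:
        self.parts: list[str] = []

    def start(self, name: str, attrs: dict[str, str] | None = None) -> None:
        # The xmlns declaration goes first; other (unprefixed) attributes
        # are ordered by name, as Canonical XML requires.
        items = sorted((attrs or {}).items(),
                       key=lambda kv: (kv[0] != "xmlns", kv[0]))
        attr_text = "".join(f' {k}="{escape_attr(v)}"' for k, v in items)
        self.parts.append(f"<{name}{attr_text}>")
        if name not in NO_NEWLINE_AFTER_START:
            self.parts.append("\n")

    def text(self, s: str) -> None:
        self.parts.append(escape_text(s))

    def end(self, name: str) -> None:
        # A line feed follows every end tag.
        self.parts.append(f"</{name}>\n")

    def empty(self, name: str, attrs: dict[str, str] | None = None) -> None:
        # Canonical XML never uses the <x/> form for empty elements.
        self.start(name, attrs)
        self.end(name)

    def getvalue(self) -> str:
        return "".join(self.parts)
```

Writing a <baseName> holding the string "Tom & Jerry" with this writer yields `<baseName>`, a line feed, `<baseNameString>Tom &amp; Jerry</baseNameString>`, a line feed, `</baseName>` and a final line feed.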

FIXME: URI normalization
FIXME: sorting of topics that have no characteristics
FIXME: class-instance topic relationships with scope


  SERIALIZATION
===============

The document element must be a <topicMap> element with the following
attribute value assignment:

  xmlns         http://www.topicmaps.org/cxtm/1.0/

The topic map is serialized by first writing out all topics, and then
writing out all associations. Since only one topic map is output,
there is no mergemap information to serialize.


  <topic>
---------

Topics are sorted by their sort keys (see the Ordering principles
section) and then serialized in that order. All <topic> elements must
have an id attribute, set to the value 'idN', where N is the number of
the topic in sort order, starting with 1.
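The ID assignment can be sketched in Python as below, assuming each topic's sort key has already been computed under the ordering principles. The function name assign_ids is hypothetical, and because the proposal does not yet say how ties between equal sort keys are broken, the sketch simply keeps Python's stable sort order for them.

```python
def assign_ids(sort_keys: dict) -> dict:
    """Map each topic to 'idN', N being its 1-based position in sort order.

    `sort_keys` maps a topic handle (any hashable value) to its sort key.
    Keys are compared as plain str values, i.e. by code point. Topics with
    equal keys keep their input order (Python's sort is stable); the
    proposal itself leaves such ties unresolved.
    """
    ordered = sorted(sort_keys, key=lambda topic: sort_keys[topic])
    return {topic: f"id{n}" for n, topic in enumerate(ordered, start=1)}
```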

Topics are serialized by first writing out all class-instance
relationships as <instanceOf> elements, then the <subjectIdentity>
element, then all <baseName>s, then all <occurrence>s.  The
<instanceOf>, <baseName> and <occurrence> elements are ordered
according to the rules in the 'Ordering principles' section.


  <instanceOf>
--------------

A class-instance relationship is serialized as an <instanceOf>
element, with the 'href' attribute set to the ID of the <topic>
element representing the class topic, with the character '#'
prepended. 

Note that the <instanceOf> element is an empty element, and so,
according to the Canonical XML specification, must be serialized with
both a start and an end tag, with nothing between the tags.
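As a sketch, the <instanceOf> elements of a topic could be produced as follows, given the assigned IDs of its class topics. The function name is hypothetical; ordering by the referenced ID follows the ordering principles, and each empty element is written as a start/end tag pair followed by a line feed.

```python
def serialize_instance_of(class_ids) -> str:
    """Write one <instanceOf> per class topic, ordered by the class's ID.

    `class_ids` holds assigned IDs such as 'id2'; the href is the ID with
    '#' prepended. IDs are compared as strings, i.e. by code point.
    """
    return "".join(
        f'<instanceOf href="#{cid}"></instanceOf>\n' for cid in sorted(class_ids)
    )
```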


  <subjectIdentity>
-------------------

If the topic has no addressable subject, nor any known subject
indicators, this element is not output at all.

If the topic has an addressable subject, that is output first using a
<resourceRef> element.

For each subject indicator the topic has, a <subjectIndicatorRef>
element is output. The elements must be ordered according to the
ordering principles.
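The rules for <subjectIdentity> can be sketched in Python as below. The function name is hypothetical; URIs are taken as already normalized and safe to place in an attribute, since URI normalization is still an open FIXME here, and the subject indicators are ordered by their URIs as the ordering principles prescribe.

```python
def serialize_subject_identity(subject=None, indicators=()) -> str:
    """Return the <subjectIdentity> element, or '' when there is nothing to write.

    `subject` is the URI of an addressable subject or None; `indicators` is
    an iterable of subject indicator URIs (assumed normalized already).
    """
    indicators = sorted(set(indicators))
    if subject is None and not indicators:
        return ""
    out = ["<subjectIdentity>\n"]
    if subject is not None:
        # The addressable subject comes first.
        out.append(f'<resourceRef href="{subject}"></resourceRef>\n')
    for uri in indicators:
        out.append(f'<subjectIndicatorRef href="{uri}"></subjectIndicatorRef>\n')
    out.append("</subjectIdentity>\n")
    return "".join(out)
```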


  <resourceRef>
---------------

The <resourceRef> element is an empty element, holding the reference
to the resource in its 'href' attribute. FIXME: uri norm!


  <subjectIndicatorRef>
-----------------------

The <subjectIndicatorRef> element is an empty element, holding the
reference to the subject indicator in its 'href' attribute. FIXME: uri
norm!


  <baseName>
------------

Each topic name is serialized using a <baseName> element. First the
scope is written out using the <scope> element, then the base name
value in the <baseNameString> element, and finally the variant names
using <variant> elements. The variant names must be ordered according
to the ordering principles.


  <scope>
---------

If the scoped topic map construct has an empty scope, this element is
not output at all.  If it has a non-empty scope, references to the
topics making up that scope are written out using <topicRef> elements
in the order defined by the ordering principles.

Note that in all cases the scope that is output must be the scope
resulting from inheriting the scope of any parent elements that have
scope. The scope of a variant name therefore consists of the union of
its own scope and the scopes of all its ancestors.
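The scope rules can be sketched in Python as follows. The helper names are hypothetical, scope topics are represented by their assigned IDs, and the '#'-prefixed href on <topicRef> is an assumption carried over from the <instanceOf> convention, since this section does not spell out the reference format.

```python
def effective_scope(own, *ancestor_scopes) -> frozenset:
    """Union of a construct's own scope with the scopes of all its ancestors."""
    result = set(own)
    for scope in ancestor_scopes:
        result |= set(scope)
    return frozenset(result)


def serialize_scope(topic_ids) -> str:
    """Write <scope> with one <topicRef> per topic, or '' for an empty scope."""
    if not topic_ids:
        return ""
    # Assumption: topicRef uses the same '#ID' form as instanceOf.
    refs = "".join(
        f'<topicRef href="#{tid}"></topicRef>\n' for tid in sorted(topic_ids)
    )
    return f"<scope>\n{refs}</scope>\n"
```

For example, a variant whose own scope is {id4}, nested in a variant scoped by {id2} under a base name scoped by {id1, id2}, would be written with the effective scope id1, id2, id4.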


  <baseNameString>
------------------

Contains the base name value.


  <variant>
-----------

Each variant name is serialized using a <variant> element. First its
parameters are written out using the <scope> element, then the variant
name value in the <variantName> element and finally any child variant
names using <variant> elements. The variant names must be ordered
according to the ordering principles.
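Because variants nest, their serialization is naturally recursive. The Python sketch below assumes a hypothetical Variant record holding the name value, the variant's own scope as assigned topic IDs, and its child variants. The scope written for each variant is its inherited scope unioned with its own, and siblings are ordered by the variant-name sort key ('value|IDs') from the ordering principles. The '#'-prefixed topicRef href is again an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:  # hypothetical in-memory representation
    name: str
    scope: frozenset = frozenset()  # own parameters, as assigned topic IDs
    children: list = field(default_factory=list)


def _escape_text(s: str) -> str:
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace("\r", "&#xD;"))


def _scope(ids) -> str:
    refs = "".join(f'<topicRef href="#{i}"></topicRef>\n' for i in sorted(ids))
    return f"<scope>\n{refs}</scope>\n" if ids else ""


def serialize_variants(variants, inherited=frozenset()) -> str:
    """Serialize sibling variants in sort-key order, recursing into children."""
    keyed = []
    for v in variants:
        scope = inherited | v.scope  # inherited scope plus own parameters
        keyed.append((v.name + "|" + " ".join(sorted(scope)), v, scope))
    out = []
    for _key, v, scope in sorted(keyed, key=lambda entry: entry[0]):
        out.append("<variant>\n")
        out.append(_scope(scope))
        out.append(f"<variantName>{_escape_text(v.name)}</variantName>\n")
        out.append(serialize_variants(v.children, scope))
        out.append("</variant>\n")
    return "".join(out)
```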


  <variantName>
---------------

Contains the variant name value.


  <occurrence>
--------------

Each occurrence is written out using an <occurrence> element. If the
occurrence is an instance of a class, an <instanceOf> element is
output. Next comes a <scope> element representing the scope of the
occurrence (provided it is non-empty), and last either a <resourceRef>
element, if the occurrence is an external resource, or a
<resourceData> element, if it is an internal resource.
FIXME: this is probably too vague
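One reading of this paragraph, sketched in Python: the hypothetical function below writes the optional <instanceOf>, the <scope> only when it is non-empty, and then exactly one of <resourceRef> or <resourceData>. Occurrence types and scope topics are given as assigned IDs, the external resource URI is assumed to be normalized already, and the '#'-prefixed topicRef href is an assumption.

```python
def serialize_occurrence(uri=None, data=None, type_id=None, scope=()) -> str:
    """Serialize an occurrence that has either an external URI or inline data."""
    if (uri is None) == (data is None):
        raise ValueError("give exactly one of uri (external) or data (internal)")
    out = ["<occurrence>\n"]
    if type_id is not None:
        out.append(f'<instanceOf href="#{type_id}"></instanceOf>\n')
    if scope:  # an empty scope is not written at all
        out.append("<scope>\n")
        out.extend(f'<topicRef href="#{t}"></topicRef>\n' for t in sorted(scope))
        out.append("</scope>\n")
    if uri is not None:
        out.append(f'<resourceRef href="{uri}"></resourceRef>\n')
    else:
        # Inline data is escaped as Canonical XML character content.
        text = (data.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\r", "&#xD;"))
        out.append(f"<resourceData>{text}</resourceData>\n")
    out.append("</occurrence>\n")
    return "".join(out)
```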


  <resourceData>
----------------

Contains the resource inline.


  <association>
---------------

Associations are serialized using <association> elements, which first
contain an <instanceOf> element (if the association is an instance of
a class), a <scope> element (unless the association is in the
unconstrained scope), and finally a <member> element for each
participating topic in the association. The <member> elements must be
ordered according to the ordering principles.
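A Python sketch of the association element, under stated assumptions: members are (topic ID, role ID or None) pairs, a member's role is written as an <instanceOf> inside <member> (which the DTD at the end of this proposal allows), members are ordered by the member sort key from the ordering principles, and topicRef hrefs use the same '#ID' form as <instanceOf>. The function name is hypothetical.

```python
def serialize_association(members, type_id=None, scope=()) -> str:
    """Serialize an association; `members` holds (topic_id, role_id|None) pairs."""
    def member_key(member):
        topic_id, role_id = member
        # Topic ID, plus a space and the role topic's ID if a role is given.
        return topic_id if role_id is None else f"{topic_id} {role_id}"

    out = ["<association>\n"]
    if type_id is not None:
        out.append(f'<instanceOf href="#{type_id}"></instanceOf>\n')
    if scope:  # omitted for the unconstrained scope
        out.append("<scope>\n")
        out.extend(f'<topicRef href="#{t}"></topicRef>\n' for t in sorted(scope))
        out.append("</scope>\n")
    for topic_id, role_id in sorted(members, key=member_key):
        out.append("<member>\n")
        if role_id is not None:
            out.append(f'<instanceOf href="#{role_id}"></instanceOf>\n')
        out.append(f'<topicRef href="#{topic_id}"></topicRef>\n')
        out.append("</member>\n")
    out.append("</association>\n")
    return "".join(out)
```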


  ORDERING PRINCIPLES
=====================

This section establishes how to determine the sort key value of each
topic map element that is written out. This is used to ensure that all
elements are serialized in a specific order. That order is obtained by
sorting the elements according to their sort keys in lexicographical
order, based on UCS code point values.
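In Python, comparing str values already compares code point by code point, so sorted() with no key gives this ordering; the sketch below only makes the key explicit. It also shows two consequences worth keeping in mind: uppercase ASCII letters sort before all lowercase ones, and the generated IDs are not compared numerically, so 'id10' sorts before 'id2'. The function name is hypothetical.

```python
def code_point_sorted(keys):
    """Sort strings lexicographically by UCS code point, with no locale rules.

    The explicit key is equivalent to Python's default str ordering; it is
    spelled out to document the intended comparison.
    """
    return sorted(keys, key=lambda s: [ord(ch) for ch in s])


print(code_point_sorted(["b", "a", "Z", "é"]))    # ['Z', 'a', 'b', 'é']
print(code_point_sorted(["id2", "id10", "id1"]))  # ['id1', 'id10', 'id2']
```

Implementations in languages whose native string comparison works on UTF-16 code units, such as Java's String.compareTo, can order characters outside the Basic Multilingual Plane differently from code-point order.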


  Topics
--------

If the topic has an addressable subject, the URI of that resource is
the sort key.

Failing that, if the topic has a subject indicator, the URI of the
first subject indicator (as ordered according to these rules) is the
sort key.

Failing that, if the topic has base names, the sort key of the first
base name (as ordered according to these rules) is the sort key.

Failing that, if the topic has occurrences, the URI of the first
occurrence (as ordered according to these rules) is the sort key.
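The fallback chain can be sketched as a small Python function. It assumes the inputs are already available: the subject URI, the subject indicator URIs, the base name sort keys (which, as described under 'Topic names, base names', need the scope topics' IDs), and the occurrence URIs. Taking the code-point minimum stands in for "the first ... as ordered according to these rules". Returning None for a topic with no characteristics reflects the open FIXME on that case; the function name is hypothetical.

```python
def topic_sort_key(subject=None, indicators=(), base_name_keys=(),
                   occurrence_uris=()):
    """Return a topic's sort key following the proposal's fallback order."""
    if subject is not None:
        return subject  # addressable subject URI
    for candidates in (indicators, base_name_keys, occurrence_uris):
        if candidates:
            # "The first ... as ordered" is the code-point minimum.
            return min(candidates)
    return None  # no characteristics: left open by the proposal (FIXME)
```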


  <instanceOf>, <topicRef>, <member>
------------------------------------

The sort key is the ID of the topic element referred to.


  Topic names, base names
-------------------------

The sort key is constructed by concatenating the following into a
string: the base name value, a '|' character, and the assigned IDs of
all topics in the scope of the topic name, separated by spaces and
ordered according to these principles.
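As a sketch, the key can be built like this in Python, with the scope given as the assigned IDs of its topics. The function name is hypothetical; the same construction is described for variant names further down, so one function would serve for both.

```python
def name_sort_key(value: str, scope_ids) -> str:
    """Build the sort key 'value|ID ID ...' for a base name or variant name."""
    # Scope topic IDs are ordered by code point and separated by spaces.
    return value + "|" + " ".join(sorted(scope_ids))


print(name_sort_key("Oslo", {"id3", "id1"}))  # Oslo|id1 id3
print(name_sort_key("Oslo", set()))           # Oslo|
```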


  Occurrences, subject indicators
---------------------------------

The sort key is the URI of the resource. FIXME: resourceData!


  Variant names
---------------

The sort key is constructed by concatenating the following into a
string: the variant name value, a '|' character, and the assigned IDs
of all topics in the scope of the variant name, separated by spaces and
ordered according to these principles.


  Associations
--------------

The sort key is the concatenation of the sort keys of all its members,
in sort order, separated by '|' characters. If the association is an
instance of a class, a '$' character is appended, followed by the
assigned ID of the topic representing that class.


  Association members
---------------------

The sort key is the ID of the topic element referred to by its
<topicRef> child if the member has no specified role. If it does, a
space and the assigned ID of the topic defining the role are appended.
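The association and member keys combine as follows in a short Python sketch. Members are assumed to be (topic ID, role ID or None) pairs using the assigned IDs; the function names are hypothetical.

```python
def member_sort_key(topic_id: str, role_id=None) -> str:
    """Topic ID, followed by a space and the role topic's ID if there is a role."""
    return topic_id if role_id is None else f"{topic_id} {role_id}"


def association_sort_key(members, type_id=None) -> str:
    """Member keys in sort order joined by '|', then '$' and the type's ID."""
    key = "|".join(sorted(member_sort_key(t, r) for t, r in members))
    if type_id is not None:
        key += "$" + type_id
    return key


print(association_sort_key([("id5", "id2"), ("id1", None)], "id7"))
# id1|id5 id2$id7
```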


  DTD
=====

<!ELEMENT topicMap (topic*, association*)>
<!ATTLIST topicMap 
          xmlns       CDATA #FIXED "http://www.topicmaps.org/cxtm/1.0/">

<!ELEMENT topic (instanceOf*, subjectIdentity?, baseName*, occurrence*)>
<!ATTLIST topic
          id          ID    #REQUIRED>
          
<!ELEMENT instanceOf EMPTY>
<!ATTLIST instanceOf 
          href        CDATA #REQUIRED>

<!ELEMENT subjectIdentity (resourceRef?, subjectIndicatorRef*)>

<!ELEMENT resourceRef EMPTY>
<!ATTLIST resourceRef
          href        CDATA #REQUIRED>

<!ELEMENT subjectIndicatorRef EMPTY>
<!ATTLIST subjectIndicatorRef
          href        CDATA #REQUIRED>


<!ELEMENT baseName (scope?, baseNameString, variant*)>

<!ELEMENT scope (topicRef+)>
<!ELEMENT topicRef EMPTY>
<!ATTLIST topicRef
          href        CDATA #REQUIRED>

<!ELEMENT baseNameString (#PCDATA)>


<!ELEMENT variant (scope, variantName, variant*)>
<!ELEMENT variantName (#PCDATA)>


<!ELEMENT occurrence (instanceOf?, scope?, (resourceRef | resourceData))>
<!ELEMENT resourceData (#PCDATA)>


<!ELEMENT association (instanceOf?, scope?, member+)>
<!ELEMENT member (instanceOf?, topicRef)>

Prepared by Robin Cover for The XML Cover Pages archive. See: "(XML) Topic Maps."



Document URL: http://xml.coverpages.org/xtmCanonical20010220.html