XTM Uses Scope For Languages

Date: Sat, 02 Feb 2002 14:30:22 -0600
From: "Steven R. Newcomb" <srn@coolheads.com>
To: "Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com>
Cc: topicmaps-comments <topicmaps-comment@lists.oasis-open.org>, topicmapmail@infoloom.com
Subject: multilingual thesaurus - language, scope, and topic naming constraint

"Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com> writes:

> I never understood why XTM uses scope for languages.
> ISO/IEC FCD 13250 does not recommend that.

> XML itself has defined the xml:lang attribute,
> i.e. see: << [Definition:] language represents
> natural language identifiers as defined by [RFC
> 1766]. The ·value space· of language is the set of
> all strings that are valid language identifiers as
> defined in the language identification section of
> [XML 1.0 (Second Edition)]. The ·lexical space· of
> language is the set of all strings that are valid
> language identifiers as defined in the language
> identification section of [XML 1.0 (Second
> Edition)]. The ·base type· of language is token. >>
> http://www.w3.org/TR/xmlschema-2/#language

> The scope of GEMET is Environmental Protection, and
> not a bunch of languages.

> So you might use:
> 
> <basename xml:lang="en">economics</basename>
> etc.

> I know this does not conform with XTM - but sorry to
> say using scope for language does not conform with
> XML :-(.  Bernard's sample shows the problems very
> well. They would disappear when we use xml:lang
> instead of scope.  This would also modify the merging
> rule: ... having the same basename in the same scope
> ... and the same language!  (although even this does
> not really work because of homonyms)

I think several points need to be made here.

(1) Not all applications of Topic Maps require
    languages to be identified.  (For that matter, not
    all applications of XML require languages to be
    identified.)  In order to appear in a topic map,
    each natural language must itself be the subject of
    a topic.  The topics that are implicit in the XTM
    syntax itself were strictly limited to those that
    were considered necessary for *all* (or virtually
    all) Topic Maps applications.  The XTM syntax has
    ways of making any subject, including subjects that
    are natural languages, the subject of a topic.

(2) There are lots of "standard" namespaces in use with
    XML, and more are added from time to time.  Each of
    them is implicitly founded on some set of
    specialized subjects.  The XTM syntax should not
    (and, in fact, cannot) be expanded to provide
    special syntactic features for each and every
    subject.  Consider the ODA experience, and the
    28001 experience, both of which are cautionary
    tales.  One of the lessons that all XML people
    should bear in their minds is that one syntax never
    fits all, at least partly because one set of
    semantics can never be enough.  At the same time,
    one set of semantics can easily be too many to be
    useful.  Syntaxes that never stop growing are like
    cancers: they get larger and larger, consuming more
    and more resources, until they finally either kill
    their hosts, or they are excised.

(3) XTM was designed to be maximally intuitive.  We
    deliberately avoided, wherever we could, situations
    in which "hidden magic" processing -- processing
    that would require special treatment of special
    syntactic cases -- would have to occur.  Your
    proposal that the value of the xml:lang attribute
    be treated as a member of the operative scope is
    exactly the kind of thing we labored to avoid.  We
    believed that, if a topic is supposed to be
    considered a member of a scope, then, by Golly, it
    should appear inside the corresponding <scope>
    element.  Otherwise, the syntax becomes
    unlearnable, because it is too tricky.

(4) Nothing prevents a topic from having a subject
    indicator that references the relevant
    specification for xml:lang.  In fact, that's a good
    idea!  Such a topic can then appear in a scope,
    like any other topic.  Users can get all the
    standardizing benefits of xml:lang, but without
    establishing a precedent in which XTM cannot say
    "no" to any new "xml:" attribute that comes along,
    for any new class of subjects (why stop at
    languages?), thus making XTM become yet another
    bloated necrotic tumor on the SGML/XML landscape,
    and the subject of yet another cautionary tale.

(5) If you want to design your own syntax for XML Topic
    Maps, you are free to do so, and you can bring into
    it every kind of topic that is implicit in every
    kind of XML namespace name, if that's what you want
    to do.  Eventually (but not just yet), you'll be
    able to define a processing model for your syntax
    that will enable your topic maps to be merged with
    all others, just as if they were all notated in a
    single standard syntax.  (Whether anyone will pay
    attention to your syntax, and its processing model,
    is a separate issue.)  This syntax-unification
    capability is one of the goals of the modeling work
    that is being done in ISO SC34/WG3.  The initial
    impetus for this work was that there are already
    two standard syntaxes for Topic Maps: HyTM and XTM.
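Points (3) and (4) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not part of XTM or of any XTM processor. It uses XTM 1.0 element names, treats a base name's scope as exactly the references inside its explicit <scope> element (so xml:lang contributes nothing), and applies the topic naming constraint: two topics with the same base name string in the same scope are taken to be about the same subject. The language topic "en", its subject indicator URI (a placeholder standing in for whatever specification identifies the language, such as RFC 1766), the topic IDs, and the helpers scope_of and naming_constraint_groups are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"xtm": XTM}
HREF = f"{{{XLINK}}}href"

# Hypothetical map. "en" is an ordinary topic whose subject indicator
# (a placeholder URI, not a published subject identifier) names English.
DOC = f"""
<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="en">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/psi/language/en"/>
    </subjectIdentity>
  </topic>
  <topic id="t1">
    <baseName>
      <scope><topicRef xlink:href="#en"/></scope>
      <baseNameString>economics</baseNameString>
    </baseName>
  </topic>
  <topic id="t2">
    <baseName>
      <scope><topicRef xlink:href="#en"/></scope>
      <baseNameString>economics</baseNameString>
    </baseName>
  </topic>
  <topic id="t3">
    <baseName xml:lang="en">
      <baseNameString>economics</baseNameString>
    </baseName>
  </topic>
</topicMap>
""".strip()

def scope_of(base_name):
    """Scope = hrefs inside an explicit <scope>; attributes are ignored."""
    scope = base_name.find("xtm:scope", NS)
    if scope is None:
        return frozenset()  # no <scope> element: the unconstrained scope
    return frozenset(ref.get(HREF) for ref in scope)

def naming_constraint_groups(root):
    """Group topic ids that share a base name string in the same scope."""
    groups = defaultdict(list)
    for topic in root.findall("xtm:topic", NS):
        for bn in topic.findall("xtm:baseName", NS):
            string = bn.findtext("xtm:baseNameString", default="", namespaces=NS)
            groups[(string, scope_of(bn))].append(topic.get("id"))
    # Every group holding more than one topic would be merged.
    return [ids for ids in groups.values() if len(ids) > 1]

root = ET.fromstring(DOC)
print(naming_constraint_groups(root))  # t1 and t2 merge; t3 does not
```

Because t3 carries English only in xml:lang, this processor leaves its name in the unconstrained scope. The only way to put the language into a scope is to say so inside <scope>, and once the language is an ordinary topic there, the existing naming constraint needs no special rule for languages.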

> The scope of GEMET is Environmental Protection, and
> not a bunch of languages.

(6) In a topic map, either a subject is represented as
    a topic, or it is not represented at all.  

    Note: In the foregoing sentence, I did *not* say:

            In a topic map, either a subject is
            represented as a <topic> element, or it is
            not represented at all.

          In an XTM-conforming syntactic instance topic
          map, there are many topics that are not
          represented as <topic> elements.  A base
          name, for example, is necessarily, at the
          lowest level, a subject, even though there is
          no corresponding <topic> element whose
          subject is that base name.

          If you want to understand either XTM or HyTM,
          it's imperative to identify all the implicit
          subjects that must, ultimately and
          fundamentally, be handled as topics.  If, for
          example, a subject appears in a scope, that
          subject, like all other subjects, is
          represented as a topic.  There is nothing
          else that it can be, at least not in the
          Topic Maps paradigm.


    Some eminent Topic Maps practitioners have
    suggested that, after topic map processing/merging,
    there should still be a distinction between topics
    that were the subjects of <topic> elements, and all
    other topics.  Maybe this requirement is one that
    you would like to voice your support for, so that,
    for example, in your GEMET topic map, you can
    automatically distinguish between the topics that
    you chose to specify via <topic> elements, and all
    others.  Such a distinction could be used to
    control the way a rendering application behaves, so
    that you can specify <topic> elements only for the
    subjects that are directly relevant to GEMET, and
    have only those topics be rendered in certain
    circumstances.

    The same goal could be accomplished more flexibly
    by asserting that all your GEMET topics are
    instances of the GEMET class.  This technique would
    leave you in a position to use <topic> elements for
    auxiliary subjects, without polluting your
    specially-privileged set of GEMET topics.

    What do you think?  Should it be a part of XTM
    processing that every topic that corresponds to a
    <topic> element is automatically an instance of the
    "came from an XTM <topic> element" class?  Or is
    this extra overhead going to benefit only a small
    group of practitioners, who could have achieved the
    same goal more flexibly by explicitly asserting
    which subjects should be treated as "full-fledged"
    topics (whatever they might mean by that)?
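    The class-membership technique, and the automatic
    alternative raised in the question above, can both
    be sketched in Python under stated assumptions: XTM
    1.0 element names, a hypothetical "#gemet" class
    topic and topic IDs, and a hypothetical
    "#from-topic-element" marker standing in for the
    "came from an XTM <topic> element" class, which no
    XTM processor defines.  Only topics that explicitly
    declare <instanceOf> the GEMET class land in the
    privileged set, so an auxiliary language topic stays
    out of it; the automatic marker, by contrast, tags
    every <topic> element indiscriminately.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"xtm": XTM}
HREF = f"{{{XLINK}}}href"

# Hypothetical map: two GEMET terms plus an auxiliary language topic.
DOC = f"""
<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="gemet">
    <baseName><baseNameString>GEMET term</baseNameString></baseName>
  </topic>
  <topic id="economics">
    <instanceOf><topicRef xlink:href="#gemet"/></instanceOf>
    <baseName><baseNameString>economics</baseNameString></baseName>
  </topic>
  <topic id="pollution">
    <instanceOf><topicRef xlink:href="#gemet"/></instanceOf>
    <baseName><baseNameString>pollution</baseNameString></baseName>
  </topic>
  <topic id="en">
    <baseName><baseNameString>English</baseNameString></baseName>
  </topic>
</topicMap>
""".strip()

def classes_of(topic):
    """Explicitly asserted classes: hrefs of the refs inside <instanceOf>."""
    return {ref.get(HREF) for ref in topic.findall("xtm:instanceOf/*", NS)}

def instances_of(root, class_ref):
    """Ids of topics explicitly declared to be instances of class_ref."""
    return [t.get("id") for t in root.findall("xtm:topic", NS)
            if class_ref in classes_of(t)]

def from_topic_elements(root, marker="#from-topic-element"):
    """The alternative: give every <topic> element a hypothetical marker class."""
    return {t.get("id"): classes_of(t) | {marker}
            for t in root.findall("xtm:topic", NS)}

root = ET.fromstring(DOC)
print(instances_of(root, "#gemet"))       # only the GEMET terms
print(sorted(from_topic_elements(root)))  # every <topic>, "en" included
```

    In this sketch the explicit GEMET class selects
    exactly the terms the author chose, while the
    automatic marker cannot tell a GEMET term from the
    auxiliary "en" topic.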

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

Prepared by Robin Cover for The XML Cover Pages archive. See related references in (1) the news item, "OASIS Technical Committee to Define Published Subjects for Geography and Languages"; (2) "(XML) Topic Maps."