XTM Uses Scope For Languages
Date: Sat, 02 Feb 2002 14:30:22 -0600
From: "Steven R. Newcomb" <srn@coolheads.com>
To: "Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com>
Cc: topicmaps-comments <topicmaps-comment@lists.oasis-open.org>, topicmapmail@infoloom.com
Subject: multilingual thesaurus - language, scope, and topic naming constraint
"Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com> writes:
> I never understood why XTM uses scope for languages.
> ISO/IEC FCD 13250 does not recommend that.
> XML itself has defined the xml:lang attribute,
> i.e. see: << [Definition:] language represents
> natural language identifiers as defined by [RFC
> 1766]. The ·value space· of language is the set of
> all strings that are valid language identifiers as
> defined in the language identification section of
> [XML 1.0 (Second Edition)]. The ·lexical space· of
> language is the set of all strings that are valid
> language identifiers as defined in the language
> identification section of [XML 1.0 (Second
> Edition)]. The ·base type· of language is token.
> http://www.w3.org/TR/xmlschema-2/#language
> The scope of GEMET is Environmental Protection, and
> not a bunch of languages.
> So you might use:
>
> <basename xml:lang="en">economics</basename>
> etc.
> I know this does not conform with XTM - but sorry to
> say using scope for language does not conform with
> XML :-(. Bernard's sample shows the problems very
> well. They would disappear when we use xml:lang
> instead of scope. This would also modify the merging
> rule: ... having the same basename in the same scope
> ... and the same language! (although even this does
> not really work because of homonyms)
I think several points need to be made here.
(1) Not all applications of Topic Maps require
languages to be identified. (For that matter, not
all applications of XML require languages to be
identified.) In order to appear in a topic map,
each natural language must itself be the subject of
a topic. The topics that are implicit in the XTM
syntax itself were strictly limited to those that
were considered necessary for *all* (or virtually
all) Topic Maps applications. The XTM syntax has
ways of making any subject, including subjects that
are natural languages, the subject of a topic.
(2) There are lots of "standard" namespaces in use with
XML, and more are added from time to time. Each of
them is implicitly founded on some set of
specialized subjects. The XTM syntax should not
(and, in fact, cannot) be expanded to provide
special syntactic features for each and every
subject. Consider the ODA experience, and the
28001 experience, both of which are cautionary
tales. One of the lessons that all XML people
should bear in mind is that one syntax never
fits all, at least partly because one set of
semantics can never be enough. At the same time,
one set of semantics can easily be too many to be
useful. Syntaxes that never stop growing are like
cancers: they get larger and larger, consuming more
and more resources, until they finally either kill
their hosts, or they are excised.
(3) XTM was designed to be maximally intuitive. We
deliberately avoided, wherever we could, situations
in which "hidden magic" processing -- processing
that would require special treatment of special
syntactic cases -- would have to occur. Your
proposal that the value of the xml:lang attribute
be treated as a member of the operative scope is
exactly the kind of thing we labored to avoid. We
believed that, if a topic is supposed to be
considered a member of a scope, then, by Golly, it
should appear inside the corresponding <scope>
element. Otherwise, the syntax becomes
unlearnable, because it is too tricky.
(4) Nothing prevents a topic from having a subject
indicator that references the relevant
specification for xml:lang. In fact, that's a good
idea! Such a topic can then appear in a scope,
like any other topic. Users can get all the
standardizing benefits of xml:lang, but without
establishing a precedent in which XTM cannot say
"no" to any new "xml:" attribute that comes along,
for any new class of subjects (why stop at
languages?), thus making XTM become yet another
bloated necrotic tumor on the SGML/XML landscape,
and the subject of yet another cautionary tale.
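A minimal sketch of what point (4) could look like, assuming XTM 1.0 syntax. It shows a topic whose subject is the English language, and a base name that places that topic in its scope by means of an explicit <scope> element. The topic ids and the subject indicator URL under example.org are hypothetical placeholders, not published identifiers; in practice the indicator would point at whatever resource is agreed to identify the language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- A topic whose subject is the English language.
       The subject indicator URL is a hypothetical placeholder. -->
  <topic id="lang-en">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/psi/languages#en"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>English</baseNameString>
    </baseName>
  </topic>

  <!-- A thesaurus term. The language topic appears inside the
       scope element of the name, like any other scoping topic. -->
  <topic id="economics">
    <baseName>
      <scope>
        <topicRef xlink:href="#lang-en"/>
      </scope>
      <baseNameString>economics</baseNameString>
    </baseName>
  </topic>

</topicMap>
```

Nothing in this sketch depends on special treatment of xml:lang; the language is an ordinary topic, and the scope is read directly from the markup.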
(5) If you want to design your own syntax for XML Topic
Maps, you are free to do so, and you can bring into
it every kind of topic that is implicit in every
kind of XML namespace name, if that's what you want
to do. Eventually (but not just yet), you'll be
able to define a processing model for your syntax
that will enable your topic maps to be merged with
all others, just as if they were all notated in a
single standard syntax. (Whether anyone will pay
attention to your syntax, and its processing model,
is a separate issue.) This syntax-unification
capability is one of the goals of the modeling work
that is being done in ISO SC34/WG3. The initial
impetus for this work was that there are already
two standard syntaxes for Topic Maps: HyTM and XTM.
> The scope of GEMET is Environmental Protection, and
> not a bunch of languages.
(6) In a topic map, either a subject is represented as
a topic, or it is not represented at all.
Note: In the foregoing sentence, I did *not* say:
In a topic map, either a subject is
represented as a <topic> element, or it is
not represented at all.
In an XTM-conforming syntactic instance topic
map, there are many topics that are not
represented as <topic> elements. A base
name, for example, is necessarily, at the
lowest level, a subject, even though there is
no corresponding <topic> element whose
subject is that base name.
If you want to understand either XTM or HyTM,
it's imperative to understand all the
implicit subjects that must, ultimately
and fundamentally, be handled as topics. If,
for example, a subject appears in a scope,
that subject, like all other subjects, is
represented as a topic. There is nothing
else that it can be, at least not in the
Topic Map paradigm.
Some eminent Topic Maps practitioners have
suggested that, after topic map processing/merging,
there should still be a distinction between topics
that were the subjects of <topic> elements, and all
other topics. Maybe this requirement is one that
you would like to voice your support for, so that,
for example, in your GEMET topic map, you can
automatically distinguish between the topics that
you chose to specify via <topic> elements, and all
others. Such a distinction could be used to
control the way a rendering application behaves, so
that you can specify <topic> elements only for the
subjects that are directly relevant to GEMET, and
have only those topics be rendered in certain
circumstances.
The same goal could be accomplished more flexibly
by asserting that all your GEMET topics are
instances of the GEMET class. This technique would
leave you in a position to use <topic> elements for
auxiliary subjects, without polluting your
specially-privileged set of GEMET topics.
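The class-based technique described above might be sketched as follows, assuming XTM 1.0 syntax. The ids "gemet-concept", "economics", and "lang-en" are hypothetical names chosen for illustration. Only the thesaurus term is declared an instance of the GEMET class; the auxiliary language topic is an ordinary <topic> element that carries no such assertion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- The class topic. Its id is a hypothetical name. -->
  <topic id="gemet-concept">
    <baseName>
      <baseNameString>GEMET concept</baseNameString>
    </baseName>
  </topic>

  <!-- A GEMET term, explicitly asserted to be an instance of the class. -->
  <topic id="economics">
    <instanceOf>
      <topicRef xlink:href="#gemet-concept"/>
    </instanceOf>
    <baseName>
      <baseNameString>economics</baseNameString>
    </baseName>
  </topic>

  <!-- An auxiliary topic. It has no instanceOf assertion, so it stays
       outside the privileged set of GEMET topics. -->
  <topic id="lang-en">
    <baseName>
      <baseNameString>English</baseNameString>
    </baseName>
  </topic>

</topicMap>
```

A rendering application could then select only the topics that are instances of "gemet-concept", without needing to know which topics originated as <topic> elements.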
What do you think? Should it be a part of XTM
processing that every topic that corresponds to a
<topic> element is automatically an instance of the
"came from an XTM <topic> element" class? Or is
this extra overhead going to benefit only a small
group of practitioners, who could have achieved the
same goal more flexibly by explicitly asserting
which subjects should be treated as "full-fledged"
topics (whatever they might mean by that)?
-- Steve
Steven R. Newcomb, Consultant
srn@coolheads.com
voice: +1 972 359 8160
fax: +1 972 359 0270
1527 Northaven Drive
Allen, Texas 75002-1648 USA
Prepared by Robin Cover for The XML Cover Pages archive. See related references in (1) the news item, "OASIS Technical Committee to Define Published Subjects for Geography and Languages"; (2) "(XML) Topic Maps."

