XTM Uses Scope For Languages
Date: Sat, 02 Feb 2002 14:30:22 -0600
From: "Steven R. Newcomb" <srn@coolheads.com>
To: "Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com>
Cc: topicmaps-comments <topicmaps-comment@lists.oasis-open.org>, topicmapmail@infoloom.com
Subject: multilingual thesaurus - language, scope, and topic naming constraint

"Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com> writes: > I never understood why XTM uses scope for languages. > ISO/IEC FCD 13250 does not recommend that. > XML itself has defined the xml:language attribute, > i.e. see: << [Definition:] language represents > natural language identifiers as defined by [RFC > 1766]. The 7value space7 of language is the set of > all strings that are valid language identifiers as > defined in the language identification section of > [XML 1.0 (Second Edition)]. The 7lexical space7 of > language is the set of all strings that are valid > language identifiers as defined in the language > identification section of [XML 1.0 (Second > Edition)]. The 7base type7 of language is token. > http://www.w3.org/TR/xmlschema-2/#language > The scope of GEMET is Environmental Protection, and > not a bunch of languages. > So you might use: > > <basename xml:lang="en">economics</basename> > etc. > I know this does not conform with XTM - but sorry to > say using scope for language does not conform with > XML :-(. Bernard's sample shows the problems very > well. They would disappear when we use xml:language > instead of scope. This would also modify the merging > rule: ... having the same basename in the same scope > ... and the same language! (allthough even this does > not really work because of homonyms) I think several points need to be made here. (1) Not all applications of Topic Maps require languages to be identified. (For that matter, not all applications of XML require languages to be identified.) In order to appear in a topic map, each natural language must itself be the subject of a topic. The topics that are implicit in the XTM syntax itself were strictly limited to those that were considered necessary for *all* (or virtually all) Topic Maps applications. The XTM syntax has ways of making any subject, including subjects that are natural languages, the subject of a topic. 
(2) There are lots of "standard" namespaces in use with XML, and more are added from time to time. Each of them is implicitly founded on some set of specialized subjects. The XTM syntax should not (and, in fact, cannot) be expanded to provide special syntactic features for each and every subject. Consider the ODA experience, and the 28001 experience, both of which are cautionary tales. One of the lessons that all XML people should bear in their minds is that one syntax never fits all, at least partly because one set of semantics can never be enough. At the same time, one set of semantics can easily be too many to be useful. Syntaxes that never stop growing are like cancers: they get larger and larger, consuming more and more resources, until they finally either kill their hosts, or they are excised.

(3) XTM was designed to be maximally intuitive. We deliberately avoided, wherever we could, situations in which "hidden magic" processing -- processing that would require special treatment of special syntactic cases -- would have to occur. Your proposal that the value of the xml:lang attribute be treated as a member of the operative scope is exactly the kind of thing we labored to avoid. We believed that, if a topic is supposed to be considered a member of a scope, then, by Golly, it should appear inside the corresponding <scope> element. Otherwise, the syntax becomes unlearnable, because it is too tricky.

(4) Nothing prevents a topic from having a subject indicator that references the relevant specification for xml:lang. In fact, that's a good idea! Such a topic can then appear in a scope, like any other topic. Users can get all the standardizing benefits of xml:lang, but without establishing a precedent in which XTM cannot say "no" to any new "xml:" attribute that comes along, for any new class of subjects (why stop at languages?), thus making XTM become yet another bloated necrotic tumor on the SGML/XML landscape, and the subject of yet another cautionary tale.

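To make point (4) concrete, here is a minimal XTM 1.0 sketch of a language topic used in a scope. The topic ids (lang-en, lang-de, economics), the German name, and the example.org subject indicator URIs are all hypothetical and chosen only for illustration; in practice you would point the subject indicators at whatever published indicator for each language you actually rely on.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- A topic whose subject is the English language.
       The subject indicator URI is a hypothetical placeholder. -->
  <topic id="lang-en">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/psi/language#en"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>English</baseNameString>
    </baseName>
  </topic>

  <!-- Likewise for German (also a hypothetical indicator). -->
  <topic id="lang-de">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/psi/language#de"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>German</baseNameString>
    </baseName>
  </topic>

  <!-- A thesaurus term with one base name per language.
       The language appears explicitly inside <scope>,
       so no special processing of xml:lang is needed. -->
  <topic id="economics">
    <baseName>
      <scope><topicRef xlink:href="#lang-en"/></scope>
      <baseNameString>economics</baseNameString>
    </baseName>
    <baseName>
      <scope><topicRef xlink:href="#lang-de"/></scope>
      <baseNameString>Wirtschaft</baseNameString>
    </baseName>
  </topic>

</topicMap>
```

Because the language is an ordinary topic, it takes part in scoping and merging like any other topic, which is the behavior point (3) argues for.
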
(5) If you want to design your own syntax for XML Topic Maps, you are free to do so, and you can bring into it every kind of topic that is implicit in every kind of XML namespace name, if that's what you want to do. Eventually (but not just yet), you'll be able to define a processing model for your syntax that will enable your topic maps to be merged with all others, just as if they were all notated in a single standard syntax. (Whether anyone will pay attention to your syntax, and its processing model, is a separate issue.) This syntax-unification capability is one of the goals of the modeling work that is being done in ISO SC34/WG3. The initial impetus for this work was that there are already two standard syntaxes for Topic Maps: HyTM and XTM.

> The scope of GEMET is Environmental Protection, and
> not a bunch of languages.

(6) In a topic map, either a subject is represented as a topic, or it is not represented at all.

Note: In the foregoing sentence, I did *not* say:

  In a topic map, either a subject is represented as a <topic> element, or it is not represented at all.

In an XTM-conforming syntactic instance topic map, there are many topics that are not represented as <topic> elements. A base name, for example, is necessarily, at the lowest level, a subject, even though there is no corresponding <topic> element whose subject is that base name. If you want to understand either XTM or HyTM, it's imperative to understand what are all the implicit subjects that must, ultimately and fundamentally, be handled as topics. If, for example, a subject appears in a scope, that subject, like all other subjects, is represented as a topic. There is nothing else that it can be, at least not in the Topic Map paradigm.

Some eminent Topic Maps practitioners have suggested that, after topic map processing/merging, there should still be a distinction between topics that were the subjects of <topic> elements, and all other topics.
Maybe this requirement is one that you would like to voice your support for, so that, for example, in your GEMET topic map, you can automatically distinguish between the topics that you chose to specify via <topic> elements, and all others. Such a distinction could be used to control the way a rendering application behaves, so that you can specify <topic> elements only for the subjects that are directly relevant to GEMET, and have only those topics be rendered in certain circumstances.

The same goal could be accomplished more flexibly by asserting that all your GEMET topics are instances of the GEMET class. This technique would leave you in a position to use <topic> elements for auxiliary subjects, without polluting your specially-privileged set of GEMET topics.

What do you think? Should it be a part of XTM processing that every topic that corresponds to a <topic> element is automatically an instance of the "came from an XTM <topic> element" class? Or is this extra overhead going to benefit only a small group of practitioners, who could have achieved the same goal more flexibly by explicitly asserting which subjects should be treated as "full-fledged" topics (whatever they might mean by that)?

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

Prepared by Robin Cover for The XML Cover Pages archive. See related references in (1) the news item, "OASIS Technical Committee to Define Published Subjects for Geography and Languages"; (2) "(XML) Topic Maps."