Why We Need Namespaces (Modules)

An SGML/XML Feature Proposal

Author: Paul Prescod

Abstract

The World Wide Web Consortium has recently published a note called Namespaces in XML. Not everyone has access to it yet, but they will soon. It proposes a simple convention for allowing instances to have elements whose type names come from many different schemas. According to that note:

We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas. Another is the advantage of allowing search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types.

Advocates of ISO architectural forms ("archforms") have noticed that these requirements are very similar to those for archforms and have proposed archforms as a solution. They are correct that the underlying problems are related, but the problems are not identical. We need both archforms and namespaces. The two ideas are actually complementary. This note demonstrates why neither architectural forms nor the current namespace proposal really solves the "namespace problem" satisfactorily.

Background

I will use the document A Proposal to Introduce "Module" Structures Into SGML as an example of a modules proposal which includes not just a convention for namespace combination, but a syntax for actually combining SGML DTD fragments. DTDs are the only standardized schema language for either SGML or XML.

Architectural forms allow a "client document" to declare that certain elements conform to an element type in a DTD other than the document's DTD. For instance you could say that a particular element is both a LINK element in the document's DTD and a HyTime CLINK element in the HyTime architecture. It is essentially both things at once. You can either declare a particular element as having an architectural element type (in addition to its ordinary element type) or you can declare that all of the elements of a particular type adhere to a particular architectural element type. For instance you could say that a particular "human" element conforms to the "animal" architectural element type (if the human was, for example, a "party animal") or you could say that all "dog" elements conform to the "animal" architectural element type.

The Rub

A particular element can also conform to multiple architectural element types. For instance the aforementioned human could conform to both the "programmer" and the "party animal" architectural element types (no, those are not logically exclusive). My claim is that this increased generality is a powerful feature in many contexts, but makes things way too complex in the simple case for architectural forms to be the most basic namespace management facility in XML. SGML and SGML tools are organized around the idea that each element conforms to one and only one element type. We have not yet rethought SGML processing in terms of multiple element types.

For instance, the most common form of SGML processing is validation. SGML uses DTDs to define constraints on SGML documents. According to the Japanese proposal, validation could be accomplished more or less like this:

<!DOCTYPE MATH.AND.HYPERLINKS [
<!MODULE MATH SYSTEM "math.module.dtd">
<!MODULE HY SYSTEM "hyperlinks.module.dtd">
<!ELEMENT MATH.AND.HYPERLINKS (#PCDATA|HY::LINK|MATH::FORMULA)*>
]>

Imagine that math.module.dtd and hyperlinks.module.dtd are hundreds of lines long. Imagine also that they both have an element called "SET" (for "mathematical set" and "link set"). As far as I know, there is no way to accomplish this namespace merging operation with anything close to the same ease with architectural forms. Yes, I can do it, by copying math.module.dtd and hyperlinks.module.dtd into my document type. I can then manually fix up the namespace clashes like my "SET" element. But it is this sort of duplication of code that the modules proposal was explicitly designed to avoid. In fact, that is its reason for existing. We can see, then, that architectural forms do not solve the problem that the modules proposal was meant to solve. They do not automatically merge namespaces.

Let me define some terms to clarify. A namespace is a mapping from names to objects, such as element type names to element types (explicitly or implicitly declared). A namespace merge is the construction of a namespace from two others that preserves all of the entries from the originals. Architectural forms provide access to multiple namespaces, but they do not merge namespaces.
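
The definitions above can be made concrete with a small sketch. It is written in Python purely for illustration, since there is no standard SGML syntax for the operation. The MATH and HY prefixes and the element names come from the example earlier in this note; the function name merge_namespaces and the declaration strings are hypothetical placeholders.

```python
# A namespace is modelled as a mapping from element type names to
# declarations. The declaration strings are placeholders, not DTD content.
math_ns = {"FORMULA": "math formula", "SET": "mathematical set"}
hy_ns = {"LINK": "hyperlink", "SET": "link set"}


def merge_namespaces(modules):
    """Merge namespaces by qualifying each name with its module prefix.

    `modules` maps a prefix (e.g. "MATH") to a namespace. Every entry of
    every input namespace survives under the name PREFIX::NAME.
    """
    merged = {}
    for prefix, namespace in modules.items():
        for name, declaration in namespace.items():
            merged[f"{prefix}::{name}"] = declaration
    return merged


# A naive union is not a merge in the sense defined above: the second
# SET silently replaces the first.
naive = {**math_ns, **hy_ns}

merged = merge_namespaces({"MATH": math_ns, "HY": hy_ns})

print(len(naive))   # 3: one SET was lost
print(len(merged))  # 4: both SET entries are preserved
print(merged["MATH::SET"], "/", merged["HY::SET"])
```

The point of the sketch is only that a merge must keep both "SET" entries reachable under distinct names without anyone editing the original fragments by hand.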

I suspect that some with a long background in SGML will be a little baffled trying to understand why someone would want to do this. After all, combining document types is typically difficult work performed by experts, tested on teams of users, tweaked to perfection with element names remapped to fit the terminology of the user community. Mixing and matching DTD fragments in an ad hoc manner might not seem like a good idea. But the fact is that we live in a brave new world. End users want to take control of their own document types in many cases. They want to mix and match DTD fragments and they are not willing to spend the amount of effort that we professionals are. Good for them! They will make all of our lives easier. In fact, when authors say that they want to "get rid of" DTDs, what they typically mean is that they don't want to be constrained by someone else's DTD and making their own is too difficult! If we can make DTD maintenance easier, more people will use them.

Perhaps it would be possible to update SGML so that validation does not depend so deeply on each element having a single element type, so that content models could be expressed that combine elements from different architectures. If we did that, my complaint might go away. Architectures might regain some of the validatory simplicity of the modules proposal. But this would require a much more fundamental change to SGML than the modules proposal would.

Stylesheets

I will use stylesheets as another example of processing. The three most interesting stylesheet languages right now are DSSSL, XSL and CSS. Each of those has as its central organizing construct a rule triggered on an element type name in a context. DSSSL has a feature that would allow querying on architecture, but the feature is optional and is not supported, for instance, by James Clark's Jade. Even where the feature is available, the architectural form-based version of a stylesheet is much more complicated than the equivalent based on a "flat" namespace (such as a stylesheet for traditional SGML or SGML augmented with the modules proposal). I invite architectural forms advocates to prove me wrong by providing their stylesheets.

Here is what a module-enhanced DSSSL might look like:

<module target="mathml.dsl">
<module target="hyperlinks.dsl">
(element MATH.AND.HYPERLINKS (process-children))

As you can see, this has just enough lines to include the relevant stylesheet modules and provide rules for the new elements. What would the equivalent archform code look like? With DSSSL as it exists, it would look quite ugly and convoluted. With some enhanced DSSSL it might look reasonable (just as some enhanced SGML might be able to have content models that span architectures), but nobody has yet proposed what such a DSSSL would look like (just as nobody has proposed the enhanced SGML). I am open to suggestions...

I do not believe that either the current XSL proposal or CSS would allow architecture-based processing at all. Once again, the idea that every element has a single element type is a fundamental organizing principle of these stylesheet languages. It is also an organizing principle of most SGML editors, DTD editors and formatting and conversion tools I have used. In fact, almost every SGML tool in the world operates under that principle. The best tools will give you access to architectural forms (through their architectural attributes), but they will typically use the element type name as the major organizing feature of the stylesheets. Archform-centric processing is typically awkward if it is possible at all.
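
To illustrate why a single element type per element keeps rule selection simple, here is a minimal sketch in Python. It does not model DSSSL, XSL or CSS. The Element class, the rule table and the rule bodies are hypothetical, chosen only to contrast lookup by one type name with lookup by a set of architectural element types.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    gi: str  # generic identifier, i.e. the element type name
    arch_types: set = field(default_factory=set)  # architectural element types, if any


# Rules keyed on a type name (hypothetical rule bodies).
rules = {
    "programmer": "render as code",
    "party-animal": "render in bold",
    "human": "render as paragraph",
}


def select_by_type(element):
    """One element, one element type: at most one rule can match."""
    return rules.get(element.gi)


def select_by_architecture(element):
    """Every architectural type may match, so several rules can apply."""
    return sorted(rules[t] for t in element.arch_types if t in rules)


human = Element("human", {"programmer", "party-animal"})
print(select_by_type(human))          # render as paragraph
print(select_by_architecture(human))  # ['render as code', 'render in bold']
```

In the second case something has to decide which rule wins or how the results combine. That is extra machinery which lookup on a single element type name never needs.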

The one element, one element type principle is also central to every course in SGML I have ever taken and every book on it I have ever read. Even the SGML Handbook says that every element has a particular element type (a single, particular element type).

The Argument From Usability

Imagine that you are a typical end user and have used archforms instead of a namespace merging mechanism to combine DTD fragments. Now imagine that you know that a particular element type name appears in both DTD fragments. I think that most people would be very surprised to learn that the way to associate this element with one or the other DTD is to add an attribute. Because the generic identifier (the name in the start-tag) usually establishes the element type, you would probably expect to change the generic identifier to change the association. But using architectural forms, you would instead have to add an attribute that essentially disassociates the element from one of the element types: "I may have the same name as that element type, but it isn't actually one of my element types." I think that this is a nasty case of making the common, simple case of merging DTD fragments more complicated in order to make life easier for those of us who have to solve problems that may actually require the full generality of architectural forms. Once again, I invite advocates to send me code samples that demonstrate that this is simpler than I think.

Who was it that said: "Make the easy things easy and the hard things possible"? Architectural forms make hard things possible, but when misapplied to the namespace problem, they make easy things unnecessarily hard. Let me be clear: architectural forms (or something like them) have an important role to play in SGML systems. We absolutely need some form of semantic inheritance mechanism. But they work best when they work in the environment they were designed for: they are typically used as an underlying basis of a DTD designed by a professional. The professional DTD designer renames elements to avoid clashes. That individual is the real solution to the "namespace problem" in most environments. In environments where such a person exists, archforms are really, really useful. They are not useful because they allow you to merge namespaces (they don't). They are useful because they allow you to combine semantics from different DTD fragments in powerful ways (but more or less manually). I think that a modules/namespaces proposal would actually be very useful for building architectures from DTD fragments. I also think that architectural forms would be very useful on the Web. Not every use of XML on the Web will be ad hoc. Some XML applications will need the robust multi-level validation that architectural forms allow. Think about e-commerce for example.

But many users will not need or want architectural forms. Most people just need a simple way to combine fixed DTD fragments so that there are no name clashes. The Japanese module proposal provides such a mechanism. Presumably Web-centric DTD-replacement schema languages will provide mechanisms like this also. If these sorts of things are made much easier in these schema languages than they are in SGML DTD syntax, people will just avoid SGML DTD syntax. This would be a big mistake for all concerned. Let's please just fix SGML through a proposal like the one submitted by the Japanese in 1996. Some modules proposal should be part of the SGML revision.

Please forward comments to the author.