[Archive copy mirrored from the canonical http://www.textuality.com/xml/namespace.htm, text only]

Options for Implementing Namespaces in XML

Tim Bray
tbray@textuality.com
Andrew Layman
andrewl@microsoft.com

The XML ERB is considering several proposals for the "namespaces problem." This document very briefly summarizes each, explains how it applies to a document instance (the data), how it applies to and/or depends on a DTD (a schema), and then shows an example of the proposal in application.

This document does not justify or motivate multiple namespaces, or set out the criteria they need to meet, since that has been done in other documents. This one just contrasts the various proposed solutions.

Motivating Example

Our sample problem is a structure containing an order with a lineitem, within which there is a price, shipping zone and a digital signature. The elements are defined in several schemata (identified by URI): Orders, LineItem and Price come from "http://www.bigbookstore.com/schema"; Name and Author from "http://purl.org/dublincore"; Zone from "http://www.shipping.com" and the DSIG from "http://www.w3.org".

Though our example is small, it shows an important characteristic: Even a document that is just well-formed needs namespaces to precisely identify the meaning of its elements. For this reason our sample does not have a DTD, and contains terms defined in the Dublin Core, a well-known namespace that in fact does not have a DTD.

The following is the Basic Example, without any namespaces.

<XML>
        <ORDERS>
                <LINEITEM>
                        <PRICE>5.95</PRICE>
                        <ZONE>9</ZONE>
                        <DSIG><DIGEST>1234567890</DIGEST><SIGNER>AndrewL@microsoft.com</SIGNER></DSIG>
                </LINEITEM >
        </ORDERS>
</XML>

Overview and Summary

The proposals largely agree that the appropriate way to identify a namespace is with a URI. Since URIs are lengthy and require the use of characters that would be out of place within tags or attribute-names, some ancillary mechanism, involving a shorter namespace identifier, seems to be required at the individual element level.

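Given this, the outstanding issues are whether DTDs should be involved, how the association between a namespace identifier and its URI is declared, and how individual elements are associated with a namespace.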

It is important to note that to a large degree, the design decisions are orthogonal.

On the DTD front, the first open question is whether it should be required, or even allowed, to associate namespaces with DTDs, and if so, what special machinery is required to support this.

For declaring the associations, there are at least four options:

  1. Use an ordinary element and sub-elements. Example:
    <xml-namespace><ref>book.schema</ref><as>bk</as></xml-namespace>
  2. Use an ordinary element and attributes. Examples:
    <xml-namespace ref="book.schema" as="bk"/>
    <MyDecl xml-role="namespace" ref="book.schema" as="bk"/>
  3. Use a processing instruction. Example:
    <?xml-namespace ref="book.schema" as="bk" ?>
  4. Use new declaration syntax. Example:
    <!MODULE bk ( ) SYSTEM "book.schema">

For associating elements with namespaces, there are three options on the table:

  1. A label inserted right in the tag:
    <bk:price>5.95</bk:price>
    <bk::price>5.95</bk::price>
  2. A reserved attribute, one for each namespace:
    <AnyOldElement bk="PRICE">5.95</AnyOldElement>
  3. A new marked-section-like syntax:
    <!NS[bk[
    <price>5.95</price>
    ]]>

Some of the examples above may seem unduly cluttered by attributes. It should be borne in mind that XML has a nice default-attribute-value mechanism, so that, where DTD machinery is available, the attributes could be omitted whenever the element name is sufficient to identify the namespace. This is illustrated in the detailed proposals below (see, for example, the transformation shown in the Architectural Forms discussion).
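
Because the declaration and association choices are largely independent, any declaration option can in principle be paired with any association option. The following is a minimal sketch in Python, pairing declaration option 2 (an element with attributes) with association option 2 (a reserved attribute). The wrapper element "doc", the function name "associations", and the decision to treat declarations as applying to the whole document are assumptions made for illustration; none of them comes from the proposals.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <doc> wrapper is an assumption; the two inner
# elements are copied from the option lists above.
SAMPLE = """<doc>
  <xml-namespace ref="book.schema" as="bk"/>
  <AnyOldElement bk="PRICE">5.95</AnyOldElement>
</doc>"""


def associations(xml_text):
    """Return (element name, namespace ref, name in that namespace) triples."""
    root = ET.fromstring(xml_text)
    # Step 1: read the declarations (short identifier -> namespace reference).
    # Assumption: declarations apply to the whole document, ignoring scope.
    declared = {d.get("as"): d.get("ref") for d in root.iter("xml-namespace")}
    # Step 2: an attribute named after a declared identifier associates the
    # element with a name in that namespace.
    found = []
    for element in root.iter():
        if element.tag == "xml-namespace":
            continue
        for attr_name, attr_value in element.attrib.items():
            if attr_name in declared:
                found.append((element.tag, declared[attr_name], attr_value))
    return found


if __name__ == "__main__":
    print(associations(SAMPLE))
```

This pairing happens to be plain well-formed XML with no new syntax, which is why an off-the-shelf parser can read it; the label-in-tag and marked-section options would need parser support.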

Layman

DTD

This proposal is neutral on the matter of DTDs. It does not require any DTDs (but neither does it preclude adding namespace facilities to DTDs).

Instance

Namespace import occurs in the XML document instance. It is independent of any DTD.

A namespace is imported into a scope by using the reserved xml:namespace element. This has two sub-elements: the universal name of the space, and the local name. The universal name is the same for all documents referring to the same namespace. The scope in which the namespace may be used is the parent element of the xml:namespace element.

Elements have qualified names indicating their namespaces: first the local name of the space, then a colon, then the remainder of the name. The local name and colon may be omitted when an element has the same namespace as its immediately containing element. That is, for elements with unqualified names, the namespace may be found by working up the tree to the first namespace-qualified ancestor.

<XML>
        <xml:namespace>
                <ref>http://www.bigbookstore.com/schema</ref>
                <as>bk</as>
        </xml:namespace>
        <xml:namespace>
                <ref>http://www.w3.org</ref>
                <as>w3</as>
        </xml:namespace>
        <bk:ORDERS>
                <xml:namespace>
                        <ref>http://purl.org/dublincore</ref>
                        <as>dc</as>
                </xml:namespace>
                <xml:namespace>
                        <ref>http://www.shipping.com</ref>
                        <as>sh</as>
                </xml:namespace>
                <LINEITEM>
                        <dc:NAME>Number, the Language of Science</dc:NAME>
                        <dc:AUTHOR>Dantzig</dc:AUTHOR>
                        <PRICE>5.95</PRICE>
                        <sh:ZONE>9</sh:ZONE>
                        <w3:DSIG><DIGEST>1234567890</DIGEST><SIGNER>AndrewL@microsoft.com</SIGNER></w3:DSIG>
                </LINEITEM >
        </bk:ORDERS>
</XML>
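
The lookup rule described above (work up the tree to the first namespace-qualified ancestor, then look up that ancestor's local name in the enclosing declarations) can be sketched as follows. This is an illustrative Python model, not part of the proposal: the Element class, the function names, and the choice to let a declaration be visible on the element that contains it as well as on its descendants are all assumptions.

```python
class Element:
    """A tiny stand-in for a parsed element (not a real parser's node type)."""

    def __init__(self, name, parent=None, declared=None):
        self.name = name                  # tag as written, e.g. "bk:ORDERS"
        self.parent = parent
        self.declared = declared or {}    # local name -> universal name

    def child(self, name, declared=None):
        return Element(name, parent=self, declared=declared)


def lookup(local_name, element):
    """Find the universal name for a local name, searching outward."""
    node = element
    while node is not None:
        if local_name in node.declared:
            return node.declared[local_name]
        node = node.parent
    raise KeyError("no namespace declared as %r" % local_name)


def resolve(element):
    """Return (universal name, unqualified element name), or (None, name)."""
    node = element
    while node is not None:
        if ":" in node.name:
            # First qualified name on the way up decides the namespace.
            local_name = node.name.split(":", 1)[0]
            return lookup(local_name, node), element.name.split(":", 1)[-1]
        node = node.parent
    return None, element.name


def build_example():
    """Rebuild the relevant part of the sample document by hand."""
    root = Element("XML", declared={
        "bk": "http://www.bigbookstore.com/schema",
        "w3": "http://www.w3.org",
    })
    orders = root.child("bk:ORDERS", declared={
        "dc": "http://purl.org/dublincore",
        "sh": "http://www.shipping.com",
    })
    lineitem = orders.child("LINEITEM")
    dsig = lineitem.child("w3:DSIG")
    return {
        "XML": root,
        "PRICE": lineitem.child("PRICE"),
        "ZONE": lineitem.child("sh:ZONE"),
        "DIGEST": dsig.child("DIGEST"),
    }
```

Under these assumptions, the unqualified PRICE inherits the bookstore namespace from bk:ORDERS, DIGEST inherits from w3:DSIG, and the outermost XML element has no namespace at all.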

Thompson

DTD

This proposal was conceived on the assumption that DTDs are available (see the commentary following the example), but it may not depend on that assumption to be viable.

Instance


<!doctype xyzzy [
<!entity % bbs SYSTEM "http://www.bigbookstore.com/schema">
<!entity % www SYSTEM "http://www.w3.org"> <!-- it seems unlikely
                                                 there is a schema
                                                 at this URL . . . -->
<!entity % dcore SYSTEM "http://purl.org/dublincore">
<!entity % ship SYSTEM "http://www.shipping.com"> <!-- ditto -->
<!ns[ bk [ %bbs <!-- presume this includes declarations for ORDERS,
                       LINEITEM and PRICE --> ]]>
<!ns[ w3 [ %www <!-- presume this includes declarations for DSIG, DIGEST
                       and SIGNER --> ]]>
<!ns[ dc [ %dcore <!-- presume this includes declarations for NAME
                         and AUTHOR --> ]]>
<!ns[ sh [ %ship <!-- presume this includes declaration for ZONE --> ]]>
<!element xyzzy (. . .,(. . .|bk:ORDERS|. . .)*,. . .)>
. . .
]>
<xyzzy>
. . .
<bk:ORDERS>
 <LINEITEM>
  <dc:NAME>Number, the Language of Science</dc:NAME>
  <dc:AUTHOR>Dantzig</dc:AUTHOR>
  <PRICE>5.95</PRICE>
  <sh:ZONE>9</sh:ZONE>
  <w3:DSIG><DIGEST>1234567890</DIGEST>
    <SIGNER>AndrewL@microsoft.com</SIGNER>
  </w3:DSIG>
 </LINEITEM >
</bk:ORDERS>
. . .
</xyzzy>

(Thompson:) Now the good news is that this is well-formed, just like that. The bad news is it's only valid if a) my presumptions as recorded above in comments are correct; b) the content model for bk:LINEITEM is ANY.

The bottom line here is that I see no way, under my or anyone else's proposal, to validate completely if you want to interpolate elements from a DTD fragment with namespace B into the content of elements from a DTD fragment with namespace A, unless you can change the DTD with namespace A. In the example above, that is the bigbookstore DTD.

Other differences:

Takahashi (Japanese Submission to the ISO SGML Committee)

DTD

A DTD is required, as is shown below, preceding the instance.

A namespace is imported into a scope inside a DTD, and that single DTD is then used for the entire document instance. (This may require building a custom DTD per document instance.) This proposal suggests a new construct within the DTD, called "Module", which declares and invokes a namespace. Namespaces have a universal name and a local name (with the same meanings and use as in the Layman and Thompson proposals).

Within a DTD, names from a namespace are imported into other namespaces. Their scope is the whole of the module into which they are imported. Names cannot be disambiguated within a namespace; that is, qualification cannot be used within a namespace definition to distinguish between two different imported terms with the same name.

Instance

Elements have qualified names indicating their namespaces, first the local name of the space, then two colons, then the remainder of the name. The local name and colons may be omitted whenever an element name would be unambiguous, given the DTD.

Since the document instance is limited to a single DTD, the scope of namespaces within the instance is the entire document.

The principal differences between this proposal and the Layman and Thompson proposals are two: First, the Layman and Thompson proposals work on document instances, while this proposal works on DTDs. Second, the Layman and Thompson proposals are more restrictive about when it is legal to omit name qualification (and so are safer against changes to a schema).

<!DOCTYPE doc [
        <!MODULE bk ( ) SYSTEM "http://www.bigbookstore.com/schema" [
                <!EXPORT (Orders)>
                <!IMPORT (DSIG, Author, Price, Zone) >
                <!ELEMENT LineItem (#PCDATA | Name | Author | Price | Zone)* >
                <!ELEMENT Orders (LineItem)* >
                <!ELEMENT Price (#PCDATA) >
        ]>
        <!MODULE w3 ( ) SYSTEM "http://www.w3.org" [
                <!EXPORT (DSIG)>
                <!ELEMENT Digest (#PCDATA) >
                <!ELEMENT Signer (#PCDATA) >
                <!ELEMENT DSIG (Digest, Signer) >
        ]>
        <!MODULE dc ( ) SYSTEM "http://purl.org/dublincore" [
                <!EXPORT (Name, Author)>
                <!ELEMENT Name (#PCDATA) >
                <!ELEMENT Author (#PCDATA) >
        ]>
        <!MODULE sh ( ) SYSTEM "http://www.shipping.com" [
                <!EXPORT (Zone)>
                <!ELEMENT Zone (#PCDATA) >
        ]>
]>
<XML>
        <bk::ORDERS>
                <LINEITEM>
                        <dc::NAME>Number, the Language of Science</dc::NAME>
                        <dc::AUTHOR>Dantzig</dc::AUTHOR>
                        <PRICE>5.95</PRICE>
                        <sh::ZONE>9</sh::ZONE>
                        <w3::DSIG><DIGEST>1234567890</DIGEST><SIGNER>AndrewL@microsoft.com</SIGNER></w3::DSIG>
                </LINEITEM >
        </bk::ORDERS>
</XML>
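
The rule that qualification may be dropped whenever the name is unambiguous given the DTD can be sketched as follows. This Python model is only an illustration under stated assumptions: it ignores the IMPORT and EXPORT lists, treats each module simply as the set of element names it declares, and compares names case-insensitively because the instance is written in upper case while the DTD uses mixed case. The names MODULES, UnresolvedName and resolve are hypothetical.

```python
# Element names declared in each module, copied from the DTD above.
MODULES = {
    "bk": ("http://www.bigbookstore.com/schema", {"Orders", "LineItem", "Price"}),
    "w3": ("http://www.w3.org", {"DSIG", "Digest", "Signer"}),
    "dc": ("http://purl.org/dublincore", {"Name", "Author"}),
    "sh": ("http://www.shipping.com", {"Zone"}),
}


class UnresolvedName(Exception):
    """Raised when a tag matches no module, or more than one."""


def resolve(tag, modules=MODULES):
    """Map a tag such as "dc::NAME" or "PRICE" to (universal name, element)."""
    if "::" in tag:
        # Qualified: only the named module is considered.
        local_name, element = tag.split("::", 1)
        candidates = [local_name] if local_name in modules else []
    else:
        # Unqualified: every module is a candidate.
        element = tag
        candidates = sorted(modules)
    # Assumption: names compare case-insensitively.
    matches = [
        name for name in candidates
        if element.upper() in {declared.upper() for declared in modules[name][1]}
    ]
    if len(matches) != 1:
        raise UnresolvedName("%r matches %d modules" % (tag, len(matches)))
    return modules[matches[0]][0], element.upper()
```

The sketch also shows why omitting qualification is less safe against schema changes: if the shipping module later declared its own Price element, the unqualified PRICE in an existing instance would stop resolving, while bk::PRICE would be unaffected.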

CONCUR

SGML has a facility named CONCUR that allows multiple DTDs to be attached to the same document. The syntax, using our example, would be something like:
<(book)price>5.95</(book)price>

This proposal suffers from some problems:

  1. It requires that each DTD be able to parse the whole document, which is definitely a non-goal of the namespace problem.
  2. The elements from the different DTDs are not required to nest properly, so the document would no longer be well-formed in the XML sense.
  3. It requires that namespaces be mapped one-to-one to DTDs.

Architectural Forms

Several writers have suggested using the "Architectural Forms" facility, found in the SGML Extended Facilities Annex, and used heavily in HyTime, to achieve the goals of namespaces. James Clark shows how this might work.

DTD

This proposal does not require any DTD machinery; the declarations are embedded in Processing Instructions, and the linkage of elements to namespaces is accomplished just using attributes. However, the PI-based declaration mechanism is not intrinsic to the proposal - the declarations could be done with reserved elements or new DTD syntax as in any of the earlier proposals. Also, to avoid wasteful verbosity, it would be necessary to make fairly heavy use of the attribute-defaulting mechanisms, which requires attribute declarations, hence at least some DTD machinery.

Instance

(James Clark): If I were designing an architectural form mechanism that could work with just instances, I would probably do it something like:

<XML>
        <?xml-arch
                arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
                form-att="bk"
        ?>
        <?xml-arch
                arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
                form-att="w3"
        ?>
        <BOOK-ORDERS BK="ORDERS">
                <?xml-arch
                        arch="IDN//purl.org//ARCH Dublin Core//EN"
                        form-att="dc"
                ?>
                <?xml-arch arch="IDN//www.shipping.com//ARCH Shipping//EN"
                        form-att="sh"
                ?>
                <LINEITEM BK="LINEITEM">
                        <NAME DC="NAME">Number, the Language of Science</NAME>
                        <AUTHOR DC="AUTHOR">Dantzig</AUTHOR>
                        <PRICE BK="PRICE">5.95</PRICE>
                        <SHIPPING-ZONE SH="ZONE">9</SHIPPING-ZONE>
                        <DIGITAL-SIGNATURE W3="DSIG">
                                <DIGEST W3="DIGEST">1234567890</DIGEST>
                                <SIGNER W3="SIGNER">AndrewL@microsoft.com</SIGNER>
                        </DIGITAL-SIGNATURE>
                </LINEITEM>
        </BOOK-ORDERS>
</XML>

In fact I would always use a DTD subset to get something like this:

<!DOCTYPE XML [
        <?xml-arch
                arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
                form-att="bk"
        ?>
        <?xml-arch
                arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
                form-att="w3"
        ?>
        <?xml-arch
                arch="IDN//purl.org//ARCH Dublin Core//EN"
                form-att="dc"
        ?>
        <?xml-arch
                arch="IDN//www.shipping.com//ARCH Shipping//EN"
                form-att="sh"
        ?>
        <!ATTLIST BOOK-ORDERS BK NAME #FIXED "ORDERS">
        <!ATTLIST LINEITEM BK NAME #FIXED "LINEITEM">
        <!ATTLIST NAME DC NAME #FIXED "NAME">
        <!ATTLIST AUTHOR DC NAME #FIXED "AUTHOR">
        <!ATTLIST PRICE BK NAME #FIXED "PRICE">
        <!ATTLIST SHIPPING-ZONE SH NAME #FIXED "ZONE">
        <!ATTLIST DIGITAL-SIGNATURE W3 NAME #FIXED "DSIG">
        <!ATTLIST DIGEST W3 NAME #FIXED "DIGEST">
        <!ATTLIST SIGNER W3 NAME #FIXED "SIGNER">
]>
<XML>
        <BOOK-ORDERS>
                <LINEITEM>
                        <NAME>Number, the Language of Science</NAME>
                        <AUTHOR>Dantzig</AUTHOR>
                        <PRICE>5.95</PRICE>
                        <SHIPPING-ZONE>9</SHIPPING-ZONE>
                        <DIGITAL-SIGNATURE>
                                <DIGEST>1234567890</DIGEST>
                                <SIGNER>AndrewL@microsoft.com</SIGNER>
                        </DIGITAL-SIGNATURE>
                </LINEITEM>
        </BOOK-ORDERS>
</XML>

and I would also probably make use of the rules for defaulting the form attribute so I could instead do:

<!DOCTYPE XML [
        <?xml-arch
                arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
                form-att="bk"
        ?>
        <?xml-arch
                arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
                form-att="w3"
        ?>
        <?xml-arch
                arch="IDN//purl.org//ARCH Dublin Core//EN"
                form-att="dc"
        ?>
        <?xml-arch
                arch="IDN//www.shipping.com//ARCH Shipping//EN"
                form-att="sh"
        ?>
        <!ATTLIST BOOK-ORDERS BK NAME #FIXED "ORDERS">
        <!ATTLIST SHIPPING-ZONE SH NAME #FIXED "ZONE">
        <!ATTLIST DIGITAL-SIGNATURE W3 NAME #FIXED "DSIG">
]>
<XML>
        <BOOK-ORDERS>
                <LINEITEM>
                        <NAME>Number, the Language of Science</NAME>
                        <AUTHOR>Dantzig</AUTHOR>
                        <PRICE>5.95</PRICE>
                        <SHIPPING-ZONE>9</SHIPPING-ZONE>
                        <DIGITAL-SIGNATURE>
                                <DIGEST>1234567890</DIGEST>
                                <SIGNER>AndrewL@microsoft.com</SIGNER>
                        </DIGITAL-SIGNATURE>
                </LINEITEM>
        </BOOK-ORDERS>
</XML>

Finally I would probably put the DTD in a separate file:

<!DOCTYPE XML SYSTEM "http://www.jclark.com/dtds/book-order.dtd">
<XML>
        <BOOK-ORDERS>
                <LINEITEM>
                        <NAME>Number, the Language of Science</NAME>
                        <AUTHOR>Dantzig</AUTHOR>
                        <PRICE>5.95</PRICE>
                        <SHIPPING-ZONE>9</SHIPPING-ZONE>
                        <DIGITAL-SIGNATURE>
                                <DIGEST>1234567890</DIGEST>
                                <SIGNER>AndrewL@microsoft.com</SIGNER>
                        </DIGITAL-SIGNATURE>
                </LINEITEM>
        </BOOK-ORDERS>
</XML>
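
The progression through the Architectural Forms examples above (explicit attributes, then #FIXED defaults, then defaulting on the element name) can be summarized in a short Python sketch. It assumes that the defaulting rule is "an element with no form attribute is treated as the form whose name equals the element name"; that reading, the form sets (inferred from the attribute values used in the examples), the contents of the FIXED table, and the function name forms_for are all assumptions for illustration rather than a statement of the actual architectural-form rules.

```python
# form-att name -> forms assumed to exist in that architecture.
ARCH_FORMS = {
    "BK": {"ORDERS", "LINEITEM", "PRICE"},
    "DC": {"NAME", "AUTHOR"},
    "SH": {"ZONE"},
    "W3": {"DSIG", "DIGEST", "SIGNER"},
}

# (element name, form-att name) -> value an <!ATTLIST ... #FIXED ...>
# declaration would supply. Hypothetical table for this sketch.
FIXED = {
    ("BOOK-ORDERS", "BK"): "ORDERS",
    ("SHIPPING-ZONE", "SH"): "ZONE",
    ("DIGITAL-SIGNATURE", "W3"): "DSIG",
}


def forms_for(element_name, attributes, fixed=FIXED, arch_forms=ARCH_FORMS):
    """Return {form-att name: architectural form} for one element."""
    result = {}
    for form_att, forms in arch_forms.items():
        if form_att in attributes:                  # 1. written in the instance
            result[form_att] = attributes[form_att]
        elif (element_name, form_att) in fixed:     # 2. defaulted by the DTD
            result[form_att] = fixed[(element_name, form_att)]
        elif element_name in forms:                 # 3. element name is a form
            result[form_att] = element_name
    return result
```

Under these assumptions, an element such as PRICE needs no declaration at all, while an element whose name is not itself a form name (SHIPPING-ZONE, for instance) is linked to its architecture only through the first or second rule.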