Name Spaces in XML, W3C Note-xml-names-19980119, from: http://www.w3.org/TR/1998/NOTE-xml-names-0119.xml
Name Spaces in XML, Version 1.0, Note-xml-names-19980119, World Wide Web Consortium Note, 19 January 1998

This draft is for public discussion.

http://www.textuality.com/xml/xml-names-&iso6.doc.date;

Tim Bray (Textuality and Netscape) tbray@textuality.com
Dave Hollander (Hewlett-Packard Company) dmh@corp.hp.com
Andrew Layman (Microsoft) andrewl@microsoft.com

This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.

This work is part of the W3C XML Activity.

The XML WG solicits comments from W3C member companies and W3C working groups that use the namespace mechanism described in this Note. In particular, comments on open issues are very welcome, and should be sent to the editors.

XML Namespaces is a proposal for a simple method to be used for qualifying names used in Extensible Markup Language (XML) documents by associating them with schemas, identified by URI.

Created in electronic form.

Languages: English; Extended Backus-Naur Form (formal grammar)

Revision history: 1997-10-10: TB: Assembled Andrew's material and mine

Motivation and Summary

We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas. Another is the advantage of allowing search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types.

These considerations require that document constructs should have universal names, whose scope extends beyond their containing document. This specification proposes a mechanism, XML Namespaces, to accomplish this.

XML Namespaces are based on the use of qualified names, similar to those long used in programming languages. Names are permitted to contain a colon, separating the name into two parts, the namespace name and the local name. The namespace name identifies a schema's URI. The combination of the universally-managed URI namespace and the local schema namespace produces names that are guaranteed universally unique.

XML syntax does not allow direct use of a URI as a namespace name, because URIs can contain characters not allowed in names. Consequently, the namespace name serves as a proxy for a URI. A special processing instruction described below is used to declare the association of the namespace name with a URI; software which supports this namespace proposal must recognize and act on it.

Namespace Syntax

Declaring Namespaces

A namespace is declared using a reserved processing instruction as follows:

Namespace Declaration PI

    NamespacePI ::= '<?xml:namespace' S 'name=' SystemLiteral S 'href=' SystemLiteral S 'as=' NSName S? '?>'
    NSName      ::= "'" Name "'" | '"' Name '"'

The "name" SystemLiteral is a URI which uniquely identifies the namespace. The "href" SystemLiteral is an optional URI which may be used to retrieve the schema, if one is provided. Some namespaces need no schemas; this specification does not depend on their existence, or on the use of any particular machine- or human-readable syntax in the schema.

The NSName gives the namespace name which will be used as a link to associate names in an XML document with this schema. Examples of Namespace Declarations: [example markup not preserved in this copy]
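As an illustration of the declaration syntax, the following is a minimal Python sketch that recognizes a namespace declaration PI and extracts its three parts. It is not taken from the Note. It assumes the attribute order given in the production (name, href, as), treats href as optional because the prose describes it that way, and checks the namespace name against an ASCII-only simplification of Name. The function name `parse_namespace_pi` and the example.org URIs are hypothetical.

```python
import re

# A quoted literal in either quoting style.
_LITERAL = r"""(?:"[^"]*"|'[^']*')"""

# Follows the NamespacePI production; href is optional here (an assumption
# based on the prose, which calls the href literal optional).
_NAMESPACE_PI = re.compile(
    r"<\?xml:namespace"
    r"\s+name=(?P<name>" + _LITERAL + r")"
    r"(?:\s+href=(?P<href>" + _LITERAL + r"))?"
    r"\s+as=(?P<prefix>" + _LITERAL + r")"
    r"\s*\?>"
)

# ASCII-only simplification of a colon-free Name (assumption).
_SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def _unquote(literal):
    """Strip the surrounding quotes from a matched literal, if present."""
    return None if literal is None else literal[1:-1]


def parse_namespace_pi(text):
    """Return {'name', 'href', 'as'} for a declaration PI, or None if it does not match."""
    match = _NAMESPACE_PI.fullmatch(text.strip())
    if match is None:
        return None
    prefix = _unquote(match.group("prefix"))
    if _SIMPLE_NAME.fullmatch(prefix) is None:
        return None
    return {
        "name": _unquote(match.group("name")),
        "href": _unquote(match.group("href")),
        "as": prefix,
    }


# Hypothetical declaration, for illustration only.
example = '<?xml:namespace name="http://example.org/books" href="http://example.org/books.dtd" as="bk"?>'
print(parse_namespace_pi(example))
```

A declaration without an href yields `None` for that key, and a namespace name containing a colon is rejected.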

Placing Declarations in Documents

Namespace declarations must be located in the prolog of an XML document, after the XML Declaration (if any) and before the DTD (if any). This effectively makes the scope of namespace names global to the whole document, including the DTD. It also means that should a processor wish to insert its own qualified names, it need only read the namespace declarations from the prolog to be sure of generating a new, unique, namespace name.

To accomplish this, the production for prolog is replaced as follows:

Prolog with Namespace Declarations

    prolog ::= XMLDecl? S? NamespacePI* Misc* (doctypedecl Misc*)?

Unique Namespace Names

No namespace name may be declared more than once.
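The two consequences described above (a processor can learn every namespace name from the prolog alone, and no name may be declared twice) can be sketched in Python as follows. This is an illustrative sketch, not part of the Note: the regular expression is a loose approximation of the declaration PI, and the names `declared_namespaces` and `fresh_prefix`, the `ns` stem, and the example.org URIs are all hypothetical.

```python
import re

# Loose approximation of the namespace declaration PI (assumption: href optional).
_DECLARATION = re.compile(
    r"""<\?xml:namespace\s+name=["']([^"']*)["'](?:\s+href=["'][^"']*["'])?\s+as=["']([^"']*)["']\s*\?>"""
)


def declared_namespaces(prolog):
    """Map each declared namespace name to its URI, rejecting duplicates."""
    table = {}
    for uri, prefix in _DECLARATION.findall(prolog):
        if prefix in table:
            raise ValueError(f"namespace name {prefix!r} declared more than once")
        table[prefix] = uri
    return table


def fresh_prefix(table, stem="ns"):
    """Generate a namespace name that is not yet declared in the prolog."""
    counter = 1
    while f"{stem}{counter}" in table:
        counter += 1
    return f"{stem}{counter}"


# Hypothetical prolog, for illustration only.
prolog = (
    '<?xml version="1.0"?>\n'
    '<?xml:namespace name="http://example.org/a" as="a"?>\n'
    '<?xml:namespace name="http://example.org/b" as="ns1"?>\n'
)
table = declared_namespaces(prolog)
print(table, fresh_prefix(table))
```

Because all declarations sit in the prolog, `fresh_prefix` never needs to inspect the document body.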

Qualified Names

Within the document, some names (constructs corresponding to the nonterminal Name) are replaced by qualified names, defined as follows:

Qualified Name

    QName     ::= (NSPart ':')? LocalPart
    NSPart    ::= Name
    LocalPart ::= Name

The NSPart provides the namespace name part of the qualified name, and may be associated with the defining schema through the URI in the applicable namespace declaration.

The LocalPart provides the local name part of the qualified name.

Namespace Name Declared

The namespace name, unless it is "xml", must have been declared in a namespace declaration. The namespace name xml is reserved, and considered to have been implicitly declared.
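A small Python sketch of how a processor might split a qualified name and apply the "Namespace Name Declared" constraint follows. It is an illustration under stated assumptions rather than a normative algorithm: the function names are hypothetical, the Note assigns no URI to the reserved name xml so a placeholder marker is returned for it, and the example.org URI is invented.

```python
# Placeholder returned for the reserved, implicitly declared namespace name
# "xml"; the Note does not give it a URI, so this marker is an assumption.
XML_RESERVED = "xml (reserved)"


def split_qname(qname):
    """Split a QName into (NSPart or None, LocalPart)."""
    if qname.count(":") > 1:
        raise ValueError(f"more than one colon in {qname!r}")
    prefix, colon, local = qname.rpartition(":")
    if not local or (colon and not prefix):
        raise ValueError(f"empty name part in {qname!r}")
    return (prefix if colon else None, local)


def resolve(qname, declared):
    """Return the (URI, LocalPart) pair for a QName, given declared names."""
    prefix, local = split_qname(qname)
    if prefix is None:
        return (None, local)          # unqualified name
    if prefix == "xml":
        return (XML_RESERVED, local)  # implicitly declared
    if prefix not in declared:
        raise ValueError(f"namespace name {prefix!r} was not declared")
    return (declared[prefix], local)


# Hypothetical declaration table, for illustration only.
declared = {"bk": "http://example.org/books"}
print(resolve("bk:title", declared))
print(resolve("xml:lang", declared))
```

Two qualified names then denote the same universal name when their resolved pairs are equal, regardless of which namespace name each document chose.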

Using Qualified Names

To enable the proper use of qualified names, it is necessary to banish colons from all Names which are not qualified; two productions are replaced as follows:

Name

    Name     ::= (Letter | '_') (NameChar)*
    MiscName ::= '.' | '-' | '_' | CombiningChar | Ignorable | Extender

Element types may be given as qualified names. To do this, the productions for start-, end-, and empty-element tags (STag, ETag, and EmptyElement) are replaced as follows:

Start-tag

    STag         ::= '<' QName (S Attribute)* S? '>'
    ETag         ::= '</' QName S? '>'
    EmptyElement ::= '<' QName (S Attribute)* S? '/>'

Attribute names are given as qualified names. To do this, the production for Attribute is replaced as follows:

Attribute

    Attribute ::= QName Eq AttValue
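The revised tag productions can be approximated with regular expressions. The Python sketch below is illustrative only: it restricts Name to ASCII characters, does not check entity references inside attribute values, and the function name `parse_tag` is hypothetical.

```python
import re

# ASCII-only simplification of the colon-free Name production (assumption).
NAME = r"[A-Za-z_][A-Za-z0-9._-]*"
# QName ::= (NSPart ':')? LocalPart, where both parts are Names.
QNAME = rf"(?:{NAME}:)?{NAME}"
# Attribute ::= QName Eq AttValue; the group captures the attribute's QName.
ATTRIBUTE_RE = re.compile(rf"""({QNAME})\s*=\s*(?:"[^<"]*"|'[^<']*')""")
# STag and EmptyElement differ only in the optional '/' before '>'.
TAG_RE = re.compile(
    rf"""<({QNAME})((?:\s+{QNAME}\s*=\s*(?:"[^<"]*"|'[^<']*'))*)\s*(/?)>"""
)


def parse_tag(tag):
    """Return (element QName, attribute QNames, is_empty_element) or None."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        return None
    attribute_names = ATTRIBUTE_RE.findall(match.group(2))
    return match.group(1), attribute_names, match.group(3) == "/"


print(parse_tag('<bk:book bk:isbn="123" lang="en">'))
print(parse_tag("<a:b:c>"))  # rejected: more than one colon
```

Because Name no longer admits a colon, a tag such as `<a:b:c>` fails to match at all rather than being read as an element type with a colon inside its local part.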

Examples

Operational Scenarios

Mathematical Expressions

I have to write a schema for manuals. As manuals have mathematical expressions, my schema has to allow them.

Fortunately, W3C has a schema called MathML. As I know little about mathematical expressions, I would like to use MathML as is. I do not even want to read what is defined in MathML. If a better schema for mathematical expressions appears later, I will switch to that schema.

Although I do not care about internal structures of mathematical expressions, I do want to restrict where they may appear. I do not allow them in footnotes. I only allow them as direct subordinates of sections and subsections.

Writers will use XML editors to edit manuals. In the near future, mathematical expression editors for MathML will show up in the market. My writers will use such editors to edit mathematical expressions in manuals. While editing manuals with XML editors, writers can introduce mathematical expressions, provided that my schema allows mathematical expressions there. Then, mathematical expression editors are automatically invoked. After creating mathematical expressions, writers close math editor windows, and resume editing in XML editors.

Observe that implementation of XML editors does not require implementation of mathematical expressions, and that mathematical expression editors are dedicated to MathML. Neither editor need know the entire document or schema.

Metadata

I have to write a schema for on-line novels. Because of some regulation, each novel has to have metadata. The schema of such metadata is already defined by somebody (the government, for example). I have to use that metadata schema as is. No change is allowed.

As in the previous case, I do not care about the internal structure of metadata. But I allow metadata to appear as the eldest child of the novel element only.

Writers write novels with XML editors. The editors use my schema. But writers do not provide metadata. Writers see no metadata.

Somebody examines each novel and then creates metadata. If the novel is pornographic, he or she will specify this information in the metadata. But the novel is not modified at all. The only change is metadata. Therefore, metadata editing tools do not provide editing of novels, but only provide metadata editing.

If the schema for metadata is revised by the government, I simply reference the new schema. Writers do not rewrite novels. If the revision is backward-compatible, existing metadata (embedded within novels) remains valid. If not, metadata has to be edited manually or converted automatically.

Tables

My schema for manuals should provide tables. But I do not want to study columns and rows. I would like to use somebody's schema for tables. I simply refer to that schema.

Although I do not care about columns and rows, I do care about the permissible subordinates of entries. I would like to allow data characters, phrase elements, and mathematical expressions only. Nothing else can appear within entries.

Editing of tables is similar to that of mathematical expressions, but we need recursive editing. A writer edits a manual with an XML editor. When he or she introduces a table, a table editor is automatically invoked. The table editor is dedicated to tables, and does not use my schema. After creating rows, columns, and entries, the XML editor is recursively invoked to create subordinates of entries.

Syntax Example: The On-line Bookstore

Imagine an XML document representing an invoice for books. If public schemas exist for elements and attributes describing books, electronic transactions and digital signatures, the invoice author should be able to use these, rather than inventing new element and attribute types. Any reader of the invoice document should be able to infer a consistent meaning to its contents, the same meaning as if the elements and attributes had appeared in a different kind of document (such as an invoice for automotive parts, or an inventory of books or a digital signature on a legal contract). Any search tool should locate the elements, regardless of the document in which they reside. Further, since several schemas may choose the same name (e.g. "size") for elements or attributes with different meanings, these must be distinguished if used within the same document.

[Example invoice markup not preserved in this copy. Surviving text content: 80183589575795589189518915; Layman; Andrew; 1997-03-17.]

Issues Open for Discussion

The namespace syntax presented in this working draft is intended to support the namespace needs expressed by other W3C activities, to enable interoperability and to provide for future enhancements to the XML specification. Unfortunately, the syntax presented is not sufficiently robust to describe the blind interchange of validated documents which contain elements and attributes whose types are defined in several schemas. Therefore, to provide insight into the intent of the namespace syntax, this draft includes a brief summary of the SIG discussion and rationale. This section will not attempt to present detailed technical discussion nor will it document the individual contributions of those who participated in the discussion. These details are available in the W3C XML SIG Mail Archives (http://lists.w3.org/Archives/Member/w3c-xml-sig/).

The namespace discussion has resulted in no changes to the XML 1.0 syntax; colons continue to be valid name characters. Our intent is to enable the development of namespace aware applications without adding large passages to the XML specification and without adding significant burden to XML application developers. In developing this proposal we have avoided several features and functions we believe will be included in the full namespace specification in a future release of the XML specification.

Specifically, this working draft does not establish semantics for validating document instances against multiple schemas or mechanics for minimizing namespace names, nor does it address whether qualified attributes should be constrained or whether constraints should be placed on other name characters. It is anticipated that this proposal will promote the development of industry experience regarding multiple-schema validation, inheritance, sub-classing, editing and cut-and-paste operations, and application behaviors, which will be reflected in future versions of the XML specification.

This working draft does add constraints to the XML syntax by limiting the use of colons in names and establishing a convention for namespace declarations. It is the intent of these constraints to limit the namespace syntax sufficiently that future extensions can be defined to resolve remaining namespace issues. It is expected that legacy data conforming to this note will be compatible with these solutions. Note: there is no guarantee that any namespace mechanism will be adopted for XML, nor that the mechanism will in fact be compatible with the syntax described in this working draft.

Which Names Should be Subject to Qualification

The XML specification uses Name in the following contexts:

PI target

Root element type in doctype declaration

Element type in start-, empty-element, and end-tags, and in element type declarations

Attribute names, in start-tags, and in attribute list declarations

As the value of ID, IDREF(S), and ENTITY(IES) attributes (note that the values in NMTOKEN(S) attributes are NMTOKENS, not names)

Entity names, in declarations and references (general and parameter entities)

Notation names, in NDATA entity declarations and Notation declarations

(As LatinName) as the encoding name in an XML declaration

There is an open question as to which of these should exist in their qualified form. The instance of Name that is the first and least controversial candidate for qualification is the element type.

Existing SGML practice shows that attributes are often used for much the same purposes as elements, with the choice determined by evolutionary history and design aesthetics as often as by differences in element and attribute capabilities. Also, certain kinds of attributes are used on a wide range of element types (for example, those employed in XML Links, those that might indicate the datatype of an element, etc.). Thus, attribute names are a strong candidate for qualification.

Furthermore, on the basis of consistency and simplicity, it might be argued that if one instance of Name is to be qualified, all should be.

On the other hand, attributes are already qualified by element types; that is, the permissible values, defaults, and semantics of attributes depend on element types. It has been argued that further qualification of attributes by namespaces is unnecessary, and is even harmful for validation (see 4.2). Those attributes (e.g., xml:lang) which apply to all element types should rather be captured by a different mechanism (e.g., #ALL in the WebSGML adaptation), as their permissible values, defaults, and semantics do not depend on element types.

Validation

Perhaps the most complicated issue surrounding namespaces is validation. In the discussion, many viewed validation as any process that verifies a document instance against a schema, while others referred to validation as "plain old DTD-wise validation with an XML processor, not any other kind of validation with an application". This white paper tries to support both schema validation and XML validation, where XML validation is a special case of schema validation in which an XML processor uses a schema expressed in SGML DTD syntax. Furthermore, it tries to support XML validation with the minimum of change to existing XML processors, and to provide for qualified names for both element types and attribute names.

For generic schema validation it is anticipated that applications will validate against a set of semantics that are either predefined in an application standard or expressed in machine readable syntax. The system literal in the namespace declaration should be used to identify the specific schema and any associated data resources.

The two most commonly discussed approaches to XML validation were fragment merging and validation of a grove of independent fragments. These methods differ in how the validation process is implemented, and should yield the same result. This specification most directly supports validation by fragment merging. Fragment merging validates the document instance against a DTD created by merging elements from the constituent DTD fragments. The merged DTD may be created either by man or machine, on the fly or prior to validation. This approach has the advantage that once the merged DTD is created, the validation process is unchanged from current SGML/XML practice.

XML validation using a grove of independent fragments involves decomposing a document into fragments, each of which can be validated with a component schema. Each fragment is a subtree such that every element in it comes from a single schema. The entire document is valid with respect to the entire schema exactly when each fragment is valid with respect to its component schema. This approach has the advantage that the validation process uses unmodified DTDs. However, it was troubling to realize that supporting qualified attribute names precludes using a grove of independent fragments, because the attributes in each fragment could come from a different namespace and would therefore have to be validated against a different schema.
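The decomposition step of the grove approach can be sketched in Python as follows. This is an illustrative reading of the paragraph above, not an algorithm given in the Note: elements are modeled by a hypothetical `Element` class carrying only a qualified name and children, a new fragment starts wherever the namespace name changes between parent and child, and the per-schema validators are stand-in callables supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical minimal element: a qualified name plus child elements."""
    qname: str
    children: list = field(default_factory=list)


def prefix_of(qname):
    """Return the namespace name of a QName, or None if unqualified."""
    head, colon, _ = qname.partition(":")
    return head if colon else None


def fragments(root):
    """Split a tree into (namespace name, [element names]) fragments."""
    result = []

    def walk(node, current):
        namespace = prefix_of(node.qname)
        if current is None or namespace != current[0]:
            current = (namespace, [])  # namespace changed: open a new fragment
            result.append(current)
        current[1].append(node.qname)
        for child in node.children:
            walk(child, current)

    walk(root, None)
    return result


def document_valid(root, validators):
    """The document is valid exactly when every fragment is valid."""
    return all(validators[ns](names) for ns, names in fragments(root))


# Hypothetical manual containing one mathematical expression.
math = Element("m:math", [Element("m:mi"), Element("m:mo")])
manual = Element("manual", [Element("section", [Element("para"), math, Element("para")])])
print(fragments(manual))
```

The sketch tracks element names only. As the paragraph notes, once attributes may carry their own namespace names, a fragment can no longer be handed to a single component schema, which this simple partition does not address.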

In considering the validation issues, it became apparent to many that we have not arrived at a consensus about what it means, in the general case, to apply an attribute from one schema to an element from another schema. Given this, together with a growing concern about the detailed meaning of multiple-schema validation and a desire to develop an XML-based syntax to express a superset of the SGML DTD semantics, it was decided to support the most straightforward XML validation technique that met the requirements and that will also foster an environment where application developers can evolve working models of multiple-namespace validation with various schema syntaxes.

Qualified Names vs. Reserved Attribute

Two mechanisms were discussed for associating elements with namespace schemas: qualified names and a reserved attribute. While many in the discussion admired the simplicity of the reserved-attribute approach, which required no namespace declaration and no new syntax, the qualified namespace prefix syntax was chosen because it supports two requirements not possible with URI attributes. In addition, namespace qualifiers may be more compact, more meaningful to a reader, and easier to understand, describe and use than a reserved attribute.

Qualified names support the requirement to identify the schema for an attribute and to be able to apply an attribute from one document fragment to an element from another document fragment. The use of URI attributes would not allow an attribute from one namespace to be applied to an element from a different namespace fragment. Additionally, many believe qualifiers are needed for other names (id, idrefs, and enumerated attribute values) and perhaps should be permitted on all names. This qualified name syntax will not need to change if we decide to support additional name types in the future.

This working draft does not specify how the URI system literal in the Namespace Declaration PI is to be used, aside from saying that it identifies the namespace schema. We anticipate that validating XML processors will use the QName for validation. Other applications are free to choose whether and how the (URI, LocalPart) pair is interpreted. Note: a future version of the namespace specification that addresses schema validation semantics may require the interpretation of the (URI, LocalPart) pair to be formalized.

A validating XML processor must be able to disambiguate between QNames that have the same LocalParts. One intent is to support the ability to merge two document fragments that contain the same generic identifier while still performing XML validation against the original content models. This ability to differentiate may also ease the deployment of stylesheets and the development of other processing applications. Finally, because applications need only control the NSPart prefix, qualified names make it simpler to avoid namespace collisions.

Namespace Declaration

The syntax presented in this working draft uses a reserved processing instruction (PI) to associate a namespace qualifier with a network resource. In the discussions many alternatives were offered, including using a new declaration type, system notations or XLL links. While many considered PIs the least desirable long-term option, the PI was chosen for this draft because it is the most compatible with existing processing systems and should prove easy to migrate to another syntax in the future.

The ability to associate multiple schemas with a namespace qualifier was discussed. The multiple associations could provide both machine- and human-readable schemas or even multiple machine schemas. The namespace syntax in this working draft requires that there be exactly one namespace declaration PI for each namespace name. Future namespace specifications may establish conventions which support multiple associations as well as a mechanism to qualify a particular attribute of a specific element in a namespace.

Qualified names

The namespace syntax presented in this working draft severely constrains qualified names. The following constraints were chosen based on an understanding of the immediate requirements of other W3C activities and to reserve syntactical constructs to support extensions of the namespace mechanisms.

There is at most one colon per name. This working draft does not support the use of multiple colons to designate hierarchical namespaces, multiple inheritance or other semantics. It is anticipated that multi-part names will be discussed as mechanics to support implementing these features in a future version of the XML specification.

QNames use a single colon as a separator. The use of a double colon was discussed as a way to improve compatibility with the CSS specification even though most people preferred the single colon from a subjective and aesthetic perspective. After further investigation it was not shown that double colons would actually simplify the development of CSS compatible XML namespace aware applications. The single colon QName syntax used in this working draft is intended to support the use of CSS stylesheets in namespace aware XML applications.

The NSPart must not be empty. An empty NSPart could signal a processing semantic such as minimization. In this working draft, name parts must not be null for a document to be well formed. Future versions of the XML specification may establish the use of empty name parts in QNames as a signal for namespace minimization.

Acknowledgements

This work reflects input from a very large number of people, including especially the members of the World Wide Web Consortium XML Working Group and Special Interest Group.

In particular, Murata Makoto contributed the operational scenarios in the examples section.