The OASIS Cover Pages: The Online Resource for Markup Language Technologies

Created: February 05, 2004.

Extensible Markup Language (XML) Version 1.1 Published as a W3C Recommendation.

The World Wide Web Consortium has published Extensible Markup Language (XML) 1.1 and Namespaces in XML 1.1 as W3C Recommendations. XML Version 1.1 updates the first Recommendation, issued in 1998, to address Unicode, control character, and line ending issues. "Whereas XML 1.0 provided a rigid definition of names wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that everything that is not forbidden (for a specific reason) is permitted. Since Unicode will continue to grow past version 4.0, further changes to XML can be avoided by allowing almost any character, including those not yet assigned, in names." The Namespaces Version 1.1 Recommendation incorporates errata corrections and provides a mechanism to undeclare prefixes.

W3C also released XML Information Set (Second Edition) as a Recommendation. The Infoset specification provides a set of definitions for use in other specifications that need to refer to the information in an XML document; the Second Edition "updates the Infoset to cover XML 1.1 and Namespaces 1.1, clarifies the consequences of certain kinds of invalidity, and corrects typographical errors." A W3C Recommendation was also published for Extensible Markup Language (XML) 1.0 (Third Edition). The XML 1.0 Third Edition does not define a new version of XML, but "brings the XML 1.0 Recommendation up to date with second edition errata, and clarifies its use of RFC 2119 keywords like must, should and may."

XML Version 1.1 and Version 1.0: What's What?

The XML 1.1 Recommendation contains a number of differentiating grammar productions and clarifying statements showing how XML version 1.1 and 1.0 documents are recognized (from a distance) and declaring how XML processors should respect the differences.

In the first place, a well-formed XML document like

        <greeting>Hello, world!</greeting>

is considered an XML version 1.0 document because it lacks an XML declaration specifying that it is XML version 1.1. Unmarked documents are thus understood to be XML Version 1.0. EBNF grammar productions 22 (and 26) formalize this rule: in XML 1.0, "XML documents SHOULD begin with an XML declaration which specifies the version of XML being used"; but "XML 1.1 documents MUST begin with an XML declaration which specifies the version of XML being used". Compare [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? in the XML version 1.1 REC, where XMLDecl is required, with [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)? in the XML version 1.0 REC, where it is optional.
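The "unmarked means 1.0" rule can be illustrated with a small sketch. The function name declared_xml_version and the simplified regular expression are assumptions made for illustration; a real XML processor also handles byte order marks and encoding detection, which are out of scope here.

```python
import re

# Simplified pattern for an XML declaration at the very start of a document.
# Assumption: the input is already-decoded text with no byte order mark.
_XML_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*(?P<q>["'])(?P<version>1\.[0-9]+)(?P=q)"""
)


def declared_xml_version(text: str) -> str:
    """Return the declared XML version, or "1.0" when no declaration is present."""
    match = _XML_DECL.match(text)
    if match is None:
        # Unmarked documents are understood to be XML 1.0.
        return "1.0"
    return match.group("version")


print(declared_xml_version("<greeting>Hello, world!</greeting>"))  # 1.0
print(declared_xml_version('<?xml version="1.1"?><greeting>Hello, world!</greeting>'))  # 1.1
```

The sketch only reads the version label. It does not check that the rest of the document is well-formed under the rules of that version.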

Section 5.1 "Validating and Non-Validating Processors" in XML 1.1 clarifies that "XML 1.1 processors MUST be able to process both XML 1.0 and XML 1.1 documents. Programs which generate XML SHOULD generate XML 1.0, unless one of the specific features of XML 1.1 is required."

The XML 1.1 Recommendation Section 2.8 "Prolog and Document Type Declaration" specifies that "XML 1.1 processors SHOULD accept XML 1.0 documents as well. If a document is well-formed or valid XML 1.0, and provided it does not contain any control characters in the range [#x7F-#x9F] other than as character escapes, it may be made well-formed or valid XML 1.1 respectively simply by changing the version number."
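The relabeling condition quoted above lends itself to a quick check. The following is a minimal sketch, assuming the input has already been verified as well-formed (or valid) XML 1.0 by other means; the function name can_relabel_as_xml11 is hypothetical.

```python
import re

# Literal characters in the range #x7F-#x9F block a simple relabel.
# A character reference such as "&#x85;" is plain ASCII text, so it does not match.
_LITERAL_7F_9F = re.compile("[\x7f-\x9f]")


def can_relabel_as_xml11(xml10_text: str) -> bool:
    """True if no literal character in #x7F-#x9F appears in the text.

    Assumption: the caller has already established that the text is
    well-formed XML 1.0; this function does not parse anything.
    """
    return _LITERAL_7F_9F.search(xml10_text) is None


print(can_relabel_as_xml11("<a>&#x85;</a>"))  # True
print(can_relabel_as_xml11("<a>\x85</a>"))    # False
```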

The XML 1.1 Recommendation Section 4.3.4 "Version Information in Entities" says: "Each entity, including the document entity, can be separately declared as XML 1.0 or XML 1.1. The version declaration appearing in the document entity determines the version of the document as a whole. An XML 1.1 document may invoke XML 1.0 external entities, so that otherwise duplicated versions of external entities, particularly DTD external subsets, need not be maintained. However, in such a case the rules of XML 1.1 are applied to the entire document. If an entity (including the document entity) is not labeled with a version number, it is treated as if labeled as version 1.0."

John Cowan said (XML-DEV): "Nattering nabobs of negativism will doubtless be glad to note that XML 1.1 parsers MUST support XML 1.0 as well, and that human and mechanical XML generators SHOULD generate XML 1.0 unless there is a specific reason to generate XML 1.1..."

Rationale for XML Version 1.1

The W3C's XML 1.0 Recommendation was first issued in 1998, and despite the issuance of many errata culminating in a Third Edition of 2004, has remained (by intention) unchanged with respect to what is well-formed XML and what is not. This stability has been extremely useful for interoperability. However, the Unicode Standard on which XML 1.0 relies for character specifications has not remained static, evolving from version 2.0 to version 4.0 and beyond. Characters not present in Unicode 2.0 may already be used in XML 1.0 character data. However, they are not allowed in XML names such as element type names, attribute names, enumerated attribute values, processing instruction targets, and so on. In addition, some characters that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode 2.0.

The overall philosophy of names has changed since XML 1.0. Whereas XML 1.0 provided a rigid definition of names, wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that everything that is not forbidden (for a specific reason) is permitted. Since Unicode will continue to grow past version 4.0, further changes to XML can be avoided by allowing almost any character, including those not yet assigned, in names.
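The "permitted unless forbidden" approach shows up in how few ranges are needed to describe a name. The sketch below assumes the code point ranges of productions [4] NameStartChar and [4a] NameChar in the XML 1.1 Recommendation; they are transcribed here for illustration and should be checked against the specification. The function name is_xml11_name is hypothetical.

```python
# Ranges transcribed from productions [4] and [4a] of the XML 1.1
# Recommendation (an assumption to verify against the spec itself).
_NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first position:
# "-", ".", digits, #xB7, and two further ranges.
_NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_name(name: str) -> bool:
    """Check a string against the XML 1.1 Name production (sketch)."""
    if not name or not _in_ranges(name[0], _NAME_START_RANGES):
        return False
    return all(
        _in_ranges(c, _NAME_START_RANGES) or _in_ranges(c, _NAME_EXTRA_RANGES)
        for c in name[1:]
    )


print(is_xml11_name("greeting"))  # True
print(is_xml11_name("1abc"))      # False
print(is_xml11_name("\u1200"))    # True (a character from the Ethiopic block)
```

Because the ranges are broad blocks rather than per-character lists, code points that Unicode assigns later fall inside them without any change to the check.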

In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the conventions used on IBM and IBM-compatible mainframes. As a result, XML documents on mainframes are not plain text files according to the local conventions. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before parsing and after generation. Allowing straightforward interoperability is particularly important when data stores are shared between mainframe and non-mainframe systems (as opposed to being copied from one to the other). Therefore XML 1.1 adds NEL (#x85) to the list of line-end characters. For completeness, the Unicode line separator character, #x2028, is also supported.
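A rough sketch of the extended line-end handling follows. The text above only states that NEL (#x85) and #x2028 join the list of line-end characters; the treatment of the two-character sequences #xD #xA and #xD #x85 is an assumption drawn from Section 2.11 of the XML 1.1 Recommendation. The function name normalize_line_ends_xml11 is hypothetical.

```python
import re

# Two-character sequences are listed first so that they collapse to a single
# line feed rather than two.
_LINE_ENDS = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends_xml11(text: str) -> str:
    """Translate every XML 1.1 line end to a single #xA (sketch)."""
    return _LINE_ENDS.sub("\n", text)


sample = "a\r\nb\x85c\u2028d\re\r\x85f"
print(repr(normalize_line_ends_xml11(sample)))  # 'a\nb\nc\nd\ne\nf'
```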

Finally, there is considerable demand to define a standard representation of arbitrary Unicode characters in XML documents. Therefore, XML 1.1 allows the use of character references to the control characters #x1 through #x1F, most of which are forbidden in XML 1.0. For reasons of robustness, however, these characters still cannot be used directly in documents. In order to improve the robustness of character encoding detection, the additional control characters #x7F through #x9F, which were freely allowed in XML 1.0 documents, now must also appear only as character references. (Whitespace characters are of course exempt.) The minor sacrifice of backward compatibility is considered not significant. Due to potential problems with APIs, #x0 is still forbidden both directly and as a character reference.
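The control character rules can be sketched as an escaping step for a generator of XML 1.1 character data. This is an illustration under stated assumptions: the restricted ranges follow the RestrictedChar production of the XML 1.1 Recommendation as transcribed here, the function name escape_controls_xml11 is hypothetical, and escaping of "&" and "<" is deliberately left out.

```python
def escape_controls_xml11(text: str) -> str:
    """Replace restricted control characters with character references (sketch).

    Whitespace (#x9, #xA, #xD) and NEL (#x85) are left untouched.
    #x0 is rejected because it is forbidden even as a character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            raise ValueError("#x0 is not allowed in XML 1.1, even as a character reference")
        restricted = (
            0x01 <= cp <= 0x08
            or 0x0B <= cp <= 0x0C
            or 0x0E <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F
        )
        out.append(f"&#x{cp:X};" if restricted else ch)
    return "".join(out)


print(escape_controls_xml11("a\x01b"))  # a&#x1;b
print(escape_controls_xml11("\x7f"))    # &#x7F;
```

The output is only meaningful in a document labeled version 1.1, since an XML 1.0 processor must reject references to most of these characters.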

...XML 1.1 defines a set of constraints called 'full normalization' on XML documents, which document creators SHOULD adhere to, and document processors SHOULD verify. Using fully normalized documents ensures that identity comparisons of names, attribute values, and character content can be made correctly by simple binary comparison of Unicode strings.
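The benefit of normalization for binary comparison can be seen in a short sketch using Unicode Normalization Form C from Python's standard library. NFC is used here as an approximation only: full normalization as defined by XML 1.1 imposes further conditions that this example does not model. The helper name same_after_nfc is hypothetical.

```python
import unicodedata


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"      # "e with acute" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(composed == decomposed)                # False
print(same_after_nfc(composed, decomposed))  # True
```

When both strings are already normalized, the plain == comparison suffices, which is the point of asking document creators to normalize up front.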

A new XML version, rather than a set of errata to XML 1.0, is being created because the changes affect the definition of well-formed documents. XML 1.0 processors must continue to reject documents that contain new characters in XML names, new line-end conventions, and references to control characters. The distinction between XML 1.0 and XML 1.1 documents is indicated by the version number information in the XML declaration at the start of each document... [adapted from the W3C Recommendation 2004-02-04]

Robin Cover, Editor