[This local archive copy is from the official and canonical URL, http://www.w3.org/TR/1999/NOTE-xml-canonical-req-19990307; please refer to the canonical source document if possible.]


W3C

XML Canonicalization Requirements

W3C Note 07-March-1999

This version:
http://www.w3.org/TR/1999/NOTE-xml-canonical-req-19990307
Latest version:
http://www.w3.org/TR/NOTE-xml-canonical-req
Editors:
James Tauber

Copyright © 1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

Status of this document

The XML Syntax Working Group, with this 7 Mar 1999 publication, invites early feedback on requirements for work on XML Canonicalization. For background on this work, please see the XML Activity Statement.

This document does not represent the Working Group's consensus on a finished document; rather, it is an early draft for public review. Please send comments to the editor.

This document is a NOTE made available by the W3 Consortium for discussion only. Publication as a Note does not imply endorsement by the W3C membership.

A list of current W3C technical reports and publications, including working drafts and notes, can be found at http://www.w3.org/TR.

Abstract

This document lists the design principles, scope and requirements for the Canonical XML specification being developed by the World Wide Web Consortium's XML Syntax Working Group.

Table of Contents

1. Introduction
2. Design Principles and Scope
3. Requirements
4. References

1. Introduction

The XML 1.0 Recommendation [XML] describes the syntax of a class of data objects called XML documents. It is possible, however, for logically equivalent XML documents to differ in their physical representation. In particular, two equivalent XML documents may differ in physical (i.e. entity) structure, attribute ordering, character encoding and insignificant whitespace. Equivalence testing therefore cannot be done at the byte level for arbitrary XML documents. Such equivalence testing is useful in a number of domains, including digital signatures, checksums, version control and conformance testing.
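As an illustration, the Python sketch below serializes one small document in two ways that differ in entity structure (an internal entity reference versus literal text), attribute order, quote style, whitespace inside tags, empty-element syntax and character encoding. The byte strings differ, yet a parser reports the same element structure for both. The helper `same_tree` is hypothetical and encodes a deliberately simplified notion of logical equivalence (element names, unordered attributes, character data and child order); it is not a definition taken from this Note, and it ignores comments, processing instructions and namespace prefixes.

```python
import xml.etree.ElementTree as ET

# Two serializations of one logical document. They differ in entity
# structure (&co; versus literal text), attribute order, quote style,
# whitespace inside tags, empty-element syntax and character encoding.
DOC_A = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE order [<!ENTITY co "Café Ltd">]>'
    '<order id="42" by="&co;"><item sku="x1"/></order>'
).encode("utf-8")

DOC_B = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>"
    "<order  by='Café Ltd'\n       id=\"42\" ><item sku=\"x1\"></item></order>"
).encode("iso-8859-1")


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Hypothetical, simplified equivalence: compare element names,
    attributes (as an unordered mapping), character data and children,
    ignoring how any of them were serialized."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


print(DOC_A == DOC_B)  # False: the byte sequences differ
print(same_tree(ET.fromstring(DOC_A), ET.fromstring(DOC_B)))  # True
```

The parser expands `&co;` and decodes each input according to its XML declaration, which is why the physical differences disappear once the documents are parsed.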

Work has started elsewhere on the broader question of digital signatures in XML [IOTP-DSig, Brown-XML-DSig, DOMHASH]. The W3C has a forthcoming workshop on signed XML [DS-XML].

The Canonical XML specification aims to introduce a notion of equivalence between XML documents that can be tested at the syntactic level and, in particular, by byte-for-byte comparison. It shall describe the canonicalization of XML documents such that logically equivalent documents will have the same byte-for-byte representation. This form is referred to as the canonical form of the document.
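The test this paragraph describes, canonicalize first and then compare bytes, can be sketched in Python. The sketch assumes Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize`; that function implements the later W3C Canonical XML 2.0 specification, not an algorithm defined by this Note, and is used here only to show the shape of the comparison. The helper name `canonical_bytes` is hypothetical.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8+

# The same logical document, serialized with different encodings, quoting,
# attribute order, whitespace inside tags and empty-element syntax.
DOC_A = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<order id="42" note="café"><item sku="x1"/></order>'
).encode("utf-8")

DOC_B = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
    "<order  note='café'\n       id=\"42\" ><item sku=\"x1\"></item></order>"
).encode("iso-8859-1")


def canonical_bytes(doc: bytes) -> bytes:
    """Hypothetical helper: canonicalize, then fix the output encoding."""
    # ET.canonicalize() parses the input (honoring its XML declaration) and
    # returns the Canonical XML 2.0 form as a str: attributes sorted and
    # double-quoted, empty elements written as start/end tag pairs.
    canonical_text = ET.canonicalize(xml_data=doc)
    return canonical_text.encode("utf-8")


print(DOC_A == DOC_B)                                    # False
print(canonical_bytes(DOC_A) == canonical_bytes(DOC_B))  # True
```

Fixing the output encoding (UTF-8 here, an assumption of the sketch) is what turns the final step into a plain byte comparison.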

2. Design Principles and Scope

  1. The specification for Canonical XML shall describe how to derive the canonical form of any XML document.
  2. Canonicalization shall reflect the logical structure of the XML document and not its physical (i.e. entity) structure.
  3. The specification shall not consider the canonicalization of unparsed entities (although a canonical document may still reference them).
  4. The specification shall not consider the canonicalization of the document type declaration.
  5. Canonicalization shall be designed to be consistent with the XML Information Set [Infoset-Req] and W3C I18N work [Char-Mod, Char-Req].
  6. The specification shall consider the canonicalization of documents that make use of namespaces [Namespaces].
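Principle 6 leaves open how namespace prefixes should be treated. As a sketch under one possible assumption, that only namespace URIs (not the prefixes bound to them) are significant, the `rewrite_prefixes` option of Python's `canonicalize` (3.8+) replaces every prefix with a generated one. Again, this is Canonical XML 2.0 behavior from Python's standard library, not something this Note specifies; the namespace URI `urn:example:orders` and the helper `c14n` are made up for the example.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8+

# The same names in the made-up namespace "urn:example:orders",
# bound to two different prefixes.
DOC_A = '<a:order xmlns:a="urn:example:orders"><a:item/></a:order>'
DOC_B = '<b:order xmlns:b="urn:example:orders"><b:item/></b:order>'


def c14n(doc: str, rewrite_prefixes: bool = False) -> str:
    # With rewrite_prefixes=True, canonicalize() replaces each prefix with a
    # generated one (n0, n1, ...), so only the namespace URIs stay significant.
    return ET.canonicalize(xml_data=doc, rewrite_prefixes=rewrite_prefixes)


print(c14n(DOC_A) == c14n(DOC_B))              # False: prefixes are kept
print(c14n(DOC_A, True) == c14n(DOC_B, True))  # True
```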

3. Requirements

  1. Every XML document shall have a unique canonical form.
  2. The canonical form of an XML document shall be a well-formed XML document.
  3. Canonicalization shall represent characters that Unicode [Unicode] defines to be equivalent in the same form, so that they compare equal byte for byte.
  4. The canonical form shall be derivable from the information provided by the XML Information Set.
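Requirement 3 can be illustrated with the word "café", which can be written with the precomposed character U+00E9 or with "e" followed by U+0301 COMBINING ACUTE ACCENT. The two spellings are canonically equivalent in Unicode but encode to different bytes. The sketch below assumes, for illustration only, that canonicalization would apply Unicode Normalization Form C before encoding as UTF-8; this Note does not name a normalization form, and `normalize_text` is a hypothetical helper.

```python
import unicodedata

# "café" spelled two ways: precomposed U+00E9, or "e" + U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"


def normalize_text(s: str) -> bytes:
    # Assumption of this sketch: normalize to NFC, then encode as UTF-8.
    return unicodedata.normalize("NFC", s).encode("utf-8")


print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
print(normalize_text(precomposed) == normalize_text(decomposed))  # True
```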

4. References

XML
Extensible Markup Language (XML) Recommendation. http://www.w3.org/TR/REC-xml
Namespaces
Namespaces in XML Recommendation. http://www.w3.org/TR/REC-xml-names
Infoset-Req
XML Information Set Requirements Note. http://www.w3.org/TR/NOTE-xml-infoset-req
Char-Mod
Character Model for the World Wide Web Working Draft. http://www.w3.org/TR/WD-charmod
Char-Req
Requirements for String Identity and Character Indexing Definitions for the WWW. http://www.w3.org/TR/WD-charreq
Unicode
The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996
IOTP-DSig
Internet Draft. Digital Signatures for the Internet Open Trading Protocol. http://www.ietf.org/internet-drafts/draft-ietf-trade-iotp-v1.0-dsig-00.txt
Brown-XML-DSig
Internet Draft. Digital Signatures for XML. http://search.ietf.org/internet-drafts/draft-brown-xml-dsig-00.txt
DOMHASH
Internet Draft. Digest Values for DOM (DOMHASH). http://search.ietf.org/internet-drafts/draft-hiroshi-dom-hash-01.txt
DS-XML
XML-DSig '99: The W3C Signed XML Workshop. http://www.w3.org/1999/02/ds-xml-cfp-19990218.html