Summary of the XML Family of W3C Languages


Airi Salminen

28 March 2001





1. Introduction

2. XML

3. XML Accessories

4. XML Transducers

5. XML Applications

About this report

1. Introduction

XML is a markup language for presenting information as structured documents. The language was developed from SGML as an activity of the World Wide Web Consortium (W3C). Within W3C, a number of other XML-related language development activities are under way; their intent is to specify syntactic and semantic rules either for some specific kind of XML data or for data to be used together with XML data for a specific purpose. In this report, the term XML family of W3C languages refers to XML and those XML-related languages. The purpose of the report is to give a concise overview of the current state of development of these languages.


Results of W3C development activities are published as W3C Technical Reports. The process of developing technical reports is described in the W3C Process Document. This summary is based on an analysis of current technical reports of four types: Working Drafts, Candidate Recommendations, Proposed Recommendations, and Recommendations. In order of increasing maturity, the four types are:


·        A Working Draft (WD) represents work in progress; it is a draft document and may be updated, replaced, or obsoleted by other documents at any time.


·        A Candidate Recommendation (CR) has received significant review from its immediate technical community. The document is an explicit call for implementation and technical feedback.


·        A Proposed Recommendation (PR) represents consensus within the group that produced it and has been proposed by the Director to the Advisory Committee for review.


·        A Recommendation (R) represents consensus within W3C. W3C makes every effort to maintain its Recommendations (e.g., by tracking errata, providing testbed applications, helping to create test suites, etc.) and to encourage widespread implementation. The practice in W3C is to collect all known errors in a Recommendation into an errata document referred to in the Recommendation.


In this summary, the XML family of W3C languages has been divided into four groups: XML, XML Accessories, XML Transducers, and XML Applications.


XML Accessories are languages intended for wide use to extend the capabilities specified in XML. Examples of XML accessories are the XML Schema language, which extends the definition capability of XML DTDs, and XML Names, which extends the naming mechanism so that a single XML document can contain element and attribute names that are defined for and used by multiple software modules.


XML Transducers are languages intended for transducing some input XML data into some output form. Examples of XML transducers are the style sheet languages CSS2 and XSL, intended to produce an external presentation from XML data, and XSLT, intended for transforming XML documents into other XML documents. A transducer language is associated with a processing model that defines how output is derived from input.


XML Applications are languages that define constraints for a class of XML data in some special application area, often by means of a DTD. Examples of XML applications are MathML, defined for mathematical data, and SMIL, intended for multimedia documents.


Each of the following sections introduces the languages of one of the four groups. The tables listing the languages of a group refer to the documents describing the languages as of the date of this summary. As a reminder of the emergent nature of the W3C specifications and their continuing redevelopment, the entries for Recommendations (R) are accompanied by references to their errata documents. Note that all specifications described by Working Drafts (WD) are work in progress and may change at any time.




2. XML


XML development started in 1996. The first edition of the W3C Recommendation for XML 1.0 was published in February 1998 and the Second Edition in October 2000. The Second Edition incorporates the changes dictated by the first-edition errata; it does not specify a new version of XML. Table 1 lists the XML 1.0 specification documents as well as the W3C documents that describe abstract models for XML documents.


Table 1. XML



Document, Phase (R, PR, CR, WD), Month, Year


Structured Documents

- Extensible Markup Language (XML) 1.0, R, Feb. 1998

XML 1.0 Specification Errata

- Extensible Markup Language (XML) 1.0 (Second Edition), R, Oct. 2000

XML 1.0 Second Edition Specification Errata


Abstract models for XML documents:

- XML Information Set, WD, March 2001

- XML Path Language (XPath) Version 1.0, R, Nov. 1999

- Document Object Model (DOM) Level 1 Specification Version 1.0, R, Oct. 1998

- Document Object Model (DOM) Level 2 Core Specification Version 1.0, R, Nov. 2000

- XML Query Data Model, WD, Feb. 2001


The XML specifications describe the concrete syntax of XML documents and, in part, the behaviour of an XML processor, i.e., a software module used to read XML documents and provide access to their content and structure. Four abstract models have been specified for the information available in XML documents:
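To make the role of an XML processor concrete, the sketch below uses Python's standard-library parser (`xml.etree.ElementTree`) as the processor: it reads a document, rejects input that is not well-formed, and otherwise gives access to content and structure. The helper name `read_document` and the sample documents are hypothetical; this is an illustration, not a description of any particular conforming processor.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_document(text: str) -> Optional[ET.Element]:
    """Parse `text`; return its root element if it is well-formed, else None."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # Well-formedness violations are fatal errors that a conforming
        # XML processor must report; here the message is simply printed.
        print("not well-formed:", err)
        return None


note = read_document("<note><to>Tove</to></note>")
print(note.tag, note.find("to").text)  # note Tove

broken = read_document("<note><to>Tove</note>")  # mismatched end tag
print(broken)  # None
```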


·        The XML Information Set specification defines an abstract data set called XML Information Set (Infoset). The definitions in the specification are intended for other specifications that need to refer to information in a well-formed XML document.


·        The XPath Data Model is included in the XML Path Language (XPath) specification as the basis for addressing parts of an XML document.


·        DOM (Document Object Model) is an application programming interface for XML and HTML documents. It defines how data in a document is structured, accessed, and manipulated. The DOM Level 1 Specification was published in 1998; the DOM Level 2 specifications, published in November 2000, extend and update Level 1. Level 2 consists of five parts: Core, Views, Events, Style, and Traversal and Range. The underlying data structure of XML documents is defined in the Core specification.


·        The XML Query Data Model development is part of the W3C activities for specifying an XML query language and it is work in progress. The Query Data Model is intended to define formally the information contained in the input to an XML Query processor.


All four models describe an XML document as a tree structure, but the trees differ, as does the information available in them. The XPath, DOM Level 1, and DOM Level 2 specifications are W3C Recommendations; the Infoset and Query Data Model specifications are work in progress.
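As an illustration of the tree view these models share, the sketch below loads a small document with Python's `xml.dom.minidom`, a minimal implementation of the W3C DOM interfaces, and walks its element nodes depth-first. The sample document and the helper `element_paths` are hypothetical, and the path strings are only informal, XPath-like labels.

```python
from xml.dom import minidom

DOC = "<book><title>XML</title><chapter n='1'/><chapter n='2'/></book>"


def element_paths(node, prefix=""):
    """Depth-first walk of a DOM tree, returning a path label per element."""
    paths = []
    for child in node.childNodes:
        # Text, comment, and other non-element nodes are skipped.
        if child.nodeType == child.ELEMENT_NODE:
            path = f"{prefix}/{child.tagName}"
            paths.append(path)
            paths.extend(element_paths(child, path))
    return paths


dom = minidom.parseString(DOC)
print(element_paths(dom))
# ['/book', '/book/title', '/book/chapter', '/book/chapter']
print(dom.documentElement.getElementsByTagName("chapter")[1].getAttribute("n"))  # 2
```

The two `chapter` elements get the same label here, which is one reason real addressing languages such as XPath add positions and predicates.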


XML is intended to be a universal format for data on the Web. To support references to Internet resources and the use of different character sets and languages, the XML specification relies on a set of sublanguages specified by standards organizations other than W3C. They are languages for describing characters, naming character sets, naming languages, denoting country codes, and identifying Internet resources. These sublanguages are listed in Table 2. A joint W3C and Unicode Consortium activity is developing guidelines for using Unicode Version 3.0 together with markup languages such as XML (see Unicode in XML and other Markup Languages, Unicode Technical Report #20, W3C Note 15 December 2000).
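Two of these sublanguages can be seen at work in a single element: character references, which name Unicode/ISO 10646 code points in decimal or hexadecimal, and an `xml:lang` attribute carrying an ISO 639 language code. The sketch below uses Python's `xml.etree.ElementTree`; the sample element is made up for illustration.

```python
import xml.etree.ElementTree as ET

# &#xE4; (hexadecimal) and &#228; (decimal) both name the code point U+00E4.
doc = '<p xml:lang="fi">Jyv&#xE4;skyl&#228;</p>'
p = ET.fromstring(doc)

# The xml: prefix is predeclared; the parser expands it to this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

print(p.text)                                          # Jyväskylä
print(p.get(XML_NS + "lang"))                          # fi
print([hex(ord(c)) for c in p.text if ord(c) > 127])   # ['0xe4', '0xe4']
```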


Table 2. The Basis for XML



Specification: Developing Organization, Year (grouped by purpose)


Describing characters in different natural languages of the world

- The Unicode Consortium, 1996

- The Unicode Consortium, 2000

- ISO/IEC 10646: ISO, 1993 + amendments

- ISO/IEC 10646-2000: ISO, 2000


Denoting character sets

- IANA


Denoting languages and countries

- IETF, 1995

- ISO 639: ISO, 1988

- ISO 3166: ISO, 1997


Identifying Internet resources

- IETF, 1998

- IETF, 1999


IANA = Internet Assigned Numbers Authority

ISO = International Organization for Standardization

IETF = Internet Engineering Task Force




3. XML Accessories

Table 3 lists the current XML accessories in order of the maturity of their specifications. Three of the languages are currently described by W3C Recommendations: XML Names, XPath, and the language for xml-stylesheet processing instructions, which specifies associated style sheets by processing instructions in the prolog of an XML document. The XLink, XML Base, and XML Schema specifications are Proposed Recommendations, and XPointer is a Candidate Recommendation. Work is also in progress on the syntax to be used in the “style” attribute introduced in HTML.
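An xml-stylesheet instruction carries "pseudo-attributes" inside ordinary processing-instruction data, so an application has to pick them out itself. The sketch below does this with Python's `xml.dom.minidom`. The file name `report.css` and the helper `stylesheet_links` are hypothetical, and the regular expression is a simplification of the pseudo-attribute grammar in the Recommendation (for example, it does not expand character or entity references in values).

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="report.css"?>
<report/>"""

# name="value" or name='value'; group 2 remembers which quote was used.
PSEUDO_ATTR = re.compile(r'''(\w+)\s*=\s*(["'])(.*?)\2''')


def stylesheet_links(dom):
    """Collect pseudo-attributes of every xml-stylesheet PI in the prolog."""
    links = []
    for node in dom.childNodes:  # prolog PIs are children of the Document node
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            links.append({name: value
                          for name, _, value in PSEUDO_ATTR.findall(node.data)})
    return links


print(stylesheet_links(minidom.parseString(DOC)))
# [{'type': 'text/css', 'href': 'report.css'}]
```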


XML Names is intended to facilitate the use of qualified element and attribute names in XML documents in order to prevent name collisions. A qualified name consists of two parts: a namespace prefix and a local part. The prefix is bound, by a namespace declaration, to a namespace name, which is a URI reference. XML Names is used as an extension of XML in most other specifications of the XML family.

XPath defines how to address parts of XML documents. In support of this primary purpose, it also provides basic facilities for manipulating strings, numbers, and booleans. Development of the second version of XPath has begun with a requirements document. Among the accessories, XPointer uses XPath as a component.

XLink is intended for describing and creating links between Internet resources. The links can be simple unidirectional links similar to those of HTML, or relationships among more than two resources. Links can also reside in a location separate from the linked resources, and they can be associated with metadata. XML Base provides a base URI service for XLink; its purpose is to resolve relative URIs in links to external resources such as images, applets, form-processing programs, and style sheets.

XPointer defines fragment identifiers for URI references. It is built on top of XPath and extends it to allow addressing points and ranges as well as whole nodes, locating information by string matching, and using addressing expressions in URI references as fragment identifiers.
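The effect of XML Names can be seen with Python's `xml.etree.ElementTree`, which reports each qualified name in the expanded form `{namespace name}local part`. The `example.com` namespace URIs and the document are hypothetical. ElementTree implements only a limited subset of XPath, so the query at the end is XPath-like rather than full XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element called "id"; the prefixes keep them apart.
DOC = """<inv:order xmlns:inv="http://example.com/inventory"
           xmlns:cust="http://example.com/customers">
  <inv:id>42</inv:id>
  <cust:id>7</cust:id>
</inv:order>"""

root = ET.fromstring(DOC)

# The parser replaces each prefix by the namespace name it is bound to.
for child in root:
    print(child.tag, child.text)
# {http://example.com/inventory}id 42
# {http://example.com/customers}id 7

# In a query, the prefix only has to map to the same URI; it need not
# match the prefix used in the document.
ns = {"c": "http://example.com/customers"}
print(root.find("c:id", ns).text)  # 7
```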


XML Schema extends the definition capabilities of XML DTDs; in particular, it supports a variety of data types, e.g., boolean, float, int, and date, and their validation in conforming software. XML Schema is intended to constrain XML documents, but the schemas themselves are not necessarily written in XML; there is, however, an XML notation for the schema language. Three levels of conformance are defined for schema-aware processors: minimally conforming processors, conformance to the XML representation of schemas, and fully conforming processors.
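To make the datatype idea concrete, here is a rough sketch of the kind of lexical check a schema-aware processor performs for three built-in types: boolean (the literals true, false, 1, and 0), int (a 32-bit signed integer), and date. The function names are hypothetical and the checks are deliberately simplified: timezones, years beyond four digits, and all other datatypes are ignored. A real processor validates against the full definitions in XML Schema Part 2.

```python
import re
from datetime import date

# Simplified lexical checks for three built-in XML Schema datatypes.
# Leading/trailing whitespace is stripped, approximating whiteSpace="collapse".
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def valid_boolean(text: str) -> bool:
    return text.strip() in {"true", "false", "1", "0"}


def valid_int(text: str) -> bool:
    s = text.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        return False
    return INT_MIN <= int(s) <= INT_MAX  # int is limited to 32 bits


def valid_date(text: str) -> bool:
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text.strip())
    if not m:
        return False
    try:
        date(*map(int, m.groups()))  # rejects impossible dates such as 2001-02-30
        return True
    except ValueError:
        return False


CHECKS = {"boolean": valid_boolean, "int": valid_int, "date": valid_date}

for type_name, value in [("boolean", "1"), ("int", "2147483648"), ("date", "2001-03-28")]:
    print(type_name, repr(value), CHECKS[type_name](value))
# boolean '1' True
# int '2147483648' False
# date '2001-03-28' True
```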


Table 3. XML Accessories



Document, Phase (R, PR, CR, WD), Month, Year

XML Names

Qualifying element and attribute names

Namespaces in XML, R, Jan. 1999

Errata for Namespaces in XML


XPath

Addressing parts of an XML document

- XML Path Language (XPath) Version 1.0, R, Nov. 1999

XPath Version 1.0 Specification Errata

- XPath Requirements Version 2.0, WD, Feb. 2001

xml-stylesheet processing instruction

Specifying associated style sheets

Associating Style Sheets with XML documents Version 1.0, R, June 1999

Errata for "Associating Style Sheets with XML documents Version 1.0"


XLink

To create and describe links

XML Linking Language (XLink) Version 1.0, PR, Dec. 2000

XML Base

A base URI service for XLink

XML Base, PR, Dec. 2000



XPointer

Fragment identifiers for URI references


XML Pointer Language (XPointer) Version 1.0, CR, June 2000

XML Schema

Constraining of a class of XML documents

- XML Schema Part 0: Primer, PR, March 2001 (non-normative description)

- XML Schema Part 1: Structures, PR, March 2001

- XML Schema Part 2: Datatypes, PR, March 2001

- XML Schema: Formal Description, WD, March 2001

“style” attribute

Syntax to be used in the “style” attribute

Syntax of CSS rules in HTML's "style" attribute, WD, March 2001




4. XML Transducers


Table 4 lists the XML transducer languages. They include languages for rendering (CSS and XSL), transformation (XSLT), canonicalization (Canonical XML), fragment interchange (XML Fragment Interchange), merging (XInclude), and querying (XQuery). CSS is a language for specifying style sheets for structured documents in general; in developing CSS2, XML was especially taken into account as a notation for structured documents. CSS Mobile Profile specifies a subset of CSS2 to be used on mobile devices. The goal of CSS3 is to create a modularized CSS specification. XSL is a style sheet language designed especially for XML documents, and its style sheets are written in XML syntax. XSL contains the transformation language XSLT as a component; XSLT can also be used independently of XSL to describe transformations of XML documents.


Canonical XML defines a process, called canonicalization, for creating a specified physical representation, the canonical form, of an XML document or a document subset. The XML Fragment Interchange language includes capabilities for specifying part of an XML document as a fragment to be sent to a receiver. XInclude is a language for specifying how a set of XML documents, represented as Infosets, is merged into a new Infoset. XQuery is the W3C query language for XML data now under development; it is based on the XML Query Algebra and the XML Query Data Model.
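The point of canonicalization is that physically different serializations of the same logical document produce identical output. Python's `xml.etree.ElementTree.canonicalize()` (Python 3.8 or later) can illustrate this, with one caveat: it implements Canonical XML Version 2.0, not the Version 1.0 Recommendation listed in Table 4. The two agree on the basics shown here (attributes sorted, values in double quotes, empty elements written as start/end tag pairs) but differ in other details such as namespace handling. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8 or later

# Two physically different serializations of the same logical document.
doc_a = "<doc b='2' a='1'><e/></doc>"
doc_b = '<doc a="1"   b="2"><e></e></doc>'

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)             # <doc a="1" b="2"><e></e></doc>
print(canon_a == canon_b)  # True
```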


Table 4. XML Transducers



Document, Phase (R, PR, CR, WD), Month, Year



CSS

- Cascading Style Sheets, level 2 CSS2 Specification, R, May 1998

Errata in REC-CSS2-19980512

- CSS Mobile Profile 1.0, WD, Oct. 2000

- CSS3 introduction, WD, April 2000

- User Interface for CSS3, WD, Feb. 2000

- CSS3 module: W3C selectors, WD, Oct. 2000

- CSS3 module: Ruby, WD, Feb. 2001

- CSS3 module: Color, WD, March 2001

- Paged Media Properties for CSS3, WD, Sept. 1999

- CSS Namespace Enhancements (Proposal), WD, June 1999

- Color Profiles for CSS3, WD, June 1999

- Multi-column layout in CSS, WD, June 1999

- Behavioral Extensions to CSS, WD, Aug. 1999

- International Layout, WD, Sept. 1999



XSLT

- XSL Transformations (XSLT) Version 1.0, R, Nov. 1999

XSL Transformations (XSLT) Version 1.0 Specification Errata

- XSL Transformations (XSLT) Version 1.1, WD, Dec. 2000

- XSLT Requirements Version 2.0, WD, Feb. 2001

Canonical XML


Canonical XML Version 1.0, R, March 2001

Errata of the Canonical XML 1.0 Specification



XSL

Extensible Stylesheet Language (XSL) Version 1.0, CR, Nov. 2000

XML Fragment Interchange

Interchanging fragments

XML Fragment Interchange, CR, Feb. 2001



XInclude

XML Inclusions (XInclude) Version 1.0, WD, Oct. 2000




XQuery

- XQuery: A Query Language for XML, WD, Feb. 2001

- XML Query Requirements, WD, Feb. 2001

- XML Query Data Model, WD, Feb. 2001

- The XML Query Algebra, WD, Feb. 2001

- XML Query Use Cases, WD, Feb. 2001




5. XML Applications


Languages developed or under development in W3C for XML documents in very specific application areas are listed in Table 5. Four of the languages are currently described by W3C Recommendations: SMIL 1.0, RDF, MathML, and XHTML. SMIL is a language for integrating a set of independent multimedia objects into a synchronized multimedia presentation. It can be used to describe temporal behaviour, the layout of the presentation on the screen, and links between media objects. Work in progress concerns the second version of SMIL, which is intended to support the reuse of SMIL syntax and semantics in other XML-based languages. RDF is a general model for metadata describing Web resources. RDF uses XML as its concrete syntax, which also requires the XML namespace facility. MathML is an XML application for describing mathematical notation; its goal is to enable the encoding of mathematical material for the Web. Both MathML 1.01 and the second version, MathML 2.0, have reached the Recommendation phase. XHTML is a reformulation of HTML 4 in XML 1.0. The XHTML specification is associated with a set of other specifications supporting the modularized use of XHTML.
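RDF's model is a set of statements, each a triple of subject (a resource), predicate (a property), and object (a value). The following minimal in-memory sketch models only these abstract statements, not the XML syntax. The resource URI under `example.org` and the helper `objects` are hypothetical; the Dublin Core namespace is used only as a familiar source of property names, and objects are simplified to plain strings.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One RDF statement: a resource (subject) has a property (predicate) with a value (obj)."""
    subject: str
    predicate: str
    obj: str


DC = "http://purl.org/dc/elements/1.1/"              # Dublin Core element set namespace
REPORT = "http://example.org/xml-family-summary"     # hypothetical resource URI

graph: Set[Triple] = {
    Triple(REPORT, DC + "creator", "Airi Salminen"),
    Triple(REPORT, DC + "date", "2001-03-28"),
}


def objects(g: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every value the given subject has for the given property."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


print(objects(graph, REPORT, DC + "creator"))  # {'Airi Salminen'}
```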


RDF Schema, SVG 1.0, XML-Signature, and P3P are at the Candidate Recommendation phase. The RDF Schema language allows RDF to be used to describe RDF vocabularies, and in particular to provide information about the interpretation of the statements given in an RDF data model. SVG is a language for describing two-dimensional vector and mixed vector/raster graphics in XML. XML-Signature defines syntax and processing rules for XML digital signatures. It is intended to provide integrity, message authentication, and signer authentication services for data, whether that data is located within the XML that includes the signature or elsewhere. P3P, the Platform for Privacy Preferences, enables Web sites to express their privacy practices in a standard format.


The rest of the languages in Table 5 are still work in progress. APPEL is a language for describing collections of preferences regarding P3P policies between P3P agents; it is intended to complement the P3P language. SMIL Animation defines an animation framework for XML documents. It is based on the SMIL 1.0 timing model, with some extensions. XForms is intended for the specification of Web forms that can be used on a variety of platforms, for instance desktop computers, television sets, or cell phones. Ruby Annotation is a markup language for ruby: short runs of text alongside the base text, typically used in East Asian documents to indicate pronunciation or to provide a short annotation. CC/PP (Composite Capabilities/Preference Profiles) describes a framework for specifying how client devices express their capabilities and preferences to the server that originates content. To enable the use of speech on the Web and access to the Web through spoken interaction, W3C is developing several markup languages, which do not yet have names. Finally, XMLP (XML Protocol) is intended to allow two or more peers to communicate in a distributed environment, using XML as its encapsulation language. For XMLP, only a draft requirements specification is available.


Robin Cover maintains an extensive list of XML applications, developed or under development, that also covers organizations other than W3C.


Table 5. XML Applications



Document, Phase (R, PR, CR, WD), Month, Year


SMIL

Multimedia documents

- Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, R, June 1998

- Synchronized Multimedia Integration Language (SMIL 2.0) Specification, WD, March 2001


RDF

Metadata for Web resources



Resource Description Framework (RDF) Model and Syntax Specification, R, Feb. 1999

Errata in REC-rdf-syntax-19990222


MathML

Mathematical notation

- Mathematical Markup Language (MathML™) 1.01 Specification, R, July 1999

- Mathematical Markup Language (MathML) Version 2.0, R, Feb. 2001

Errata of the MathML 2 Specification


XHTML

Reformulation of HTML 4.0 in XML


- XHTML™ 1.0: The Extensible HyperText Markup Language, R, Jan. 2000

XHTML™ 1.0 Specification Errata

- XHTML™ Basic, R, Dec. 2000

XHTML™ Basic Specification Errata

- Modularization of XHTML™, PR, Feb. 2001

- XHTML™ 1.1 - Module-based XHTML, WD, Jan. 2000

- Building XHTML™ Modules, WD, Jan. 2000

- XHTML™ Events. An updated events syntax for XHTML, WD, Aug. 2000

- Modularization of XHTML™ in XML Schema, WD, March 2001

RDF Schema

To describe RDF vocabularies

Resource Description Framework (RDF) Schema Specification 1.0, CR, March 2000


SVG

Vector graphics

Scalable Vector Graphics (SVG) 1.0 Specification, CR, Nov. 2000


XML-Signature

Digital signatures

XML-Signature Syntax and Processing, CR, Oct. 2000


P3P

Privacy practices for Web sites

The Platform for Privacy Preferences 1.0 (P3P1.0) Specification, CR, Dec. 2000


APPEL

Preferences regarding P3P policies

A P3P Preference Exchange Language 1.0 (APPEL 1.0), WD, Feb. 2001

SMIL Animation


SMIL Animation, WD, July 2000


XForms

Web forms

- XForms 1.0, WD, Feb. 2001

- XForms Requirements, WD, Aug. 2000

- XForms 1.0: Data Model, WD, Aug. 2000

Ruby Annotation

Markup for ruby

Ruby Annotation, WD, Feb. 2001


CC/PP

A format for how a client device tells an origin server about its user agent profile

- Composite Capabilities/Preference Profiles: Requirements and Architecture, WD, July 2000

- Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies, WD, March 2001

- Composite Capabilities/Preference Profiles: Terminology and Abbreviations, WD, July 2000

No name yet

Voice markup, to enable access to the Web using spoken interaction

- Model Architecture for Voice Browser Systems, WD, Dec. 1999

- Introduction and Overview of W3C Speech Interface Framework, WD, Dec. 2000

- Speech Synthesis Markup Requirements for Voice Markup Languages, WD, Dec. 1999

- Natural Language Processing Requirements for Voice Markup Languages, WD, Dec. 1999

- Grammar Representation Requirements for Voice Markup Languages, WD, Dec. 1999

- Dialog Requirements for Voice Markup Languages, WD, Dec. 1999

- Reusable Dialog Requirements for Voice Markup Language, WD, April 2000

- Multimodal Requirements for Voice Markup Languages, WD, July 2000

- Speech Recognition Grammar Specification for the W3C Speech Interface Framework, WD, Jan. 2001

- Speech Synthesis Markup Language Specification for the Speech Interface Framework, WD, Aug. 2000

- Natural Language Semantics Markup Language for the Speech Interface Framework, WD, Nov. 2000

- Stochastic Language Models (N-Gram) Specification, WD, Jan. 2001

- Pronunciation Lexicon Markup Requirements for the W3C Speech Interface Framework, WD, March 2001


XMLP

Application messaging

XML Protocol (XMLP) Requirements, WD, March 2001


About this report

This report has been created as part of the X Group activities at the University of Waterloo in Canada. Preliminary versions were developed in the inSGML project at the University of Jyväskylä in Finland. The work has been supported by the grant 48989 of the Academy of Finland. Please report errors in this document or other comments to


