28 March 2001
This version: http://www.cs.jyu.fi/~airi/xmlfamily-20010328.html
Latest version: http://www.cs.jyu.fi/~airi/xmlfamily.html
Previous version: http://www.cs.jyu.fi/~airi/xmlfamily-20010228.html
Content
XML is a markup language for presenting information as structured documents. The language has been developed from SGML as an activity of the World Wide Web Consortium (W3C). Within W3C a number of other XML-related language development activities are under way. Their intent is to specify syntactic and semantic rules either for some specific kind of XML data or for data to be used together with XML data for a specific purpose. In this report the term XML family of W3C languages refers to XML and those XML-related languages. The purpose of the report is to give a concise overview of the current state of the development of the languages.
Results of W3C development activities are published as W3C Technical Reports. The process of developing technical reports is described in the W3C Process Document. This summary is based on the analysis of current technical reports of four types: Working Drafts, Candidate Recommendations, Proposed Recommendations, and Recommendations. The four types differ in their maturity from lower to higher:
· A Working Draft (WD) represents work in progress. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time.
· A Candidate Recommendation (CR) has received significant review from its immediate technical community. The document is an explicit call for implementation and technical feedback.
· A Proposed Recommendation (PR) represents consensus within the group that produced it and has been proposed by the Director to the Advisory Committee for review.
· A Recommendation (R) represents consensus within W3C. W3C makes every effort to maintain its Recommendations (e.g., by tracking errata, providing testbed applications, helping to create test suites, etc.) and to encourage widespread implementation. The practice in W3C is to collect all known errors in a Recommendation into an errata document referred to in the Recommendation.
In this summary the XML family of W3C languages has been divided into four groups: XML, XML Accessories, XML Transducers, and XML Applications.
XML Accessories are languages which are intended for wide use to extend the capabilities specified in XML. Examples of XML accessories are the XML Schema language, which extends the definition capability of XML DTDs, and XML Names, which extends the naming mechanism so that a single XML document can contain element and attribute names that are defined for and used by multiple software modules.
XML Transducers are languages which are intended for transducing some input XML data into some output form. Examples of XML transducers are the style sheet languages CSS2 and XSL, intended to produce an external presentation from some XML data, and XSLT, intended for transforming XML documents into other XML documents. A transducer language is associated with some kind of processing model which defines the way output is derived from input.
XML Applications are languages which define constraints for a class of XML data for some special application area, often by means of a DTD. Examples of XML applications are MathML defined for mathematical data or SMIL intended for multimedia documents.
Each of the following sections introduces the languages of one of the four groups. The tables listing the languages of a group contain links to the documents describing the languages at the date of this summary. As a reminder of the emergent nature of the W3C specifications and their continuing redevelopment, the links to Recommendations (R) are associated with links to their errata documents. Note that all specifications described by Working Drafts (WD) are work in progress and may change at any time.
[Introduction | XML | XML Accessories | XML Transducers | XML Applications]
The development of XML started in 1996. The first W3C Recommendation for XML 1.0 was published in February 1998 and the second in October 2000. The Second Edition of XML 1.0 incorporates the changes dictated by the first edition errata. The second edition does not specify a new version of XML. Table 1 includes links to the XML 1.0 specification documents and also to those W3C documents which describe an abstract model for XML documents.
Table 1. XML
Language | Purpose | Document, Phase (R, PR, CR, WD), Month, Year
XML | Structured Documents | Extensible Markup Language (XML) 1.0, R, Feb. 1998; Extensible Markup Language (XML) 1.0 (Second Edition), R, Oct. 2000; XML 1.0 Second Edition Specification Errata. Abstract models for XML documents: XML Information Set, WD, March 2001; XML Path Language (XPath) Version 1.0, R, Nov. 1999; Document Object Model (DOM) Level 1 Specification Version 1.0, R, Oct. 1998; Document Object Model (DOM) Level 2 Core Specification Version 1.0, R, Nov. 2000
The XML specifications describe the concrete syntax of XML documents, and partially the behaviour of an XML processor, i.e., a software module used to read XML documents and to provide access to their content and structure. There are four abstract models for information available in XML documents.
· The XML Information Set specification defines an abstract data set called XML Information Set (Infoset). The definitions in the specification are intended for other specifications that need to refer to information in a well-formed XML document.
· The XPath Data Model is included in the XML Path Language (XPath) specification to allow the specification of addressing parts of an XML document.
· DOM (Document Object Model) is an application programming interface for XML and HTML documents. It defines the way data in a document is structured, accessed and manipulated. The DOM Level 1 Specification was published in 1998. The DOM Level 2 specifications, published in November 2000, extend and update the Level 1 specification. Level 2 consists of five parts: Core, Views, Events, Style, and Traversal and Range. The underlying data structure of XML documents is defined in the Core specification.
· The XML Query Data Model development is part of the W3C activities for specifying an XML query language and it is work in progress. The Query Data Model is intended to define formally the information contained in the input to an XML Query processor.
All four models describe an XML document as a tree structure, but there are differences in the trees and in the information available in the trees. The XPath, DOM Level 1, and DOM Level 2 specifications are W3C Recommendations; the other two are work in progress.
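The tree view shared by these models can be illustrated with a minimal sketch, assuming Python's standard xml.dom.minidom module as a stand-in DOM implementation. The sample document and the outline helper are hypothetical; they only show that elements, attributes, and text are reachable as nodes of one tree.

```python
from xml.dom import minidom

# Hypothetical sample document, not taken from any W3C specification.
SAMPLE = '<report year="2001"><title>XML family</title><group>Accessories</group></report>'


def outline(node, depth=0):
    """Return a list of (depth, description) pairs for a DOM subtree."""
    rows = []
    if node.nodeType == node.ELEMENT_NODE:
        rows.append((depth, "element " + node.tagName))
        # In the DOM, attributes are not children; they belong to the element.
        for name, value in sorted(node.attributes.items()):
            rows.append((depth + 1, "attribute %s=%s" % (name, value)))
    elif node.nodeType == node.TEXT_NODE:
        rows.append((depth, "text " + repr(node.data)))
    # Recurse into child nodes (text nodes have none).
    for child in node.childNodes:
        rows.extend(outline(child, depth + 1))
    return rows


document = minidom.parseString(SAMPLE)
tree_rows = outline(document.documentElement)
for depth, description in tree_rows:
    print("  " * depth + description)
```

The other models would present the same document with different node kinds and properties, which is where the differences mentioned above arise.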
XML is intended to be a universal format for data on the Web. To support references to Internet resources and the use of different character sets and languages, the XML specification uses a set of sublanguages specified by organizations other than W3C. They are languages for describing characters, names of character sets, names of languages, and country codes, and for identifying Internet resources. These sublanguages are listed in Table 2. As a joint activity of W3C and the Unicode Consortium, guidelines are being developed on the use of Unicode Version 3.0 in conjunction with markup like XML (see Unicode in XML and other Markup Languages, Unicode Technical Report #20, W3C Note 15 December 2000).
Table 2. The Basis for XML
Language | Purpose | Developing Organization, Year
Unicode; Unicode3; ISO/IEC 10646; ISO/IEC 10646-2000 | Describing characters in different natural languages of the world | The Unicode Consortium, 1996; The Unicode Consortium, 2000; ISO, 1993 + amendments; ISO, 2000
IANA-CHARSET | Denoting character sets | IANA
IETF RFC 1766; ISO 639; ISO 3166 | Denoting languages and countries | IETF, 1995; ISO, 1988; ISO, 1997
IETF RFC 2396; IETF RFC 2732 | Identifying Internet resources | IETF, 1998; IETF, 1999
IANA = Internet Assigned Numbers Authority
ISO = International Organization for Standardization
IETF = Internet Engineering Task Force
Table 3 lists the current XML accessories in the order of the maturity of their specifications. Three of the languages are at the moment described by W3C Recommendations: XML Names, XPath, and the language for xml-stylesheet processing instructions, which is intended for specifying associated style sheets by processing instructions in the prolog of an XML document. The XLink and XML Base specifications are labelled as Proposed Recommendations and XPointer as a Candidate Recommendation. The XML Schema documents listed in Table 3 are Proposed Recommendations. Work in progress concerns how to specify the “style” attribute introduced in HTML.
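As an illustration of the xml-stylesheet processing instruction, the following sketch assumes Python's standard xml.dom.minidom module and a hypothetical document that refers to a style sheet named report.css. It collects the pseudo-attributes of each xml-stylesheet instruction found in the prolog. The simple pattern used here handles only double-quoted values.

```python
import re
from xml.dom import minidom

# Hypothetical document; the style sheet name report.css is an assumption.
SAMPLE = '<?xml-stylesheet type="text/css" href="report.css"?><report>text</report>'

document = minidom.parseString(SAMPLE)

stylesheets = []
# Nodes in the prolog are children of the document node itself.
for node in document.childNodes:
    if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == "xml-stylesheet"):
        # The instruction data looks like attributes but is plain text,
        # so the pseudo-attributes are extracted with a simple pattern.
        pseudo = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
        stylesheets.append(pseudo)

print(stylesheets)
```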
XML Names is intended to facilitate the use of qualified element and attribute names in XML documents, in order to prevent name collisions. A qualified name consists of two parts: a namespace prefix and a local part. The prefix is bound to a namespace name, which is identified by a URI reference. XML Names is used as an extension of XML in most other specifications of the XML family.
XPath defines how to address parts of XML documents. In support of this primary purpose it also provides basic facilities for the manipulation of strings, numbers, and booleans. The development of the second version of XPath has started with a requirements description. Among the accessories, XPointer uses XPath as its component.
XLink is intended for the description and creation of links between Internet resources. The links can be simple unidirectional links similar to those of HTML, as well as relationships among more than two resources. Links can also reside in a location separate from the linked resources, and they can be associated with metadata.
XML Base provides a base URI service for XLink. The purpose of the service is to resolve relative URIs in links to external resources like images, applets, form-processing programs, and style sheets.
XPointer defines fragment identifiers for URI references. It is built on top of the XPath language. XPointer extends XPath to allow addressing points and ranges as well as whole nodes, locating information by string matching, and using addressing expressions in URI references as fragment identifiers.
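The roles of XML Names and XPath can be sketched as follows, assuming Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The document, the namespace URIs under example.org, and the prefixes are hypothetical. Two elements share the local part "status" but belong to different namespaces, so they can be addressed separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical document and namespace URIs, invented for this illustration.
SAMPLE = """
<inv:order xmlns:inv="http://example.org/inventory"
           xmlns:pay="http://example.org/payment">
  <inv:item code="A1"><inv:status>shipped</inv:status></inv:item>
  <inv:item code="B2"><inv:status>pending</inv:status></inv:item>
  <pay:status>paid</pay:status>
</inv:order>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree expands each qualified name to "{namespace name}local part",
# so the two kinds of "status" element do not collide.
print(root.tag)

# Prefixes are local to a document; this mapping is chosen by the caller
# and need not reuse the prefixes that appear in the document.
ns = {"i": "http://example.org/inventory", "p": "http://example.org/payment"}

# Address all descendant status elements of each namespace.
inventory_statuses = [e.text for e in root.findall(".//i:status", ns)]
payment_statuses = [e.text for e in root.findall(".//p:status", ns)]

# A predicate selects the items whose status child has the given text.
pending = root.findall("./i:item[i:status='pending']", ns)
pending_codes = [e.get("code") for e in pending]

print(inventory_statuses, payment_statuses, pending_codes)
```

The expanded names make the point of XML Names visible: the comparison is on the namespace name and local part, not on the prefix written in the document.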
XML Schema extends the definition capabilities of XML. In particular, it allows the use of a variety of data types, e.g. boolean, float, int, and date, and their validation in conforming software. XML Schema is intended to constrain XML documents, but the schemas themselves are not necessarily written in XML. There is, however, an XML notation for the schema language. Three levels of conformance for schema-aware processors are defined: minimally conforming processors, conformance to the XML representation of schemas, and fully conforming processors.
Table 3. XML Accessories
Language | Purpose | Document, Phase (R, PR, CR, WD), Month, Year
XML Names | Qualifying element and attribute names |
XPath | Addressing parts of an XML document | XML Path Language (XPath) Version 1.0, R, Nov. 1999
xml-stylesheet processing instruction | Specifying associated style sheets | Associating Style Sheets with XML documents Version 1.0, R, June 1999; Errata for "Associating Style Sheets with XML documents Version 1.0"
XLink | To create and describe links |
XML Base | A base URI service for XLink |
XPointer | Fragment identifiers for URI references |
XML Schema | Constraining of a class of XML documents | XML Schema Part 0: Primer, PR, March 2001 (non-normative description); XML Schema Part 1: Structures, PR, March 2001
“style” attribute | Syntax to be used in the “style” attribute | Syntax of CSS rules in HTML's "style" attribute, WD, March 2001
Table 4 lists the XML transducer languages. They include languages for rendering (CSS and XSL), transformation (XSLT), canonicalization (Canonical XML), fragment interchange (XML Fragment Interchange), merging (XInclude), and querying (XQuery). CSS is a language for specifying style sheets for any structured documents. In the development of CSS2, XML was taken especially into account as a notation for structured documents. CSS Mobile Profile specifies a subset of CSS2 to be used for mobile devices. The goal in CSS3 is to create a modularized CSS specification. XSL is a style sheet language especially designed for XML documents. It uses XML syntax for style sheets. XSL contains the transformation language XSLT as its component. XSLT can also be used independently of XSL to describe transformations of XML documents.
Canonical XML defines a process for creating a specified physical representation, a canonical form, of an XML document or a document subset. The process is called canonicalization. The XML Fragment Interchange language includes capabilities to specify a part of a whole XML document as a fragment to be sent to a receiver. XInclude is a language for specifying the merging of a set of XML documents, represented as Infosets, into a new Infoset. XQuery is the W3C query language under development for XML data. It is based on the XML Query Algebra and the XML Query Data Model.
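The idea of canonicalization can be sketched with Python's standard library, assuming Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available. That function implements the later C14N 2.0 version, not the Canonical XML version current at the date of this report, so the sketch only illustrates the principle. The two sample serializations are hypothetical; they differ in attribute order, quoting, and empty-element syntax but represent the same logical document.

```python
import xml.etree.ElementTree as ET

# Two hypothetical physical representations of the same logical document.
FORM_A = '<doc b="2" a="1"><empty/></doc>'
FORM_B = "<doc   a='1'   b='2'><empty></empty></doc>"

# Canonicalization maps both forms to one physical representation.
canonical_a = ET.canonicalize(FORM_A)
canonical_b = ET.canonicalize(FORM_B)

print(canonical_a)
print(canonical_a == canonical_b)
```

Because the canonical forms are equal as character strings, they can be compared directly, which is what makes canonicalization useful for tasks such as digital signatures.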
Table 4. XML Transducers
Language | Purpose | Document, Phase (R, PR, CR, WD), Month, Year
CSS | Rendering | Cascading Style Sheets, level 2 CSS2 Specification, R, May 1999; CSS Mobile Profile 1.0, WD, Oct. 2000; CSS3 introduction, WD, April 2000; User Interface for CSS3, WD, Feb 2000; CSS3 module: W3C selectors, WD, Oct. 2000; CSS3 module: Ruby, WD, Feb. 2001; CSS3 module: Color, WD, March 2001; Paged Media Properties for CSS3, WD, Sept. 1999; CSS Namespace Enhancements (Proposal), WD, June 1999; Color Profiles for CSS3, WD, June 1999; Multi-column layout in CSS, WD, June 1999
XSLT | Transformation | XSL Transformations (XSLT) Version 1.0 Specification Errata
Canonical XML | Canonicalization |
XSL | Rendering | Extensible Stylesheet Language (XSL) Version 1.0, CR, Nov. 2000
XML Fragment Interchange | Interchanging fragments |
XInclude | Merging |
XQuery | Querying | XQuery: A Query Language for XML, WD, Feb. 2001; XML Query Requirements, WD, Feb. 2001; XML Query Data Model, WD, Feb. 2001
Languages developed or under development in W3C and intended for XML documents in a very specific application area are listed in Table 5. Four of the languages are described as W3C Recommendations at the moment: SMIL 1.0, RDF, MathML 1.01, and XHTML 1.0. SMIL is a language for integrating a set of independent multimedia objects into a synchronized multimedia presentation. It can be used to describe temporal behaviour, layout of the presentation on the screen, and links between media objects. Work in progress concerns the development of the second version of SMIL to support the reuse of SMIL syntax and semantics in other XML-based languages. RDF is a general model for the metadata describing Web resources. The concrete syntax of RDF is given by XML and also requires the XML namespace facility. MathML is an XML application for describing mathematical notation. The goal of MathML is to enable the encoding of mathematical material for the Web. The second version has reached the Recommendation phase. XHTML is a reformulation of HTML 4 in XML 1.0. The XHTML specification is associated with a set of other specifications supporting the modularized use of XHTML.
RDF Schema, SVG 1.0, XML-Signature, and P3P are languages in the Candidate Recommendation phase. The RDF Schema language allows the use of RDF to describe RDF vocabularies, and especially to provide information about the interpretation of the statements given in an RDF data model. SVG is a language for describing two-dimensional vector and mixed vector/raster graphics in XML. XML-Signature defines syntax and processing rules for XML digital signatures. It is intended to provide integrity, message authentication, and signer authentication services for data, whether the data is located within the XML that includes the signature or elsewhere. P3P stands for the Platform for Privacy Preferences, and it enables Web sites to express their privacy practices in a standard format.
The rest of the languages of Table 5 are still work in progress. APPEL is a language for describing collections of preferences regarding P3P policies between P3P agents. It is intended to complement the P3P language. SMIL Animation defines an animation framework for XML documents. It is based upon the SMIL 1.0 timing model, with some extensions. XForms is intended for the specification of Web forms that can be used on a variety of platforms, for instance, on desktop computers, television sets, or cell phones. Ruby Annotation is a markup language for ruby, short runs of text alongside the base text, typically used in East Asian documents to indicate pronunciation or to provide a short annotation. CC/PP (Composite Capabilities/Preference Profiles) describes a framework for specifying how client devices express their capabilities and preferences to the server that originates content. To enable the use of speech on the Web and access to the Web using spoken interaction, W3C is developing several markup languages. These languages do not have any names yet. Finally, XMLP (XML Protocol) is intended to allow two or more peers to communicate in a distributed environment, using XML as its encapsulation language. For XMLP there is only a requirements specification draft available.
An extensive list of XML applications developed or under development, including those of organizations other than W3C, is maintained by Robin Cover.
Table 5. XML Applications
This report has been created as part of the X Group activities at the University of Waterloo in Canada. Preliminary versions were developed in the inSGML project at the University of Jyväskylä in Finland. The work has been supported by the grant 48989 of the Academy of Finland. Please report errors in this document or other comments to mailto:asalminen@db.uwaterloo.ca.