[NOTE: This local archive copy of "XML Activity" was mirrored as a snapshot from the official and canonical URL, http://www.w3.org/XML/Activity, 1999-01-06; please refer to the canonical source document if possible. A few anchors have been placed in this version to facilitate linking to relevant subsections. 1999-01-29.]
Work on XML is being managed as part of W3C's Architecture Domain.
Activity statements provide a managerial overview of W3C's work in this area. They are designed to be read from beginning to end, and to be informative and interesting. Each statement describes the role of W3C, the benefits to the Web community, the accomplishments to date, and what the future holds.
For brevity, in this Activity Statement, we often refer to the Extensible Markup Language as simply "XML".
XML -- the Extensible Markup Language -- is a simple, very flexible text format based on SGML (ISO 8879). Designed to meet the challenges of large-scale electronic publishing, XML™ will also play an increasingly important role in the exchange of a wide variety of data on the Web.
XML will
The best way to appreciate what XML looks like is with a simple example. Imagine your company sells products on-line. Marketing descriptions of the products are written in HTML, but names and addresses of customers, and also prices and discounts are formatted with XML. Here is the information describing a customer:
    <customer-details id="AcPharm39156">
      <name>Acme Pharmaceuticals Co.</name>
      <address country="US">
        <street>7301 Smokey Boulevard</street>
        <city>Smallville</city>
        <state>Indiana</state>
        <postal>94571</postal>
      </address>
    </customer-details>
The XML syntax uses matching start and end tags, such as <name> and </name>, to mark up information. A piece of information marked by the presence of tags is called an element; elements may be further enriched by attaching name-value pairs called attributes (for example, country="US" in the example above). This simple syntax is easy to process by machine, and has the attraction of remaining understandable to humans. XML is based on SGML, and is familiar in look and feel to those accustomed to HTML.
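To make the point about machine processing concrete, here is a minimal sketch that reads the customer record with Python's standard xml.etree.ElementTree module. The function name summarize_customer and the choice of fields to extract are assumptions made for illustration; they are not part of any W3C specification.

```python
import xml.etree.ElementTree as ET

# The customer record from the example, repeated so the sketch is self-contained.
CUSTOMER_XML = """
<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
"""


def summarize_customer(xml_text):
    """Return a few fields of the record (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())  # root is the customer-details element
    address = root.find("address")          # first child element named "address"
    return {
        "id": root.get("id"),                # attribute on the root element
        "name": root.findtext("name"),       # text content of the name element
        "country": address.get("country"),   # attribute on the address element
        "city": address.findtext("city"),
    }


if __name__ == "__main__":
    print(summarize_customer(CUSTOMER_XML))
```

Elements are reached by name and attributes by key, which is all the structure the record needs.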
XML is a low-level syntax for representing structured data. You can use this simple syntax to support a wide variety of applications. This idea is put across in a simplistic way in the diagram below, which shows how XML now underpins a number of Web mark-up languages and applications.
W3C are redeveloping HTML as a suite of XML tag sets so that, although documents will still be marked up using HTML, this will conform to the rules of XML. In this environment mathematical expressions can be inserted into documents using MathML, a formatting language written in XML and developed by W3C's Math working group. Presumably, other domain-specific XML-based tag sets will become candidates for inclusion in HTML documents.
W3C's Metadata Activity is developing the Resource Description Framework (RDF). This uses a simple data model expressed in XML syntax as the basis for a language for representing properties of Web resources, such as images and documents, and the relationships held between them. The Platform for Internet Content Selection (PICS) is being recast in RDF. The PICS framework provides a means for attaching labels to material (in particular, to indicate whether it is suitable for children).
Finally, the Synchronized Multimedia Integration Language (SMIL) is an XML application consisting of a declarative language for scheduling multimedia presentations on the Web.
Outside of W3C, many groups are already defining new formats for information interchange. The number of XML applications appears likely to grow rapidly. There are many areas (for example, the health-care industry, the Inland Revenue, government and finance) where XML applications may soon be used to store and process data. With XML as a simple method for data representation and organization, problems of data incompatibility and tedious manual re-keying will, by and large, be solved.
The flexibility of XML makes it ideal for interchange of structured data for further processing on the receiving machine. The W3C Document Object Model Activity will provide an interoperable set of classes and methods to manipulate XML documents (as well as HTML documents) from programming languages such as Java, ECMAScript, VBScript, and C++.
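As a rough illustration of what manipulating a document through such classes and methods can look like, the sketch below uses Python's standard xml.dom.minidom module, which offers a DOM-style interface. Python is not one of the languages named above, and the helper names rename_customer and add_city are hypothetical; treat this as one possible binding rather than the interface the Activity will define.

```python
from xml.dom.minidom import parseString

# A shortened customer record; the content is taken from the example above.
SOURCE = (
    '<customer-details id="AcPharm39156">'
    "<name>Acme Pharmaceuticals Co.</name>"
    "</customer-details>"
)


def rename_customer(document, new_name):
    """Replace the text inside the first name element (hypothetical helper)."""
    name_element = document.getElementsByTagName("name")[0]
    name_element.firstChild.data = new_name  # firstChild is the text node


def add_city(document, city):
    """Append a new city element to the root element (hypothetical helper)."""
    city_element = document.createElement("city")
    city_element.appendChild(document.createTextNode(city))
    document.documentElement.appendChild(city_element)


if __name__ == "__main__":
    doc = parseString(SOURCE)
    rename_customer(doc, "Acme Ltd.")
    add_city(doc, "Smallville")
    print(doc.documentElement.toxml())
```

The same operations (find elements by name, change a text node, create and append an element) are the kind of vocabulary a language-neutral object model would expose.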
XML markup can be used to structure data to support automatic processing. How is software to recognize the markup it exists to process, and avoid confusing it with markup designed for the use of some other software? For example, one application might use an element called address to label the mailing address of a person; in another application, address might instead be used for a network address. How would a machine or even a person looking at the XML markup know which use of "address" is intended in a given instance?
What is needed is a method for identifying the conventions governing the use of particular sets of elements. The idea is to use a Web address as a globally unique name for such a set of conventions. W3C's work on namespaces is concerned with the elaboration of this idea.
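The sketch below shows one way the two meanings of address can be kept apart once each set of conventions is named by a Web address. It relies on the xmlns:prefix attribute syntax understood by Python's standard parser, which the text above does not describe, and both namespace addresses (under example.org) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Web addresses naming two sets of conventions.
POSTAL_NS = "http://example.org/ns/postal"
NETWORK_NS = "http://example.org/ns/network"

# Both elements are called "address", but each is bound to a different name.
RECORD_XML = f"""
<record xmlns:p="{POSTAL_NS}" xmlns:n="{NETWORK_NS}">
  <p:address>7301 Smokey Boulevard</p:address>
  <n:address>192.0.2.17</n:address>
</record>
"""


def addresses_by_namespace(xml_text):
    """Look up each address element by its fully qualified name."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree writes qualified names as "{namespace-address}local-name".
    return {
        "postal": root.findtext(f"{{{POSTAL_NS}}}address"),
        "network": root.findtext(f"{{{NETWORK_NS}}}address"),
    }


if __name__ == "__main__":
    print(addresses_by_namespace(RECORD_XML))
```

Software written for postal data asks only for the postal address element and never sees the network one, even though the local names are identical.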
W3C's XML 1.0 Recommendation was issued on February 10, 1998. In response to the increasing popularity of XML as a basis for Web applications, the Activity has organized itself into the following groups:
The membership of this group is the chairs of the individual Working Groups. Its role is to provide a forum for coordination between the Working Groups of the XML Activity, between the XML Activity and other parts of W3C, and between the XML Activity and other organizations. In particular, the Coordination Group:
The chair of the XML Coordination Group is Jon Bosak of Sun Microsystems.
While XML 1.0 supplies a mechanism, the Document Type Definition (DTD), for declaring constraints on the use of markup, automated processing of XML documents requires more rigorous and comprehensive facilities in this area. Requirements include constraints on how the component parts of an application fit together, the document structure, attributes, data-typing, and so on. The XML Schema Working Group is addressing means for defining the structure, content and semantics of XML documents.
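To give a feel for the data-typing requirement, the sketch below applies hand-written checks to element content. A DTD can state that a postal element contains character data, but not that the data must be five digits. The rule table TYPE_RULES and the function type_errors are hypothetical and stand in for whatever a schema language eventually provides; this is not a schema language itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical typing rules: element name -> pattern its text must match.
TYPE_RULES = {
    "postal": re.compile(r"\d{5}"),       # exactly five digits
    "state": re.compile(r"[A-Za-z ]+"),   # letters and spaces only
}


def type_errors(xml_text):
    """Return a message for each element whose text breaks its rule."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():           # the root and all its descendants
        rule = TYPE_RULES.get(element.tag)
        value = (element.text or "").strip()
        if rule is not None and not rule.fullmatch(value):
            errors.append(f"{element.tag}: unexpected value {value!r}")
    return errors


if __name__ == "__main__":
    good = "<address><state>Indiana</state><postal>94571</postal></address>"
    bad = "<address><postal>9457A</postal></address>"
    print(type_errors(good))
    print(type_errors(bad))
```

Checks of this kind are usually buried in application code; declaring them alongside the document structure is the sort of facility the requirements above describe.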
The co-chairs of the Schema WG are Dave Hollander of Hewlett-Packard and C. M. Sperberg-McQueen, of the University of Illinois at Chicago and the W3C.
The XML Linking Working Group is designing hypertext links for XML. Engineers defining the way that links are to be written in XML distinguish between "external" links, which connect objects, and "internal" links, which point to locations within XML documents; both types will receive detailed treatment by this group. The objective of the XML Linking Working Group is to design advanced, scalable, and maintainable hyperlinking and addressing functionality for XML.
The working drafts XML Linking Language (XLink) and XML Pointer Language (XPointer) represent the basis on which the work of the Linking WG will proceed.
The chair of the Linking WG is Bill Smith, of Sun Microsystems.
[Note: Mail Archives for comments on XML Linking, viz., www-xml-linking-comments@w3.org.]
The XML 1.0 Recommendation describes the physical representation of XML documents: the use of brackets, character strings and other "nuts and bolts" which make up the language. The Information Set Working Group is looking at more abstract descriptions of XML documents in terms of document tree structures, elements, their attribute lists and so on. The idea is to provide a common reference set that other specifications can use and extend to construct their underlying data models, thus helping to ensure interoperability among the various XML-based specifications and among XML software tools in general.
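The difference between physical representation and abstract description can be sketched as follows: two documents that differ in quoting, attribute order and whitespace inside tags reduce to the same tree of elements and attributes. The function abstract_view is a hypothetical helper built on the data model of Python's xml.etree.ElementTree; it is not the information set the Working Group is defining.

```python
import xml.etree.ElementTree as ET


def abstract_view(element):
    """Reduce an element to (name, sorted attributes, text, children)."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),  # attribute order is ignored
        (element.text or "").strip(),           # no text is treated as ""
        tuple(abstract_view(child) for child in element),
    )


# Two physical representations of what is intended to be the same document.
FORM_A = (
    '<address country="US" kind="billing">'
    "<city>Smallville</city><note/>"
    "</address>"
)
FORM_B = (
    "<address kind='billing'\n"
    "         country='US'>"
    "<city>Smallville</city><note></note>"
    "</address>"
)

if __name__ == "__main__":
    same = abstract_view(ET.fromstring(FORM_A)) == abstract_view(ET.fromstring(FORM_B))
    print("character strings equal:", FORM_A == FORM_B)
    print("abstract views equal:", same)
```

A specification written against the abstract view does not need to say anything about quote characters or empty-element tags, which is the interoperability benefit described above.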
The chair of the Information Set WG is David Megginson, invited expert.
The XML standard supports logical documents composed of possibly several entities. It may be desirable to view, edit, or interchange one or more of the entities or parts of entities without interchanging the entire document. The problem, then, is how to provide to a recipient of such a fragment the appropriate information about the context that fragment had in the larger document.
The goal of the Fragment Working Group is to define a way to send fragments of an XML document without having to send all or part of the parent document as well. The delivered fragments can either be viewed or edited immediately or accumulated for later use, assembly, or other processing.
The chair of the XML Fragment WG is Paul Grosso of ArborText.
[Note: Mail archives for the XML Fragments List, www-xml-fragment-comments@w3.org.]
The XML Syntax Working Group is concerned with several aspects of XML:
The co-chairs of the XML Syntax WG are Tim Bray, invited expert, and Joel Nava of Adobe.
The Schema Working Group plans to deliver Requirements, Working Drafts, and Proposed Recommendations on data typing and schema language in 1999.
The Fragment Working Group expects to issue a W3C Proposed Recommendation for Fragment Interchange by Summer 1999.
The Syntax Group plans to deliver:
all during the middle of next year (1999).
The Information Set Working Group plans to release the first public XML Information Set Working Draft by the end of 1998 and a Proposed Recommendation by Spring 1999.
Dan Connolly, XML Activity Lead