The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: January 22, 2008.

Public Draft for HTML 5: A Vocabulary and Associated APIs for HTML and XHTML.


The World Wide Web Consortium (W3C) has announced the publication of a First Public Working Draft of HTML 5: A Vocabulary and Associated APIs for HTML and XHTML. The specification is intended to replace (that is, to become the new version of) what was previously defined in the HTML4, XHTML 1.x, and DOM2 HTML specifications.

The HTML 5 specification defines the fifth major revision of the core language of the World Wide Web: HTML. In this version: (1) new features are introduced to help Web application authors, (2) new elements are introduced based on research into prevailing authoring practices, and (3) special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability. The new features are presented in the companion Working Draft HTML 5 Differences from HTML 4. The specification attempts to fulfill goals and principles articulated in the HTML Design Principles Working Draft.

According to the W3C announcement, the HTML 5 specification "helps to improve interoperability and reduce software costs by giving precise rules not only about how to handle all correct HTML documents but also how to recover from errors. Ajax and related innovations have propelled demands for a new standard that allows people to create Web applications that interoperate across desktop and mobile platforms. Some of the most interesting new features for authors are APIs for drawing two-dimensional graphics, embedding and controlling audio and video content, maintaining persistent client-side data storage, and for enabling users to edit documents and parts of documents interactively."

The new specification differs from previous versions of "HTML" in that it defines an abstract language for describing documents and applications, as well as some APIs for interacting with in-memory representations of resources that use this language. The in-memory representation is known as "DOM5 HTML", or "the DOM" for short.

Various concrete syntaxes could be used to transmit resources that use the abstract language; two concrete syntaxes are defined in the 2008-01-22 Working Draft specification. The first such concrete syntax is "HTML5". This is the format recommended for most authors, and it is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, Web browsers will process it as an "HTML5" document. The second concrete syntax uses XML and is known as "XHTML5". When a document is transmitted with an XML MIME type such as application/xhtml+xml, Web browsers process it with an XML processor and treat it as an "XHTML5" document.
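The MIME-type dispatch described above can be illustrated with a minimal Python sketch. It is only an illustration: Python's standard `html.parser` does not implement the HTML 5 parsing algorithm, so here it merely stands in for a lenient, error-recovering HTML parser, while `xml.etree.ElementTree` plays the role of the XML processor. The function `parse_by_mime`, the helper class `TagCollector`, and the exact set of XML MIME types are hypothetical choices made for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed set of XML MIME types for this sketch (not an exhaustive list).
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


class TagCollector(HTMLParser):
    """Records start-tag names; a stand-in for a real HTML 5 parser."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_by_mime(body: str, mime_type: str) -> list[str]:
    """Return element names, choosing the processor from the MIME type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = mime_type.split(";")[0].strip().lower()
    if mime == "text/html":
        # "HTML5" syntax: lenient parsing that carries on past markup errors.
        collector = TagCollector()
        collector.feed(body)
        collector.close()
        return collector.tags
    if mime in XML_MIME_TYPES:
        # "XHTML5" syntax: an XML processor; input that is not well-formed
        # raises ET.ParseError. Namespaced tags come back as "{uri}name".
        root = ET.fromstring(body)
        return [element.tag for element in root.iter()]
    raise ValueError(f"unsupported MIME type: {mime_type}")


print(parse_by_mime("<p>Hello<p>World", "text/html"))
```

The same unclosed `<p>` markup yields `['p', 'p']` when labelled text/html but raises `ET.ParseError` when labelled application/xhtml+xml, reflecting that an XML processor must reject input that is not well-formed rather than repair it.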

The specification section on Scope addresses the relationship of HTML 5 to XUL, Flash, Silverlight, and other proprietary UI languages and to CPU-intensive high-end workstations and their associated computing environments:

"This specification is independent of the various proprietary UI languages that various vendors provide. As an open, vendor-neutral language, HTML provides for a solution to the same problems without the risk of vendor lock-in.

For sophisticated cross-platform applications, there already exist several proprietary solutions (such as Mozilla's XUL, Adobe's Flash, or Microsoft's Silverlight). These solutions are evolving faster than any standards process could follow, and the requirements are evolving even faster. These systems are also significantly more complicated to specify, and are orders of magnitude more difficult to achieve interoperability with, than the solutions described in this document. Platform-specific solutions for such sophisticated applications (for example the MacOS X Core APIs) are even further ahead.

The scope of this specification is not to describe an entire operating system. In particular, hardware configuration software, image manipulation tools, and applications that users would be expected to use with high-end workstations on a daily basis are out of scope.

In terms of applications, this specification is targeted specifically at applications that would be expected to be used by users on an occasional basis, or regularly but from disparate locations, with low CPU requirements. For instance: online purchasing systems, searching systems, games (especially multiplayer online games), public telephone books or address books, communications software (e-mail clients, instant messaging clients, discussion software), document editing software, etc."

The specification's Scope section also clarifies the relationship of HTML 5 to HTML 4.01, XHTML 1.1, DOM2 HTML, Web Forms 2.0, and XHTML2.

HTML 5 "represents a new version of HTML4 and XHTML1, along with a new version of the associated DOM2 HTML API. Migration from HTML4 or XHTML 1.1 to the format and APIs described in this specification should in most cases be straightforward, as care has been taken to ensure that backwards-compatibility is retained. The specification will eventually supplant Web Forms 2.0 as well.

XHTML2 defines a new HTML vocabulary with better features for hyperlinks, multimedia content, annotating document edits, rich metadata, declarative interactive forms, and describing the semantics of human literary works such as poems and scientific papers. However, it lacks elements to express the semantics of many of the non-document types of content often seen on the Web... XHTML2 and this specification use different namespaces and therefore can both be implemented in the same XML processor."

The specification's Section 1.3 on "Conformance Requirements" clarifies that both document structure and processing model are normatively defined. It defines detailed processing models to foster interoperable implementations.

"This specification describes the conformance criteria for user agents (relevant to implementors) and documents (relevant to authors and authoring tool implementors).

There is no implied relationship between document conformance requirements and implementation conformance requirements. User agents are not free to handle non-conformant documents as they please; the processing model described in this specification applies to implementations regardless of the conformity of the input documents.

User agents fall into several (overlapping) categories with different conformance requirements. [For example]:

  • Web browsers and other interactive user agents: Web browsers that support XHTML must process elements and attributes from the HTML namespace found in XML documents...
  • Non-interactive presentation user agents: User agents that process HTML and XHTML documents purely to render non-interactive versions of them must comply to the same conformance criteria as Web browsers, except that they are exempt from requirements regarding user interaction...
  • User agents with no scripting support: Implementations that do not support scripting (or which have their scripting features disabled) are exempt from supporting the events and DOM interfaces mentioned in the specification.
  • Conformance checkers: Conformance checkers must verify that a document conforms to the applicable conformance criteria...
  • Data mining tools: Applications and tools that process HTML and XHTML documents for reasons other than to either render the documents or check them for conformance should act in accordance to the semantics of the documents that they process.
  • Authoring tools and markup generators: Authoring tools and markup generators must generate conforming documents; conformance criteria that apply to authors also apply to authoring tools, where appropriate..."

The 2008-01-22 Working Draft for HTML 5 defines dependencies upon four underlying specifications. It does not require support for any particular network transport protocol, style sheet language, scripting language, or any of the DOM and WebAPI specifications beyond the four listed below. However, the language described by this specification is biased towards CSS as the styling language, ECMAScript as the scripting language, and HTTP as the network protocol, and several features assume that those languages and protocols are in use. The four specifications are:

  • XML: Implementations that support XHTML5 must support some version of XML, as well as its corresponding namespaces specification, because XHTML5 uses an XML serialisation with namespaces.
  • XML Base: User agents must follow the rules given by XML Base to resolve relative URIs in HTML and XHTML fragments. That is the mechanism used in this specification for resolving relative URIs in DOM trees.
  • DOM: Implementations must support some version of DOM Core and DOM Events, because this specification is defined in terms of the DOM, and some of the features are defined as extensions to the DOM Core interfaces.
  • ECMAScript: Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings for DOM Specifications specification, as this specification uses that specification's terminology.
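The XML Base requirement in the list above can be illustrated with a short Python sketch that walks an XHTML tree, lets each `xml:base` attribute adjust the inherited base URI, and resolves `href` values against the result with `urllib.parse.urljoin`. The function `resolve_links` is a hypothetical helper and the example.org/example.com URIs are placeholders; the sketch also ignores other inputs to the base URI that a real user agent must consider, such as the HTML `<base>` element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_links(doc: str, document_uri: str) -> list[tuple[str, str]]:
    """Return (href, absolute URI) pairs, honouring xml:base on ancestors."""
    resolved = []

    def walk(element: ET.Element, base: str) -> None:
        # An xml:base value is itself relative to the base it inherits.
        if XML_BASE in element.attrib:
            base = urljoin(base, element.attrib[XML_BASE])
        href = element.get("href")
        if href is not None:
            resolved.append((href, urljoin(base, href)))
        for child in element:
            walk(child, base)

    walk(ET.fromstring(doc), document_uri)
    return resolved


page = (
    '<div xml:base="http://example.org/docs/">'
    '<section xml:base="guide/"><a href="intro.html">Intro</a></section>'
    '<a href="../index.html">Home</a>'
    "</div>"
)
for href, absolute in resolve_links(page, "http://example.com/page.xhtml"):
    print(href, "->", absolute)
```

Run as written, this prints `intro.html -> http://example.org/docs/guide/intro.html` followed by `../index.html -> http://example.org/index.html`: the nested `xml:base="guide/"` only affects links inside the `<section>`, while the second link resolves against the outer base.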

The HTML 5 specification will not be considered finished until there are at least two complete implementations of it. Previous versions of HTML did not take this approach. The goal is to ensure that the specification is implementable and usable by designers and developers once it is finished.

The editors of the Working Draft provide instructions on how to read the specification: "This specification should be read like all other specifications. First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once. Then it should be read by picking random sections from the contents list and following all the cross-references."

Publication note: The HTML 5 [Editor's Draft] specification is also being produced by the WHATWG. The two specifications are identical from the Table of Contents onwards. The W3C HTML Working Group is the W3C working group responsible for this specification's progress along the W3C Recommendation track.

Bibliographic Information

HTML 5: A Vocabulary and Associated APIs for HTML and XHTML. W3C Working Draft. 22-January-2008. Edited by Ian Hickson (Google, Inc.) and David Hyatt (Apple, Inc.). See also the PDF format of the Editor's Draft (522 pages).

HTML 5 Differences from HTML 4. W3C Working Draft. 22-January-2008. Edited by Anne van Kesteren (Opera Software ASA).

See also: HTML Design Principles. W3C Working Draft. 26-November-2007 (First Public Working Draft) or later. Edited by Anne van Kesteren (Opera Software ASA) and Maciej Stachowiak (Apple Inc.). The HTML 5 primary design principles (from the Table of Contents):

  • Compatibility
    • Support Existing Content
    • Degrade Gracefully
    • Do not Reinvent the Wheel
    • Pave the Cowpaths
    • Evolution Not Revolution
  • Utility
    • Solve Real Problems
    • Priority of Constituencies
    • Secure By Design
    • Separation of Concerns
    • DOM Consistency
  • Interoperability
    • Well-defined Behavior
    • Avoid Needless Complexity
    • Handle Errors
  • Universal Access
    • Media Independence
    • Support World Languages
    • Accessibility

Editor's version: The latest stable version of the editor's copy of the HTML 5 specification is always available on the W3C CVS server and in the WHATWG Subversion repository. The latest editor's draft (which may contain unfinished text in the process of being prepared) is available on the WHATWG site. Detailed change history can be obtained from the following locations:

From the W3C Announcement

Excerpts from the W3C announcement 2008-01-22: "W3C Publishes HTML 5 Draft, Future of Web Content. Web Community Forges Next HTML Standard in Public W3C Forum."

W3C today published an early draft of HTML 5, a major revision of the markup language for the Web. The HTML Working Group is creating HTML 5 to be the open, royalty-free specification for rich Web content and Web applications. The group operates entirely in public with nearly five hundred participants, including representatives from W3C Members ACCESS, AOL, Apple, Google, IBM, Microsoft, Mozilla, Nokia, and Opera.

"HTML is of course a very important standard," said Tim Berners-Lee, author of the first version of HTML and W3C Director. "I am glad to see that the community of developers, including browser vendors, is working together to create the best possible path for the Web. To integrate the input of so many people is hard work, as is the challenge of balancing stability with innovation, pragmatism with idealism."

Why the Community Wants HTML 5

Engineers, designers, marketing departments, and users have learned much about the Web as a medium since HTML 4 was first published in December 1997. Web sites reflect this progress: no longer static page collections, they are now media-rich communities that leverage participation and evolve dynamically to better meet customer needs. Ajax and related innovations have propelled demands for a new standard that allows people to create Web applications that interoperate across desktop and mobile platforms.

W3C launched the HTML Working Group in March 2007 as a forum for building consensus around the new standard. The group has already published a set of HTML design principles, which include: ensuring support for existing content, codifying widespread practice, separating concerns (markup from presentation), and enabling universal access. These principles help guide the group's decision-making.

What's New in HTML 5

Some of the most interesting new features for authors are APIs for drawing two-dimensional graphics, embedding and controlling audio and video content, maintaining persistent client-side data storage, and for enabling users to edit documents and parts of documents interactively. Other features make it easier to represent familiar page elements, including <section>, <footer>, <nav> (for navigation), and <figure> (for assigning a caption to a photo or other embedded content). Authors write HTML 5 using either a "classic" HTML syntax or an XML syntax, according to application demands. See a list of changes from HTML 4.

The HTML 5 specification helps to improve interoperability and reduce software costs by giving precise rules not only about how to handle all correct HTML documents but also how to recover from errors. This is the first version of HTML developed under W3C's Royalty-Free Patent Policy.

In addition to the browser makers listed above, the following W3C Members are helping to shape the HTML 5 specification: BEA Systems, Inc.; Betfair Limited; Boeing; Cisco; Disruptive Innovations; Dreamlab Technologies AG; France Telecom; Hewlett-Packard; IWA-HWG; Mitsue-Links Co., Ltd.; mTLD Top Level Domain Limited; Openwave Systems Inc.; Oxford Brookes University; PicoForms; Queensland University of Technology; Stanford University; University of Innsbruck; and the U.S. Library of Congress.

W3C welcomes feedback from the public on this First Public Working Draft; see the specification for guidance on sending comments. W3C urges more authoring tool developers to take this opportunity to join the HTML Working Group to ensure that HTML 5 meets the needs of their customers. W3C also encourages people to let software makers know which features of HTML 5 they most value.

About the World Wide Web Consortium (W3C)

The World Wide Web Consortium (W3C) is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards. W3C primarily pursues its mission through the creation of Web standards and guidelines designed to ensure long-term growth for the Web. Over 400 organizations are Members of the Consortium. W3C is jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France and Keio University in Japan, and has additional Offices worldwide. For more information see

Announcement also available in French and Japanese. See also translations in other languages, and the W3C Press Release Archive.


Robin Cover, Editor