[This local archive copy mirrored from the canonical site: http://www.sun.com/980602/xml/; links may not have complete integrity, so use the canonical document at this URL if possible.]
But what about all of the HTML data that constitutes the web? Ever since the web itself was The Next Thing, HTML has been the target format for content developers around the world. If you master HTML, you can reach a worldwide audience.
The W3C is undoubtedly rich in ideas for HTML, but the purpose of this workshop was for the W3C to listen to its members and to determine what actions best support their needs. What emerged from this workshop is a surprising and, many may agree, positive program for HTML.
Two workshop participants who represented ISO (the International Organization for Standardization) brought this question home. In the view of ISO, HTML has constituted a de facto standard of sufficient heft that ISO has given HTML its standards treatment. "ISO HTML," based on HTML 4.0, is described in two key documents:
Together these documents codify a rigorous view of HTML, even if HTML is not always implemented as ISO describes it. ISO has standardized HTML in the conviction that HTML will persist for at least 25 years. Given ISO's long view of the situation, HTML's future has at least one substantial vote of confidence. Moreover, having made HTML into a standard, ISO expects the W3C to remain responsible for HTML. But ISO is a conservative organization that records existing standards; it stands down from the task of driving innovation. How does ISO's or any other long view of HTML square with the innovative force of XML?
Other industry consortia have sponsored XML-based languages better suited to the needs of their information. Mathematicians have developed MathML, and chemists have advanced CML (Chemical Markup Language). Both of these have used XML to define their content models. These are only two; there are also many others in the works.
Manufacturers of cellular phones, PDAs, or smaller information devices have taken a different approach. They have championed Compact HTML in an effort to pare from HTML features more appropriate for large-screen user agents such as browsers. While framed in terms of HTML today, this effort could easily turn to XML for a content model more closely suited to the information and devices that mobile HTML is meant to serve.
But HTML is a unitary standard that requires a W3C-convened working group to maintain and advance it. The process is not glacial, but neither is it instantaneous. Unlike HTML, XML enables users to develop the content models appropriate to their applications much more quickly. What is the motivation to stand by HTML when so much of the world is looking to XML to solve its problems?
Is XML an irresistible force and HTML an immovable object? How can the stability of HTML accommodate the fast-appearing XML-based tag sets like CML or MathML? The consortia that develop their own XML-based information models also complain that HTML is already too big and complex.
It would seem that HTML and XML are on conflicting courses. But do they need to be?
User agents that normally process HTML data would have to swap in an XML processor to render that "island" of information between the <xml> and </xml> tags.

    <HTML>
    <body>
    <!-- some typical HTML document with <h1>, <h2>, <p>, etc. -->
    <xml>
    <!-- The <xml> tag introduces some XML-compliant markup for some
         specific purpose. The markup is then explicitly terminated
         with the </xml> tag. The user agent would invoke an XML
         processor only on the data contained in the <xml></xml> pair.
         Otherwise the user agent would process the containing document
         as an HTML document. -->
    </xml>
    <!-- more typical HTML document markup -->
    </body>
    </html>
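To make the hand-off concrete, here is a minimal sketch in Python of how a user agent might isolate an island and pass only that span to an XML processor. It is an illustration under stated assumptions, not a description of any real browser: the function name `extract_islands`, the regular-expression approach, and the `<part>`, `<number>`, and `<name>` elements in the sample are all hypothetical. It also assumes the island is itself well-formed XML with matching case on its opening and closing tags.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical matcher for an <xml>...</xml> island inside an HTML document.
ISLAND_RE = re.compile(r"<xml(?:\s[^>]*)?>.*?</xml\s*>", re.IGNORECASE | re.DOTALL)


def extract_islands(html_text):
    """Return each <xml> island in html_text as a parsed XML element.

    Only the island text is given to the XML parser; everything else
    in the document is left for ordinary HTML processing.
    """
    return [ET.fromstring(m.group(0)) for m in ISLAND_RE.finditer(html_text)]


# Sample document: the element names inside the island are invented.
sample = """<html><body>
<h1>Parts list</h1>
<p>Ordinary HTML content.
<xml><part><number>805-5412</number><name>Widget</name></part></xml>
<p>More ordinary HTML.
</body></html>"""

islands = extract_islands(sample)
numbers = [el.text for island in islands for el in island.iter("number")]
print(numbers)
```

The unclosed `<p>` tags in the surrounding HTML never reach the XML parser, which is the point of the island: strict XML rules apply only between the `<xml>` and `</xml>` tags.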
Another proposal, one that met with more skepticism, is the idea of "sprinkling" XML data within an HTML document. This idea has been tossed off in the popular press without consideration of its fuller implications, and many people consider it more problematic than practical. For markup specialists, this is what XML "sprinkles" might look like:
But the controlled embedding of XML objects inside an HTML document suggests a practical means of mixing the supposedly immiscible HTML and XML.

    <HTML>
    <body>
    <p>One would sprinkle some XML in a document to indicate that
    <part-number>805-5412</part-number> requires special treatment
    because it is a part number.
    <p>Processing would be less straightforward than for XML islands.
    <!-- We at Sun contend that for these sprinkles of XML, a different
         mechanism, already in HTML 4.0, is more appropriate:
         <span class="part-number">805-5412</span>
         accomplishes a similar effect and does not create processing
         challenges. -->
    </body>
    </html>
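The `<span class="part-number">` alternative mentioned in the sample's comment can be read with an ordinary HTML parser, with no XML processor involved. The following Python sketch shows one way this might look, using the standard library's `html.parser`. The class name `ClassCollector` and the sample sentence are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect the text of every <span> carrying a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching span
        self.buffer = []    # text pieces of the span being read
        self.values = []    # finished span texts

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth > 0:
            # A nested span inside a matching one: track it so its
            # end tag does not close the outer span early.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.values.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


collector = ClassCollector("part-number")
collector.feed('<p>Order <span class="part-number">805-5412</span> today.')
collector.close()
print(collector.values)
```

Because `class` is an attribute that HTML 4.0 already defines, the document stays plain HTML, and software that knows nothing about part numbers can simply ignore the span.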
To support HTML in applications like XML browsers, a tool is required that converts today's amorphous, non-rigorous HTML documents into well-formed XML documents. The W3C is working on such a tool right now; watch for details about it in the future.
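As a rough idea of what such a conversion involves, the Python sketch below lowercases tag names, quotes attribute values, closes a `<p>` that was left open when the next one begins, and writes empty elements such as `<br>` in XML's self-closing form. It is not the W3C tool mentioned above and makes no claim about how that tool works. The names `WellFormer` and `to_well_formed` are hypothetical, the lists of void and auto-closing elements are deliberately tiny, and comments and doctype declarations are simply dropped.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no content
AUTO_CLOSE = {"p"}  # a new <p> implicitly closes an open one (simplified)


class WellFormer(HTMLParser):
    """Re-emit loosely written HTML as well-formed XML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open elements

    def _close_through(self, tag):
        # Close every open element up to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag in AUTO_CLOSE and tag in self.stack:
            self._close_through(tag)
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Stray end tags (and end tags of void elements) are ignored.
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        while self.stack:  # close anything the author left open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_well_formed(html_text):
    parser = WellFormer()
    parser.feed(html_text)
    return parser.result()


messy = "<HTML><BODY><H1 CLASS=title>Parts</H1><P>First &amp; second<BR><P>Third</BODY></HTML>"
xml_text = to_well_formed(messy)
root = ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
print(xml_text)
```

Parsing the result with an XML parser at the end is a cheap check that the output really is well-formed; a production converter would need far more complete knowledge of HTML's optional-tag rules than this sketch has.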
The goal would be HTML as a robust, well-known data format for documents on the web, but with the benefits of extensibility, processability, and manageability that elude HTML documents today.
(c) Sun Microsystems