[This local archive copy mirrored from the canonical site: http://www.sun.com/980602/xml/; links may not have complete integrity, so use the canonical document at this URL if possible.]
XML: It's the Future of HTML
Those who watch the XML (Extensible Markup
Language) phenomenon have noted its rapid and impressive advances.
As the first and second
articles in this series have attempted to convey, XML is gaining what
the marketers call "mindshare" at an amazing rate. Web futurists are convinced
that XML is The Next Thing.
But what about all of the HTML data that constitutes the web? Ever since
the web itself was The Next Thing, HTML has been the target format for
content developers around the world. If you master HTML, you can reach
a worldwide audience.
The Demise of HTML?
Many arguments in support of XML have taken strength from a critical appraisal
of HTML. HTML's limitations have fueled the call for and interest in a
technology like XML. After all:
HTML is a fixed tag set. It only describes documents of a single type.
HTML data is hard to process. Browsers have permitted all manner of HTML messiness to pass, unchecked, into semi-permanent residency in cyberspace.
HTML documents that aspire to function like applications are clogging the internet with client-to-server traffic.
XML will change all of this, to be sure. But no one can realistically expect
the volumes of active, useful HTML pages to become irrelevant overnight.
In fact, HTML has an important role to play in the brave new world of XML.
But what is that role?
W3C to the Rescue
To answer the question of whither HTML, the W3C
(World Wide Web Consortium) recently convened a workshop
about HTML's future. As the organization that has maintained and furthered
the HTML standard since the IETF published the HTML 2.0 specification, the
W3C has an abiding interest in HTML, as have many of the W3C's members.
On May 4 and 5, many of those members and some unaffiliated but interested
parties attended the W3C's "Future of HTML" workshop near San Francisco.
The W3C is undoubtedly rich in ideas for HTML, but the purpose of this
workshop was for the W3C to listen to its members and to determine what
actions best support its members' needs. What emerged from this workshop
is a surprising and, many may agree, positive program for HTML.
Does HTML Have a Future?
HTML certainly has its past and current relevance. But the enthusiastic
acceptance of XML and the fact that the W3C dissolved its HTML working group
after publishing the HTML 4.0 recommendation may have left HTML's status
as an ongoing and meaningful data format in question.
Two workshop participants who represented ISO (the International Organization
for Standardization) brought this question home. In the view of ISO, HTML has
constituted a de facto standard of sufficient heft that ISO has given HTML
its standards treatment. "ISO HTML," based on HTML 4.0, is described in
two key documents:
Together these documents codify a rigorous view of HTML, even if HTML is
not always implemented as ISO describes it. ISO has standardized HTML in
the conviction that HTML will persist for at least 25 years. Given ISO's
long view of the situation, HTML's future has at least one substantial
vote of confidence. Moreover, having made HTML into a standard, ISO expects
the W3C to remain responsible for HTML.
But ISO is a conservative organization that records existing standards;
it leaves the task of driving innovation to others. How does ISO's or any
other long view of HTML square with the innovative force of XML?
HTML or XML?
Many enterprises and consortia have been wrestling with HTML's perceived
limitations, and those groups look to XML as a means for escaping them.
C|NET's representative called for HTML,
augmented with CSS (Cascading Style Sheets) for style and layout, to persist
only as a machine-generated output format for web documents. In this view,
documents would be authored in some other format, perhaps XML, SGML, or
some proprietary format more tractable than HTML. C|NET's view is reflected in
the practices of many companies that publish technical documentation on
the web (such as Sun's own documentation).
Other industry consortia have sponsored XML-based languages better suited
to their own kinds of information. Mathematicians have developed MathML,
and chemists have advanced CML
(Chemical Markup Language). Both use XML to define their content models,
and they are only two of many such languages now in the works.
Manufacturers of cellular phones, PDAs, and other small information devices
have taken a different approach. They have championed Compact
HTML in an effort to pare away HTML features better suited to large-screen
user agents such as desktop browsers. While framed in terms of HTML today, this
effort could easily turn to XML for a content model more closely suited
to the information and devices that Compact HTML is meant to serve.
But HTML is a unitary standard that requires a W3C-convened
working group to maintain and advance it. The process is not glacial, but
neither is it instantaneous. Unlike HTML, XML enables users to develop
the content models appropriate to their applications much more quickly.
What is the motivation to stand by HTML when so much of the world is looking
to XML to solve its problems?
HTML and XML?
Some powerful motivations to preserve HTML exist, despite XML's appeal.
Beyond the obvious motivation, support for many millions of
active web pages, HTML is well understood as a format for authoring. The
HTML Writers Guild's representatives were not the only workshop participants
to make this point. With all its problems, HTML is nonetheless a highly
successful lingua franca for expressing ideas on the web. HTML has
given rise to far too powerful a communications medium to yield quickly and
gracefully to something else.
Is XML an irresistible force and HTML an immovable object? How can the
stability of HTML accommodate the fast-appearing XML-based tag sets like
CML or MathML? The consortia that develop their own XML-based information
models also complain that HTML is already too big and complex.
It would seem that HTML and XML are on conflicting courses. But do they
need to be?
XML in HTML?
One serious proposal is for HTML documents to support the inclusion and
processing of XML data. This would allow an author to embed within a standard
HTML document some well delimited, well defined XML object. The HTML document
would then be able to support some functions based on the special XML markup.
This strategy of permitting "islands" of XML data inside an HTML document
would serve at least two purposes:
To enrich the content delivered to the web and support further enhancements to the XML-based content models.
To enable content developers to rely on the proven and known capabilities of HTML while they experiment with XML in their environments.
User agents that normally process HTML data would have to swap in an XML
processor to render that "island" of information between the <xml>
and </xml> tags. The result (for markup mavens) would look like this:
<!-- some typical HTML document with
<h1>, <h2>, <p>, etc. -->
<xml>
<!-- The <xml> tag introduces some XML-compliant
markup for some specific purpose. The markup is
then explicitly terminated with the </xml> tag.
The user agent would invoke an XML processor
only on the data contained in the <xml></xml>
pair. Otherwise the user agent would process
the containing document as an HTML document. -->
</xml>
<!-- more typical HTML document markup -->
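For readers who want to see what swapping in an XML processor might involve, here is a minimal Python sketch of a user agent's first step: separating the islands from the surrounding HTML. It assumes that an island is exactly the text between a literal <xml> and </xml> pair and holds a single root element; it ignores comments, attributes on the <xml> tag, and nested islands. The function name split_islands is hypothetical. The HTML segments pass through untouched, while each island goes to a real XML parser (Python's xml.etree.ElementTree), which rejects malformed markup instead of tolerating it.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an island is the text between a literal <xml> and </xml>
# pair and holds exactly one root element. Comments, attributes on the
# <xml> tag, and nested islands are not handled in this sketch.
ISLAND = re.compile(r"<xml>(.*?)</xml>", re.DOTALL)


def split_islands(document):
    """Split a page into ("html", str) and ("xml", Element) segments.

    split_islands is a hypothetical helper, not part of any proposal.
    """
    segments = []
    pos = 0
    for match in ISLAND.finditer(document):
        if match.start() > pos:
            # Ordinary HTML is passed along untouched for the HTML processor.
            segments.append(("html", document[pos:match.start()]))
        # Only the island's contents go to the XML processor, which raises
        # ET.ParseError on malformed markup instead of tolerating it.
        segments.append(("xml", ET.fromstring(match.group(1).strip())))
        pos = match.end()
    if pos < len(document):
        segments.append(("html", document[pos:]))
    return segments


if __name__ == "__main__":
    page = "<h1>Parts</h1><xml><part><num>A-100</num></part></xml><p>More HTML"
    for kind, value in split_islands(page):
        if kind == "xml":
            print(kind, value.tag, value.find("num").text)
        else:
            print(kind, value)
```

The split keeps the two processing models apart: the HTML segments (including the unclosed <p> above) keep HTML's forgiving treatment, and only the island is held to XML's rules.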
Another proposal that met with more skepticism is the idea of "sprinkling"
XML data within an HTML document. This idea has been tossed off in the
popular press without considering the fuller implications, and many people
consider it more problematic than practical, but for markup specialists,
this is what XML "sprinkles" might look like:
<p>One would sprinkle some XML in a
document to indicate that
requires special treatment because
it is a part number.
<p>Processing would be less straightforward
than for XML islands.
<!-- We at Sun contend that for these
sprinkles of XML, a different mechanism,
already in HTML 4.0, is more appropriate:
accomplishes a similar effect and does not
create processing challenges. -->
But the controlled embedding of XML objects inside an HTML document suggests
a practical means of mixing the supposedly immiscible HTML and XML.
HTML as XML?
Another proposal, more appropriate for long-term implementation than XML
"islands" in HTML, is to recast HTML as an XML application. That is,
rewrite the HTML specification so that HTML documents must, like XML
documents, be well formed and may optionally be valid. The reasons that HTML
documents are not well formed today are technically dense and need not be
elaborated here; they are a function of HTML's history. Whatever the reasons,
the consensus of W3C members at the "Future of HTML" workshop strongly
favored this option.
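To make the well-formedness requirement concrete, the following Python sketch (an illustration only, not part of any W3C proposal) feeds two versions of the same fragment to a strict XML parser. The first uses constructs that HTML browsers routinely accept; the second follows XML's rules. The helper is_well_formed is hypothetical; it simply reports whether xml.etree.ElementTree can parse the markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup.

    is_well_formed is a hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Constructs that HTML browsers routinely accept: an unquoted attribute
# value, omitted </p> end tags, and an empty element written as <br>.
legacy_html = "<body><p align=center>First<br><p>Second</body>"

# The same content written under XML's well-formedness rules.
xml_html = '<body><p align="center">First<br/></p><p>Second</p></body>'

print(is_well_formed(legacy_html))  # False
print(is_well_formed(xml_html))     # True
```

An XML processor has no equivalent of a browser's error recovery, which is why recasting HTML as XML makes well-formedness mandatory rather than optional.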
Supporting HTML in applications such as XML browsers requires a tool that
converts today's amorphous, non-rigorous HTML documents into well formed XML
documents. The W3C is working on such a tool right now; watch for details
about it in the future.
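This is not the W3C's tool, which is still in progress, but a rough Python sketch of what such a conversion involves, assuming the standard library's html.parser module as a lenient front end and xml.sax.saxutils for escaping. The names WellFormer, to_well_formed, and EMPTY_ELEMENTS are hypothetical. The sketch relies on html.parser to lowercase tag and attribute names, quotes and escapes every attribute value, writes a small assumed set of empty elements such as <br> as <br/>, closes an open <p> when another <p> begins, closes elements left open when an enclosing element ends, and drops stray end tags, comments, and declarations. A real converter would need HTML 4.0's full rules for implied end tags and would also ensure a single root element such as <html>.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: a small subset of HTML 4.0's empty elements, for illustration.
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class WellFormer(HTMLParser):
    """Re-serialize lenient HTML as well formed XML (hypothetical sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # elements currently open, innermost last

    def handle_starttag(self, tag, attrs):
        # HTML lets a new <p> end the previous one; XML needs an explicit </p>.
        if tag == "p" and self.stack and self.stack[-1] == "p":
            self.handle_endtag("p")
        # html.parser has already lowercased names; quote and escape values.
        # A minimized attribute such as "checked" gets its name as its value.
        attr_text = "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs
        )
        if tag in EMPTY_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag, or the end half of <br/>: drop it
        # Close any elements left open inside this one, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape &, <, >.
        self.out.append(escape(data))

    # Comments and declarations such as <!DOCTYPE> fall through to
    # HTMLParser's default handlers, which ignore them.


def to_well_formed(html):
    parser = WellFormer()
    parser.feed(html)
    parser.close()
    # Close whatever the author left open at the end of the document.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(to_well_formed("<BODY><P ALIGN=center>First<BR><P>Second &amp; last</BODY>"))
```

Run on the sample fragment, the sketch produces <body><p align="center">First<br/></p><p>Second &amp; last</p></body>, in which every element is explicitly closed and every attribute value is quoted.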
W3C at the Ready
So where does that lead? It leads to the workshop participants' support
of a resolution that the W3C reconvene an HTML working group. The working
group's scope would include, among others, the objectives suggested throughout this article:
XML objects in traditional HTML documents
HTML as an application of the XML standard
HTML as a modular rather than unitary content model
The goal would be HTML as a robust, well known data format for documents
on the web, but with the benefits of extensibility, processability, and manageability
that elude HTML documents today.
(c) Sun Microsystems