In last week's column, we looked at how a piece of electronic publishing's ancient history, SGML, is making a comeback in the form of the Extensible Markup Language (XML) initiative. The idea is to take the power of SGML (Standard Generalized Markup Language) and make it available to the rest of us by blending it with the simplicity and universality of HTML.
And who better to introduce these ideas than Tim Bray, co-editor of the XML specification? Lately, Tim has been consulting with Netscape and is representing the company in the XML working group at the World Wide Web Consortium. Tim is undeniably excited about the possibilities XML will make available to both the Web and the networked world in general.
This week, we talk about some of the underlying workings of XML and take a look at some practical applications.
JEFF: Could you talk about how XML works in the context of HTML?
TIM: XML uses tags, just like HTML does. So in HTML you can say:
<p>This is a picture of Thomas Pynchon: <img src="tp.jpg"></p>
Notice that the <p> tag has a corresponding end-tag, </p>; these mark the start and end of the paragraph. There is no end-tag for the <img> because it's "empty." Since a Web browser knows this, it won't get confused and look for an </img>. The XML version would look a little different:
<p>This is a picture of Thomas Pynchon: <img src="tp.jpg"/></p>
The <img> tag now ends with "/>" to make it crystal clear that it's empty, so no computer program will ever go looking for an end-tag. That's the whole idea of XML: a computer program can read it properly and figure out what goes where without any special built-in knowledge about the tags. This means you can invent your own tags. For example:
<p>This is a picture of
<author><firstname>Thomas</firstname> <surname>Pynchon</surname></author>: <img src="tp.jpg"/></p>
A Web browser won't know what to do with the author, firstname, and surname tags. But a search engine might. So might a Java applet or an ActiveX control or a subject-classification robot. This is the "X" (for eXtensible) in XML at work.
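To make the point concrete, here is a minimal Python sketch that reads the Pynchon example with the standard library's xml.etree.ElementTree, a generic parser that has never heard of <author> or <surname>. It is an illustration of the idea, not the tooling Tim describes.

```python
import xml.etree.ElementTree as ET

# The author/firstname/surname example from the interview, as one string.
page = ('<p>This is a picture of '
        '<author><firstname>Thomas</firstname> <surname>Pynchon</surname></author>: '
        '<img src="tp.jpg"/></p>')

# A generic parser needs no built-in knowledge of these tag names.
root = ET.fromstring(page)

img = root.find("img")
print(img.attrib["src"])   # tp.jpg
print(len(img))            # 0: the "/>" marked the element as empty

# A search engine could pull out the author without guessing from free text.
author = root.find("author")
name = f'{author.findtext("firstname")} {author.findtext("surname")}'
print(name)                # Thomas Pynchon
```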
XML also has some other machinery, the document type declaration, that lets you declare the tags you're going to use and provide a grammar for how they all fit together. Or you can just make them up as you go along.
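The "grammar" idea can be sketched as a table of which elements may appear inside which. This is only an analogy: the GRAMMAR dict and grammar_errors function below are hypothetical, and unlike a real document type declaration they ignore child order, repetition, and text content.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: for each element name, the child elements it may contain.
# Real XML would declare this in a document type declaration; this only mimics the idea.
GRAMMAR = {
    "p": {"author", "img"},
    "author": {"firstname", "surname"},
    "firstname": set(),
    "surname": set(),
    "img": set(),
}

def grammar_errors(elem, grammar):
    """Recursively list every child element the grammar does not allow."""
    errors = []
    allowed = grammar.get(elem.tag)
    if allowed is None:
        errors.append(f"undeclared element <{elem.tag}>")
        allowed = set()
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
        errors.extend(grammar_errors(child, grammar))
    return errors

good = ET.fromstring("<p>Hi <author><firstname>T</firstname> <surname>P</surname></author></p>")
bad = ET.fromstring("<p><surname>Pynchon</surname></p>")
print(grammar_errors(good, GRAMMAR))  # []
print(grammar_errors(bad, GRAMMAR))   # ['<surname> not allowed inside <p>']
```

Leaving GRAMMAR out entirely corresponds to the "make them up as you go along" option: the document still parses, it just isn't checked against anything.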
JEFF: So XML isn't an alternative to HTML....
TIM: HTML remains a wonderful vehicle for delivering stuff to screens quickly and with minimal effort. The portability is pretty good, the formatting is adequate these days, the hypertext is usable. Trouble is, it's hard to improve, because the tags are trying to do formatting and interaction and hypertext all at once. The idea in XML is that the tags just say what the data is - you want formatting, use a stylesheet; you want behavior, use the document object model, or glue methods to tags some other way. That way you've decoupled the domains, and you can really make solid improvements in one without compromising the other.
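One way to "glue methods to tags some other way" can be sketched in Python: the markup stays pure data, and a separate table maps tag names to functions. The handlers and what they return are hypothetical, and this is a plain dispatch table, not the W3C document object model.

```python
import xml.etree.ElementTree as ET

# The markup only says what the data is; it carries no behavior of its own.
page = ET.fromstring(
    '<p>This is a picture of '
    '<author><firstname>Thomas</firstname> <surname>Pynchon</surname></author>: '
    '<img src="tp.jpg"/></p>'
)

# Hypothetical behavior, attached from outside and keyed by tag name.
def on_author(elem):
    return "index author: " + "".join(elem.itertext())

def on_img(elem):
    return "fetch image: " + elem.get("src")

HANDLERS = {"author": on_author, "img": on_img}

def run_handlers(root, handlers):
    """Walk the tree in document order and call the handler for each known tag."""
    return [handlers[e.tag](e) for e in root.iter() if e.tag in handlers]

print(run_handlers(page, HANDLERS))
# ['index author: Thomas Pynchon', 'fetch image: tp.jpg']
```

Swapping in a different HANDLERS table changes the behavior without touching a single tag, which is the decoupling Tim describes.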
The basic idea is that XML is extensible, simple, and isn't trying to do too much. Thus, any old geek can write a parser in a few days and ship it. I know, because I did; it's in Java. Pick it up at http://www.textuality.com/Lark/.
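To give a feel for why a bare-bones XML reader is quick to write, here is a minimal sketch that only checks whether tags nest properly around a single root element. It is not Tim's Lark parser and not a conforming XML parser: it assumes no comments, processing instructions, CDATA sections, or ">" characters inside attribute values, and it does no entity or character checking.

```python
import re

# Matches start-tags, end-tags and empty-element tags.
# Group 1 is "/" for an end-tag, group 2 is the element name,
# group 3 is "/" for an empty-element tag such as <img src="tp.jpg"/>.
TAG = re.compile(r'<(/?)([A-Za-z_:][\w.:-]*)[^>]*?(/?)>')

def is_well_nested(text):
    """True if every start-tag is closed in order and there is exactly one root."""
    stack = []
    roots = 0
    for match in TAG.finditer(text):
        closing, name, empty = match.groups()
        if closing:
            # An end-tag must close the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            if not stack:
                roots += 1
            if not empty:
                stack.append(name)
    return not stack and roots == 1

print(is_well_nested('<p>This is a picture of Thomas Pynchon: <img src="tp.jpg"/></p>'))  # True
print(is_well_nested('<p>This is a picture of Thomas Pynchon: <img src="tp.jpg"></p>'))   # False
```

The HTML version fails because nothing in the text says <img> is empty, so the checker still expects an </img>; the "/>" form removes that need for built-in knowledge.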
JEFF: How about a few real-world examples of future XML applications? How would an intranet content manager use it? Or a search-engine developer? Or even me doing my homepage?
TIM: Speaking as a former search-engine guy, I think that XML will give all the search engines a big shot in the arm. Full-text search of page contents is a pretty blunt instrument, but if the pages have been enriched with <author> and <subject> and <expiry-date> and lots of other wonderful added information, things will suddenly be a lot easier to find.
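The difference between full-text search and searching enriched pages can be sketched with two hypothetical pages. The page names, tags, and contents below are invented for illustration; a real engine would index rather than re-parse on every query.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages enriched with descriptive tags.
PAGES = {
    "by-pynchon.xml": "<page><author>Thomas Pynchon</author>"
                      "<subject>fiction</subject><body>A new short story.</body></page>",
    "about-pynchon.xml": "<page><author>A. Reviewer</author>"
                         "<subject>fiction</subject><body>A review of Thomas Pynchon.</body></page>",
}

def full_text_search(pages, word):
    """The blunt instrument: pages whose text contains the word anywhere."""
    return sorted(name for name, markup in pages.items()
                  if word in "".join(ET.fromstring(markup).itertext()))

def field_search(pages, field, value):
    """Pages whose top-level <field> element holds exactly this value."""
    return sorted(name for name, markup in pages.items()
                  if ET.fromstring(markup).findtext(field) == value)

print(full_text_search(PAGES, "Pynchon"))               # ['about-pynchon.xml', 'by-pynchon.xml']
print(field_search(PAGES, "author", "Thomas Pynchon"))  # ['by-pynchon.xml']
```

Full-text search can't tell a page by Pynchon from a page that merely mentions him; the <author> tag can.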
Another XML application that's starting to work right now is OFX, short for Open Financial Exchange: the format that Quicken, Microsoft Money, and so on use to exchange information with banks. Making this a textual format buys quite a bit of simplicity and ease of interchange, but it needs a bunch of financial fields that will never be in HTML, and it needs to be lightweight and easy to process. XML is a no-brainer for this one.
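As a rough illustration of why tagged text suits this kind of data, the sketch below reads a small statement and totals it. The element names (<statement>, <transaction>, <amount>, and so on) are invented for this example and are not taken from the OFX specification.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Invented element names for illustration; these are NOT OFX's actual tags.
STATEMENT = (
    '<statement account="12345">'
    "<transaction><date>1997-11-03</date><amount>-42.50</amount><payee>Bookshop</payee></transaction>"
    "<transaction><date>1997-11-05</date><amount>1200.00</amount><payee>Payroll</payee></transaction>"
    "</statement>"
)

def net_change(xml_text):
    """Sum every transaction amount exactly, using Decimal rather than float for money."""
    root = ET.fromstring(xml_text)
    return sum((Decimal(t.findtext("amount")) for t in root.iter("transaction")),
               Decimal("0"))

print(net_change(STATEMENT))  # 1157.50
```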
As for you and your homepage, you're kind of stuck until the big browsers support XML natively. Nobody knows when that will be, but I'd be surprised if we didn't see some beta code by year's end. Then the idea is that you will do your pages in XML, and do separate stylesheets for them. You might have one stylesheet that is designed for normal display, another one for tiny screens like cell phones and Palm Pilots, another for people who browse with graphics off, another for printing your page, and so on.
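The one-document, many-stylesheets idea can be sketched with plain Python functions standing in for stylesheets. The <entry>, <title>, and <summary> tags and both "stylesheet" functions are hypothetical; real pages would use a stylesheet language rather than code.

```python
import html
import xml.etree.ElementTree as ET

# One hypothetical page, marked up only with what the data is.
page = ET.fromstring(
    "<entry><title>Welcome</title>"
    "<summary>Notes on XML and stylesheets.</summary></entry>"
)

def screen_style(root):
    """Normal display: render title and summary as HTML (text is escaped)."""
    return (f"<h1>{html.escape(root.findtext('title'))}</h1>"
            f"<p>{html.escape(root.findtext('summary'))}</p>")

def tiny_screen_style(root):
    """Cell phone or Palm Pilot: just the title, as plain text."""
    return root.findtext("title")

print(screen_style(page))       # <h1>Welcome</h1><p>Notes on XML and stylesheets.</p>
print(tiny_screen_style(page))  # Welcome
```

Adding a print or graphics-off version means writing one more function; the page itself never changes.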
My favorite app, of course, is MCF, the Meta Content Framework, originally dreamed up by R. V. Guha and some other Apple people. Guha has since moved to Netscape, and a simplified MCF was recently submitted to the W3C by Guha and me on behalf of Netscape. MCF is a general-purpose framework for storing metadata: information about information. This is what search engines and Web mappers and channel wranglers and so on really need.
I think XML plus the DOM gets real interesting. I think XML-for-metadata (i.e., MCF) is very interesting. These things are all happening too fast....
Jeffrey Veen writes a weekly column on tools and related Web technologies for Webmonkey.