Catching up from a long vacation always gives me a new appreciation for information management. My filtering skills get rusty after three weeks on the beach, making the undifferentiated mass of E-mail, snail mail and voice mail seem that much more formidable.
The more structured the information, however, the easier it is to filter. I cut down my E-mail backlog, for example, from around 500 messages to just under 70 true "action items" in half a day by sorting the mail based on sender, date, and whether the mail was addressed directly to me or to a list. Even voice mail has a structure of sorts imposed by the voice-mail system, allowing me to peruse the messages in reverse order (newest first).
The paper mail and the backlog of periodicals, on the other hand, are the hardest to catch up on. By the end of the week, I'll probably start throwing out stacks of them without even removing the rubber bands. If it's important, I'll get it again.
Most Web sites are as unstructured as those stacks of snail mail, making filtering, extraction of data and navigation harder than it should be. XML (Extensible Markup Language) is an attempt to give Web authors the tools to add structure. XML lets publishers create extensions to HTML, or create whole new markup languages, that describe the data more completely and in a way that can be validated and imported programmatically (see story).
It's happening already, on a small scale. Microsoft's CDF (Channel Definition Format) is one of the first languages created using XML, providing a treelike subject hierarchy for navigating the content on a site. More complicated XML development efforts are on the drawing board, such as one using XML for EDI (electronic data interchange). New tools also are starting to show up.
Using XML-derived languages and formats is a no-brainer, but using XML to create more structure for your own site is more complicated. The easiest way is to use an existing XML-derived language like CDF, but many sites will need much more. Business-to-business data exchange, such as enabling your partners to import your price lists directly, will likely require custom tags or even a new markup language specific to your business or industry.
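To make the price-list case concrete, here is a minimal sketch in Python of how a partner might import such a list. The tag names (price_list, product, sku, description, price), the sample data and the import_prices helper are all hypothetical, invented for illustration; a real business or industry markup language would define its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical price list using custom tags (not from any real standard).
PRICE_LIST = """<price_list currency="USD">
  <product>
    <sku>A-100</sku>
    <description>Widget</description>
    <price>19.95</price>
  </product>
  <product>
    <sku>B-200</sku>
    <description>Gasket</description>
    <price>4.50</price>
  </product>
</price_list>"""


def import_prices(xml_text):
    """Parse the price list and return a dict mapping SKU to price."""
    root = ET.fromstring(xml_text)
    return {
        product.findtext("sku"): float(product.findtext("price"))
        for product in root.iter("product")
    }


prices = import_prices(PRICE_LIST)
for sku, price in prices.items():
    print(sku, price)
```

Because each value sits inside a tag that names it, the importing program does not have to guess which table cell or bold line on a page holds the price.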
It's not too early to start thinking about what your markup language should look like. What information is your audience really after? If your site is generated partly from a database, that database's schema gives you a good start toward what's called a DTD (Document Type Definition), the description of an XML language.
What makes XML so useful is that you don't have to dumb down the information in that database to squeeze it through limited HTML tags. You can create an XML DTD that describes each field in your database, allowing more direct communication of that information using your Web server.
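As a rough sketch of that idea, the Python below reads a table's column names from an in-memory SQLite database and emits one DTD element declaration per field. The product table, its columns and the dtd_from_table helper are hypothetical, and a real DTD would need more thought about optional fields, attributes and nesting.

```python
import sqlite3


def dtd_from_table(conn, table):
    """Build a flat DTD: one element per row, one child element per column.

    Assumes the table and column names are already valid XML names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if not columns:
        raise ValueError(f"no such table or no columns: {table}")
    lines = [
        f"<!ELEMENT {table}_list ({table}*)>",
        f"<!ELEMENT {table} ({', '.join(columns)})>",
    ]
    # Treat every field as plain character data.
    lines += [f"<!ELEMENT {column} (#PCDATA)>" for column in columns]
    return "\n".join(lines)


# Hypothetical schema standing in for the database behind a site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT, description TEXT, price REAL)")
dtd = dtd_from_table(conn, "product")
print(dtd)
```

The result is only a starting point: the schema supplies the field names, but deciding which fields are required or repeatable is still a design decision.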
Creating that DTD is not easy, though. XML is vastly simpler than SGML (Standard Generalized Markup Language): the XML working draft is only 33 pages, compared with 500 for a full description of SGML. Even so, the draft is mind-numbingly dull and convoluted. Parsers and even XML viewers are showing up, but what we really need are easier tools for creating DTDs.
But at the risk of gushing (as Jeff Frentzen pointed out in his column last week on XML), I don't think it's an exaggeration to say that XML will revolutionize the Web. It's robust enough to convey almost any document without the drastic loss of features and information that happens with a conversion to HTML.
And anything that will help us make more sense out of the mountain of information on the Web is a good thing.
Are you starting to use XML now? Tell me about it at esullivan@zd.com.