[This local archive copy (text only) mirrored from: http://online.guardian.co.uk/theweb/881176031-xml.html; see the canonical version of the document.]

Kevin Wilson on the language set to make the Web smarter
From tags to riches

The Web is spectacularly successful. It’s also pretty dumb — based on a language that doesn’t understand the content of the pages it displays. This lack of brainpower at the Web’s heart limits its speed and function. Next week, the World Wide Web Consortium — the body that oversees the development of Web technologies — will publish the final draft of XML, a new language that could vastly improve how today’s Web works and open up new possibilities for publishers, corporate users and the average surfer.

Most documents on the Web today are stored and transmitted in HTML (hypertext markup language). This is a simple system that defines the appearance of a Web page by placing its contents within tags such as "title" and "table". HTML also enables a Web document to link to any other on the Net.

HTML is a simple application of SGML (Standard Generalized Markup Language), a system that was developed for representing text in electronic form so that it could be exchanged and understood between computer systems. SGML is used in industries such as aerospace, telecommunications and the military that process lots of data, but the language is too complex to be used on the Web.

XML (extensible markup language) was created as a halfway house between HTML and SGML — retaining the simplicity of HTML, but adding some of SGML’s more sophisticated functions.

“HTML does something very important,” says Tim Bray, one of the authors of XML. “It makes simple tasks simple . . . But to do anything more ambitious needs extensions — HTML just isn’t going to grow them.”

HTML makes it easy for ordinary users to read and write Web pages, and was largely responsible for the lightning-fast growth of the Web. However, the language has two key limitations. First, publishers can’t create their own tags to further define a page’s contents. So, for example, an online travel agent can’t just create HTML tags called "destination" and "business class". With XML this will be possible, so instead of a search engine presenting a would-be traveller with the thousands of sites that contain the word “Zurich”, it can filter this down to find those in which “Zurich” is a destination for a business class flight.
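To make the travel agent example concrete, here is a minimal sketch in Python of how a search tool might use such tags. The tag names (listings, flight, origin, destination, cabin) and the sample data are invented for illustration and are not part of any published XML vocabulary; the helper find_flights is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings: the tag names are illustrative, not a real standard.
LISTINGS = """
<listings>
  <flight>
    <origin>London</origin>
    <destination>Zurich</destination>
    <cabin>business</cabin>
  </flight>
  <flight>
    <origin>Zurich</origin>
    <destination>London</destination>
    <cabin>business</cabin>
  </flight>
  <flight>
    <origin>Paris</origin>
    <destination>Zurich</destination>
    <cabin>economy</cabin>
  </flight>
</listings>
"""


def find_flights(xml_text, destination, cabin):
    """Return the flight elements whose tagged destination and cabin both match."""
    root = ET.fromstring(xml_text)
    return [
        flight
        for flight in root.iter("flight")
        if flight.findtext("destination") == destination
        and flight.findtext("cabin") == cabin
    ]


# Tag-aware search: only flights *to* Zurich in business class.
matches = find_flights(LISTINGS, "Zurich", "business")
origins = [flight.findtext("origin") for flight in matches]

# Plain keyword search: counts every mention of the word, whatever its role.
keyword_hits = LISTINGS.count("Zurich")

print(origins)
print(keyword_hits)
```

A keyword search finds the word “Zurich” in all three listings, while the tag-aware search returns only the one flight in which Zurich is the destination and the cabin is business class.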

Second, HTML describes how a page should look, not what its data means, so a page can only be searched as a document. The greater power of XML means that each Web page can be searched as a database instead. For Web publishers, this means that they can automatically build sites from existing databases of information. For surfers, it means information on Web pages can be presented and manipulated according to their needs, so a phone book sorted by last name could instead be sorted by address.
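The phone book example can be sketched in a few lines of Python. The tag names (phonebook, entry, lastname, address, phone) and the entries themselves are made up for illustration, as is the helper sorted_by; the point is only that once each field is tagged, the same data can be reordered by any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical phone book: tag names and entries are invented for this sketch.
PHONE_BOOK = """
<phonebook>
  <entry><lastname>Adams</lastname><address>9 Mill Lane</address><phone>555 0101</phone></entry>
  <entry><lastname>Baker</lastname><address>2 Abbey Road</address><phone>555 0102</phone></entry>
  <entry><lastname>Clark</lastname><address>5 High Street</address><phone>555 0103</phone></entry>
</phonebook>
"""


def sorted_by(xml_text, field):
    """Return (lastname, address) pairs ordered by the text of the given tag."""
    root = ET.fromstring(xml_text)
    entries = sorted(
        root.iter("entry"),
        key=lambda entry: entry.findtext(field, default=""),
    )
    return [(e.findtext("lastname"), e.findtext("address")) for e in entries]


by_name = sorted_by(PHONE_BOOK, "lastname")
by_address = sorted_by(PHONE_BOOK, "address")

print(by_name)
print(by_address)
```

The sort here is a plain text comparison of the address field, which is enough for a sketch; a real directory would need a more careful rule for ordering street names and house numbers.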

Bray also predicts that XML could make the Web faster and more powerful to use: “The Web is kind of boring and slow. To make it interesting and fast we need to be able to download some ‘smarts’ (as in Java), and some rich data (as in XML) into the Web browser so it can do some useful work without having to do round trips across the Net for every little transaction.”

Java is a cross-platform Web language that enables data and programs to be sent across the Net. In combination with XML, it could take the strain off Web sites by gathering chunks of smarter information from the Web and downloading them to the user’s PC, where the Java applet does the processing. So you could gather the information you would need to design, say, a kitchen from a Web site and then use the applet to test out different combinations on your PC.

Of course, none of this will happen unless XML is supported by Web browsers. Netscape and Microsoft have both signed up to the XML standard and have said they will support the language in future versions of their browser programs. Charles Vincent, client product manager for Netscape Europe, says XML could be a fresh start for the Web: “With HTML a lot of add-ons were needed. I think XML will integrate a lot of the specific needs of more and more demanding users on the Web.”

XML could also give Web commerce a boost. If companies in the same line of business — estate agents, bankers, booksellers — agree to use the same set of XML tags, their goods could be described more precisely and potential customers would find them more easily.

For now, XML is a simple technology with a lot of promise. “XML isn’t important in itself,” says Bray. “It’s the things that it enables you to build on the Web.” If the standard is ratified, by this time next year we may all wonder how we ever got by with today’s dumb data.

03 December 1997