[This local archive copy mirrored from the canonical site: http://www.westworldproductions.com/archive/1997/1097ctr/5429.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]

October '97 Article

XML: The New Wowser For Browsers!

By Dave Trowbridge, marketing manager, Hummingbird Communications

Organic metaphors for the World Wide Web are seductive: they seem to clarify the boiling confusion of the Internet. And the Web indeed shares a salient characteristic with organic life: they are both phenomena that thrive only in the narrow, ever-changing borderlands between order and chaos.

In our bodies we may experience excessive order as, for instance, cancer, a deadly monotony of identical cells; and chaos may strike as deadly cardiac arrhythmia, where various parts of the heart lose the ability to communicate and synchronize. On the Web, standards imposed by technological inertia, monopoly, or government impose deadly order, choking innovation and growth; yet, in the absence of standards, little useful communication takes place and the Internet remains nothing more than islands of isolated data and computing power.

Nowhere is this dilemma more poignant than in the world of HTML, a simple markup language now crushed under the burden of attempting to support the World Wide Web in its incarnation as all things to all people. The difficulty of changing the HTML standard has stifled the development of extensions to support specialized data and vertical applications; yet without the standard, there would be no Web. Palliatives such as plug-ins, or Java or ActiveX applets, are merely end runs around the fundamental limitations of HTML.

But what if the browser interface could mutate to support new data types, new ways of packaging data? What if ISVs and integrators had a standard way of extending what browsers can do without writing applets? And what if all this were possible without sweeping away the vast installed base of useful HTML documents and applications?

That's exactly the promise of the Extensible Markup Language (XML), developed under the aegis of the World Wide Web Consortium (W3C). Unlike HTML, which is essentially a single application of the Standard Generalized Markup Language (SGML) hard-wired into browsers, XML is a simplified subset of SGML. It allows content providers, programmers, and integrators to define their own tags and document types--in effect, it's a kind of freely mutating HTML that can be extended to support virtually any kind of data.
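To make "define their own tags" concrete, here is a minimal sketch in Python (a language chosen purely for illustration; the article names none). The invoice vocabulary, its tag names, and its values are all invented for this example and belong to no published standard. The point is only that a generic XML parser can read a vocabulary it has never seen before.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every tag and attribute name here is invented
# for illustration and is not part of any real standard.
INVOICE_XML = """<invoice number="1097">
  <customer>Acme Integrators</customer>
  <item sku="HB-200" quantity="3">Terminal emulator license</item>
  <item sku="HB-310" quantity="1">NFS client license</item>
</invoice>"""


def summarize(xml_text):
    """Return the root tag and a list of (sku, quantity) pairs."""
    root = ET.fromstring(xml_text)
    items = [
        (item.get("sku"), int(item.get("quantity")))
        for item in root.findall("item")
    ]
    return root.tag, items


if __name__ == "__main__":
    print(summarize(INVOICE_XML))
```

The parser needs no built-in knowledge of the invoice or item tags, which is the contrast with HTML's fixed tag set.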

XML already forms the basis of Microsoft's push technology, the Channel Definition Format (CDF), and of the Open Software Description (OSD) specification, a new software delivery format proposed by Marimba and Microsoft. It is also serving as inspiration for the Resource Description Framework (RDF), a standard for Web meta-information under development by the World Wide Web Consortium. For integrators, XML promises a whole new frontier by enabling the design of Web-enabled systems that specifically support vertical applications and enable the effortless exchange of data across intranets, extranets, and the Internet using simple, browser-based technology.


The Tao of SGML


SGML is not a language but a meta-language: a set of generalized rules used to specify domain-specific languages. In that sense it resembles a compiler-compiler, much like the programming tool yacc (Yet Another Compiler Compiler). Its fundamental principle is a simple one: that the design of a language should be determined by the structure of the data it describes, not the output medium it uses.

Most large industries already use SGML to specify a language specific to their technology to promote cooperation. Two good examples of this are the Telecommunications Interchange Markup (TIM) language, and the Pinnacles Group, a semiconductor industry effort to develop a markup language to permit sharing semiconductor design information. SGML is also the basis of the markup language used in Microsoft’s Encarta Encyclopedia.

SGML is non-proprietary, system- and platform-independent, and promotes the efficient reuse of data. It is almost infinitely flexible, capable of describing the structure of virtually any kind of data or information.

It is also almost infinitely complex, making it difficult both to create markup languages and to write software that can accommodate the demands of SGML. Too many of its options are only rarely used. In addition, SGML instances (documents) are not truly portable. Viewing any document written in an SGML-derived markup language requires a Document Type Definition (DTD), a style sheet, and a catalogue file. (The DTD specifies the relationships between the various elements of a document, such as the TOC, chapters, and tables, while the style sheet determines their formatting.) If these three meta-information sources are not available, the document can only be viewed as raw, unformatted SGML, which is much harder to read than raw HTML.


Making HTML Portable

In the case of HTML, these limitations were overcome by hardwiring the HTML DTD (i.e. the HTML standard du jour) and style sheet into the browser. That solved the portability problem by ceding control of the information presented to the client, rather than the server. (HTML style sheets, still not standardized, are an attempt to return some control to the server originating the data.)

The downside was the near-total elimination of extensibility, resulting in technological inertia as the familiar Catch-22 of standards asserted itself: people won't adopt standards without trying them, which they can't do until vendors offer them, which vendors won't do until people adopt them… Only the largest software vendors can afford to count on the "if you build it they will come" approach, a reality which rendered HTML hostage to the ambitions of Microsoft, Netscape, and others. As a result, the current reality of HTML is a patchwork of solutions attempting to work around the language's fundamental limitation: it was designed for simple hypertext transmission, and cannot adequately represent the many kinds of data that people want to transmit across the Web.

Even more important, HTML conversion inevitably destroys information. For instance, consider an HTML table generated from a database. Without a great deal of hacking, there's no way to import that table back into another database, for the database schema or structure has been lost. This limitation makes the Web largely a one-way street, hampering the exchange of data across corporate intranets and extranets.
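By way of contrast, the sketch below shows how a table exported as XML can carry enough structure to be loaded back into a database. It is an illustration under stated assumptions, not a description of any real product: the table name, column names, and values are invented, and Python's standard sqlite3 module stands in for "another database".

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export: the table name, column names, and values are invented.
PARTS_XML = """<table name="parts">
  <row><part_no>A100</part_no><description>Resistor</description><qty>250</qty></row>
  <row><part_no>B200</part_no><description>Capacitor</description><qty>75</qty></row>
</table>"""


def load_into_db(xml_text, conn):
    """Recreate the table described by xml_text inside the SQLite connection."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    rows = root.findall("row")
    # The column names travel with the data as element names.
    columns = [child.tag for child in rows[0]]
    # Identifiers come from our own trusted sample; real code would have to
    # validate them before building SQL text from them.
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    for row in rows:
        values = [row.findtext(col) for col in columns]
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
    return table, columns


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_into_db(PARTS_XML, conn))
    print(conn.execute("SELECT part_no, qty FROM parts ORDER BY part_no").fetchall())
```

An HTML rendering of the same data would keep the cell text but lose the column names, which is exactly the information the loader relies on. Note that this simple sketch stores every value as text; carrying data types as well would need a richer vocabulary.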


XML: Thinning Down the Standard

XML was designed to overcome these limitations by eliminating the infrequently used parts of SGML and rewriting the remainder for better network citizenship. XML is actually more than just a meta-language, for it also offers a standardized approach to stylesheets and a far more powerful hyperlinking model than HTML.

XML stylesheets are written using a subset of the Document Style Semantics and Specification Language (DSSSL), itself derived from a dialect of LISP, a powerful language associated with artificial intelligence. They are freely extensible and Turing complete, allowing designers to arbitrarily extend stylesheet capabilities; completely internationalized; and possess a sophisticated rendering model that delivers professional page layout capabilities. XML hyperlinking is a subset of HyTime, an ISO standard for hypertext, hypermedia, and time-based multimedia, and will offer such improvements as bidirectional links, links that can be specified and managed outside the document they belong to, and link attributes.

But the main source of the excitement about XML is its ability to specify network-friendly languages perfectly adapted to the data they describe. Already the health care industry has latched on to XML as the solution to making the complex information found in patient records truly portable. Vendors in the EDI (Electronic Data Interchange) space are also eyeing XML as a way to bring EDI to the masses. In principle, any integrator with the talent on board to use yacc can also exploit the power of XML to write languages for Web-enabled vertical applications. In fact, yacc itself can be used to specify an XML language, and Perl can be used to parse XML instances, although specialized tools will eventually eclipse these programmer tools.

XML also makes it possible to deliver information in forms that can be manipulated by the client without further server or network involvement. For instance, instead of downloading a merely textual table of contents for a document, XML could deliver a structured TOC object that could be expanded or contracted by the user. Likewise, spreadsheet or database-type information could be downloaded with its schema intact, allowing the browser to create different views of the data locally; in addition, XML’s preservation of structure would make possible drag and drop transfer of information from a browser window to a database. Multimedia will benefit enormously from XML's descriptive ability--in fact, XML may be the only hope for true integration of the PC with television. How else to describe and rate 500+ channels?
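The expandable table of contents described above can be sketched as follows. This is a hedged illustration only: the toc and section element names and the nested titles are invented, and a depth limit stands in for the user expanding or contracting entries. The client downloads the structure once and re-renders it locally at whatever depth the user wants.

```python
import xml.etree.ElementTree as ET

# Hypothetical TOC vocabulary; the element names and nested titles are invented.
TOC_XML = """<toc>
  <section title="The Tao of SGML">
    <section title="Strengths"/>
    <section title="Weaknesses"/>
  </section>
  <section title="Making HTML Portable"/>
</toc>"""


def outline(node, max_depth, depth=0):
    """Return indented titles, descending at most max_depth levels."""
    lines = []
    for section in node.findall("section"):
        lines.append("  " * depth + section.get("title"))
        # Only descend while the next level is still within the limit.
        if depth + 1 < max_depth:
            lines.extend(outline(section, max_depth, depth + 1))
    return lines


if __name__ == "__main__":
    root = ET.fromstring(TOC_XML)
    print("\n".join(outline(root, 1)))  # contracted view
    print("\n".join(outline(root, 2)))  # expanded view
```

Both views come from the same downloaded document, so switching between them requires no further round trip to the server.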

Although there is no XML-capable browser yet, this standard is already pervasive. Microsoft's CDF push format is written in XML, as is the software delivery language OSD championed by both Microsoft and Marimba. Netscape touts the Resource Description Framework (RDF), also based on XML, as a proposed Web standard data model; it forms the basis of the company’s new Aurora technology. Dynamic HTML, as well, draws on XML. DataChannel is developing Xapi-J, which gives Java and JavaScript programs a way of extracting data from XML instances. Integrators expecting to capitalize on XML should get started now, for very soon the Web will be flooded with data accessible to XML tools, and customers will be clamoring for software that can help them capture, interpret and use it.

