KMWorld, May 11th, 1998
Solutions for Web delivery: XML, DHTML, PDF
By Tony McKinley
No one would have predicted that the document would be the last element of Web and intranet solutions to be developed, but that's how it has turned out. The Internet has supplied the global network connections and browsers have provided a standard user interface; lagging are the appropriate document formats to take advantage of the new environment.
HyperText Markup Language (HTML), the lingua franca of the Web, is designed to be simple. Because of that simplicity, it falls short in two areas: content management and robust presentation. Dynamic HTML (DHTML) was developed to address the presentation limitations of HTML by giving the author more control over the view the user will see. Its development set off an "extensions" war between Microsoft (www.microsoft.com) and Netscape (www.netscape.com), which has diminished the universality of DHTML.
Adobe's (www.adobe.com) Portable Document Format (PDF) delivers a universal document that looks the same on every platform. Acrobat PDF is the electronic printing press for Web documents, and all applications can simply "print" to PDF as easily as printing to paper. There is no need to recreate documents in HTML. PDF is designed to solve the presentation problem; it is today, and will remain for the foreseeable future, the leader in delivering rich content on the Web.
Extensible Markup Language (XML) is a format standard recommended by the World Wide Web Consortium (W3C, www.w3.org), and is heavily supported by Microsoft. Microsoft Internet Explorer 4.0 and the new version of Netscape Communicator both support XML. XML is designed to address the content management aspect of electronic documents. Like its ancestor, the Standard Generalized Markup Language (SGML), the new XML standard promises a consistent description of the content of files to enable utmost flexibility in use and access to specific elements.
XML is where data processing meets text documents. By declaring standard encoding methods within the wide open framework of text, every word, paragraph and item in a document can be assigned the equivalent of a meta tag. In HTML, meta tags are used to declare special fields of information, such as keywords. In XML, text can be recognized as values in fields such as name, address, customer number, quantity, price and so on. That capability, when widely adopted, will provide the basis for electronic data interchange (EDI). For example, if Company A uses its SAP (www.sap.com) accounting system to purchase equipment from Company B, which uses Oracle (www.oracle.com) to manage its inventory and billing, all of the data in the transaction could be shared without any need to recapture or translate information between the SAP and Oracle environments.
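To make the idea of text recognized as field values concrete, here is a minimal Python sketch of a program reading a purchase order. The element names (order, customer, customerNumber, item, quantity, price) and the sample values are assumptions made up for illustration; they do not come from SAP, Oracle or any published vocabulary.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical purchase order; element names and values are illustrative only.
ORDER_XML = """
<order>
  <customer>
    <name>Company A</name>
    <customerNumber>10042</customerNumber>
  </customer>
  <item>
    <description>Laser printer</description>
    <quantity>3</quantity>
    <price>499.00</price>
  </item>
</order>
"""

def read_order(xml_text):
    """Pull named fields out of the order and compute a line total."""
    root = ET.fromstring(xml_text)
    quantity = int(root.findtext("item/quantity"))
    price = Decimal(root.findtext("item/price"))
    return {
        "customer": root.findtext("customer/name"),
        "customer_number": root.findtext("customer/customerNumber"),
        "description": root.findtext("item/description"),
        "quantity": quantity,
        "price": price,
        "total": quantity * price,
    }

order = read_order(ORDER_XML)
```

Because every value sits inside a named element, the receiving system can look fields up by name rather than by their position on a page, which is what would spare the two companies from recapturing the data.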
There is no battle between HTML and XML and PDF. HTML is the simple basis for Web pages. XML is a means of managing the simple text of the HTML world as organized data, but it does not contain information about how the data should be displayed. PDF is a rich electronic document delivery format, for display and print.
The three formats do share some features. For example, the extensibility of XML will lend itself to creative presentation techniques in the future. PDF has up to 32,000 index fields in its basic architecture, and the new JavaScript features built into PDF Dynamic Forms offer tremendous promise for truly "active" PDF documents. But from the ground up, XML and PDF address completely different requirements, and there is no conflict between them.
XML--Extensible Markup Language
That huge buzz you hear about XML is its promise as the Holy Grail of the information industry: something we can rely on, something that won't change. We live in a constantly evolving environment of upgrading hardware and software, and we seek stability. XML offers an organized language for our structured content, no matter how much our client machines and communications may change.
The W3C-recommended XML standard offers a way out of the morass by imposing order on file language. It's very easy to get distracted by the philosophical issues. As with all business issues, the best results usually come from sticking close to basic practices and making small improvements here and there. Not every business needs every scientific classification field for its files; it needs only a small business subset of classifications. But to make those few classifications work around the world, they must conform to a larger, global definition.
A simple view of how XML saves everybody time:
SGML was the right idea; XML is really feasible
First there was Standard Generalized Markup Language. It was developed to speed the flow of documents among businesses by providing a common identification of elements like price, part number, manufacturer and so on. SGML was the glue that would allow global aerospace companies to cooperate on vast technical projects, like Star Wars, the Reagan version. SGML was designed from the beginning to serve business purposes, decades before electronic commerce became a buzzword.
By agreeing on a standard way to designate things like price, number, amount, description and so on, we could go a long way toward improved communications. Instead of having to worry about hooking up every operating system, hardware upgrade and software version, a universal data format will let us get back to the business of just using information instead of spending all the time managing it.
But not many companies or organizations went along with the idea, and not enough people saw sufficient payback to give SGML popular acceptance. With the W3C, Microsoft and Netscape support behind it now, XML should burst through to acceptance. It's an idea that makes enough sense to be widely accepted.
XML: more data, less presentation
XML is the language that Tim Bray (co-editor of the XML spec and editor of the "Gilbane Report") and his friends have proposed as the Webby solution for data exchange. It's short and sweet: all the useful organizational features of SGML, minus the complicated fine-tuning that no amateur will ever use. SGML was beyond common reach because it was too difficult and expensive to code documents so extensively. XML revives the appealing idea of tossing aside the file incompatibilities we've all suffered while still achieving logical file organization. The XML server is the real attraction, because a single unified representation of data resides on the server. The field naming conventions of XML are the reason for all the excitement, because we can publish data once and use it in an endlessly varying series of tasks.
That is why XML is so hot: because we can finally isolate our data in a historically dependable format. Most importantly we will have unshackled every innocent user who never wanted to be a computer person. XML offers the chance to simplify all the nonsense of non-compatible files and needless computer fiddling for everybody.
That acceptance of XML data descriptions is the single factor that merges Web publishing and EDI. For example, in an XML parts catalog on the Web, all of the items would be dynamically updated. Everything from price to availability and delivery would be available 24/7. Automated order entry and customer service transactions would be easily handled by XML servers, with any number of online XML-compatible catalogs.
XML and searching
XML and today's content search techniques will make beautiful music together. But as with every other electronic document management system, it will take some effort to add XML fields to existing documents.
The metadata that is so attractive in XML also exists in some form in legacy documents. The author, subject, title, keywords and other fields in word processing files can be used to add structure to file collections. The system fields like "date created," "last modified" and so on can be likewise exploited for ex post facto orderliness. There will be a robust market for XML-izing that latent data in current document collections.
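As a rough illustration of what "XML-izing" latent metadata might look like, the Python sketch below wraps a handful of document properties in XML elements. The property names, the sample values and the metadata_to_xml helper are all hypothetical; a real conversion tool would first have to read these properties out of each word processing file format.

```python
import xml.etree.ElementTree as ET

def metadata_to_xml(fields):
    """Wrap legacy document properties in XML elements (hypothetical names)."""
    doc = ET.Element("document")
    for name, value in fields.items():
        if name == "keywords":
            # Keywords become one child element each so they can be searched singly.
            parent = ET.SubElement(doc, "keywords")
            for keyword in value:
                ET.SubElement(parent, "keyword").text = keyword
        else:
            ET.SubElement(doc, name).text = str(value)
    return ET.tostring(doc, encoding="unicode")

# Sample properties as they might be pulled from a word processing file.
legacy = {
    "author": "J. Smith",
    "title": "Quarterly parts list",
    "keywords": ["parts", "inventory"],
    "lastModified": "1998-04-30",
}
xml_text = metadata_to_xml(legacy)
```

Once the properties are expressed this way, a search tool can restrict a query to a single field, such as author or keyword, instead of scanning the full text.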
Full implementation and enjoyment of XML standards will require widespread if not universal adoption. If just a few vendors don't make the choice, XML could fail. Even if mighty Microsoft declares XML the standard of choice, if IBM (www.ibm.com) and SAP still require some sort of "translation" of XML, it will not work as advertised.
It's happening now
All of the promise of XML functionality has been happening for a long time on the Web. Even without a universal language, the Web has developed into a highly interactive environment, far beyond the one-way idea of the Web as a strictly publishing medium. Some of the coolest early interactive sites were places like Dell Computers (www.dell.com), where you could log on and build yourself a custom PC from a warehouse full of parts. At the end, the program added up your configuration and gave you a price. You could pay for it online with your credit card. Now, that is Web commerce!
The attraction of XML is that not only can the geniuses at Dell achieve such feats of online transactions, but that everybody in business can do the same. Where Dell's revolutionary style of direct sale over the Web works through custom server routines, XML documents themselves will offer similar automated function. In an XML document, the "price" field can be dynamically linked to live corporate pricing to provide instantaneous quotes.
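One way to picture a "price" field linked to live pricing is sketched below in Python. It is only an assumption about how such a link could work: the catalog markup, the part numbers and the LIVE_PRICES table (a stand-in for a real corporate pricing system) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the price fields start out as placeholders.
CATALOG_XML = """
<catalog>
  <part number="A-100"><description>Hard drive</description><price>0.00</price></part>
  <part number="B-200"><description>Memory module</description><price>0.00</price></part>
</catalog>
"""

# Stand-in for a live corporate pricing system (assumed, not a real API).
LIVE_PRICES = {"A-100": "189.00", "B-200": "75.50"}

def refresh_prices(xml_text, prices):
    """Fill each part's price field from the current price table."""
    root = ET.fromstring(xml_text)
    for part in root.iter("part"):
        current = prices.get(part.get("number"))
        if current is not None:
            part.find("price").text = current
    return root

def quote(root, part_number):
    """Return the price text for one part, or None if it is not listed."""
    for part in root.iter("part"):
        if part.get("number") == part_number:
            return part.findtext("price")
    return None

catalog = refresh_prices(CATALOG_XML, LIVE_PRICES)
```

The point of the sketch is that the quote is located by field name, so the same refresh routine works on any catalog that follows the same naming convention.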
While anything is possible in the dynamic environment of XML, the bottom line is that this is a data delivery system. Data can be documents and presentations, of course. But the idea that XML is somehow competing with PDF is just an illusion.
PDF: multimedia, more data
The roots of PDF are in the far different soil of mainstream commercial life, and compared to the largely theoretical and philosophical debates surrounding how XML might work in the future, conversations about PDF concern implementation issues shared by tens of millions of Acrobat users.
In the world's most demanding digital document management market, in New York advertising and publishing shops, Acrobat PDF is breaking price-performance barriers and providing a competitive edge that no one can ignore. While XML remains in a cranky infancy, Acrobat PDF is a mature, widely adopted presentation format.
Tony McKinley is president of Intelligent Imaging (http://imagebiz.com), 610-647-5570, E-mail tonymck@imagebiz.com. Tony is a charter member of The Camden Group.