Last modified: July 15, 1998

SGML on the Web: Small Steps Beyond HTML, by Rubinsky and Maloney. Volume Preface.

Rubinsky, Yuri; Maloney, Murray. SGML on the Web: Small Steps Beyond HTML. Charles F. Goldfarb Series On Open Information Management. Upper Saddle River, NJ: Prentice Hall PTR [Professional Technical Reference], 1997. Extent: 528 pages, CDROM with Panorama Pro 2.0 and other software. ISBN: Paper (0-13-519984-0). Authors' affiliation: SoftQuad, Inc. See the bibliographic entry for other descriptions.

Volume Preface

Firmness, Commodity and Delight

HyperText Markup Language (HTML) is the world's best known application of Standard Generalized Markup Language (SGML), the International Standard language for open information management. But HTML taps only a small fraction of SGML's potential. This book will show you how to go well beyond what you could do with HTML. Naturally, a book whose subtitle is "Small Steps Beyond HTML" has to begin by asking: "Do we need more than HTML?"

Let's think first about architecture. The Roman architect Vitruvius, who lived at the time of Caesar Augustus, wrote a great and influential work called the Ten Books on Architecture. In those books, he expressed the opinion that architecture must offer "Firmness, Commodity and Delight". Two thousand years later, his advice still holds true for any form of architecture: Whether you are designing buildings or creating electronic structures for your information, you must take into account these characteristics of good architecture.

In a speech given at the North American SGML conference in 1989, architect Douglas MacLeod spoke about Vitruvius, and he explained the three characteristics in this way: "It is perhaps easier to understand these ideas in the context of buildings. Firmness is a good structure that holds up a building under all manner of conditions: during high winds, earthquakes, fires and snowstorms. Commodity is what makes a building comfortable: things are the right size, the heating and cooling systems work and you don't have to climb too many stairs in the course of the day. But Delight is what makes the building worth being in; Delight is what makes the building more than just a shelter. It may be an intellectual delight, a visual delight or even a delight to be in to listen to music, but it brings something more to the building than just functionality."

The World Wide Web is a form of architecture, an architecture of information, and accordingly, it must meet the criteria established by Vitruvius for good architecture. The World Wide Web expresses its architecture through the structures it gives us for expressing information, that is, through the HyperText Markup Language.

At a high level, an architecture of information is the infrastructure we build to coordinate, take advantage of and make a coherent system out of our new ideas, tools, networks and capabilities. At a lower level, an architecture of information is the set of electronic structures we develop to create, edit, store, retrieve, manipulate, manage and make public our information content. In the case of the World Wide Web, we have been given a set of structures that includes paragraphs, titles, headings of various sorts, lists, images, links from one document to another, and so forth. But is this enough?

That question can be answered only by posing another: Enough for what? It is clear that HTML has provided enough capability for people to create tens of millions of documents. It has provided a common language of markup that lets those documents:

be read by any kind of computer
be read by various pieces of software
be displayed in a meaningful screen presentation on all those computers
be linked to one another
incorporate images, audio, video and animation
be stored in databases and searched in rudimentary ways.

This is a great deal of significant functionality, given that the only requirement is that the documents include a little bit of markup (a few extra words with angle brackets around them) that indicates paragraphs, titles, headings, and so forth. So to get back to my original question: Is this enough? Does this degree of functionality balance the overhead of using the markup? Is this a cost-effective way of making information available?

As with any other business decision, the question of whether HTML works for you comes down to the business case.

Another, separate decision needs to be made first: Does the Web itself offer you something you cannot already get for less cost?

Markets: Does the Web reach your readers and customers? Will it reach new ones?
Speed: Does the Web reach them faster than other means? Does speed of delivery matter to you at all?
Depth: Does the Web reach enough of them to be a worthwhile investment?
Security and Payment Considerations: Can you do business on the Web with confidence? Alternatively, can you do business successfully on the Web without needing to collect money on the Web?
Pro-Activeness: Is there value in giving your readers or customers a more active role in the acquisition of your contents than they might otherwise get? (That is, does the Web's interactiveness add value to your message? Does your data lend itself to hypertext path-making?)

If you are satisfied with enough of the answers to these questions, and you have decided that the Web is part of your future, then you have an opportunity to move to the second decision: Is HTML enough? Now you get to ask questions about capability:

Display Capability: Is the appearance of a web page good enough? Does it suit both your readers and your contents? Are there display distinctions you would like to make that HTML doesn't give you today?
Functional Capability: Do the modes and means of interactivity work for your content and your readers? Could your data benefit from additional types of hypertext linking (two-way links, for example, or links from one part of a graphic to part of another)?

Extending Capabilities in a Stable Framework

Part of HTML's success comes from the simplicity of the Web. Click! And a file appears in your browser or your editor.

There's a reason for this immediacy: the software knows what to expect of the file you're sending it. It expects:

plain text characters with no control characters or other unusual characters
a specific set of markup, the stuff in angle brackets, that tells the software that <TITLE>This is a title.</TITLE> and that <P>This is a paragraph.</P>
a small, finite set of known ways to interact with the markup and perform actions: traverse links, make headings bold, and so forth.

The secret is "No surprises." The software knows the file will be HTML (the .htm or .html extension helps) or, to be more precise, assumes the file will be in HTML and acts accordingly.

A growing number of Web software makers are inventing their own additional markup. Netscape, with Navigator, and Microsoft, with Internet Explorer, are competing for market share through proprietary extensions to HTML. This is fine in their own software, where they can pretend that their new markup is not a surprise. But it does suggest that they want everyone who might read your documents to use their software.

This book suggests a slightly different approach. In a nutshell, I take the position that inventing new markup makes sense, indeed offers valuable new capabilities to both creators and consumers of information. But to be commercially viable in your business, markup must maintain the "No surprises" principle.

As luck would have it, there is a standard way of saying to software "This is the markup I use." To accomplish this, we let a document contain a "preamble" or "prolog", much like a book that defines the terms that it is going to use, or a computer program that defines the functions it then uses to run.

The document Prolog can be in one of two forms. It could be:

definitions of new markup, right there in the document, or
a pointer to a separate file (or even a set of files, if you want to be fancy) that contains all the markup definitions.

This is, in fact, exactly what HTML does. It uses an internationally standardized mechanism, SGML, to create its definitions. The Web browsers you use know about those definitions, more or less. HTML counts on the fact that browsers know the definitions. Accordingly, HTML documents typically don't include a Prolog: they assume the definitions haven't changed, so the standard one is "implied".
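The two forms of Prolog described above can be sketched in SGML itself. This is a minimal illustration, not an example from the book: the memo document type, its element names, the public identifier "-//Example Corp//DTD Memo//EN" and the file name memo.dtd are all hypothetical. Two separate documents are shown one after the other; the first carries its markup definitions inside the Prolog, the second only points to a separate file that holds them.

```sgml
<!-- Document 1: hypothetical definitions written right there in the document -->
<!DOCTYPE memo [
<!ELEMENT memo - - (to, from, body)>
<!ELEMENT to   - - (#PCDATA)>
<!ELEMENT from - - (#PCDATA)>
<!ELEMENT body - - (p+)>
<!ELEMENT p    - - (#PCDATA)>
]>
<memo>
<to>All staff</to>
<from>Documentation group</from>
<body><p>This memo uses markup that its own Prolog defines.</p></body>
</memo>

<!-- Document 2: a pointer to a separate, hypothetical file of definitions -->
<!DOCTYPE memo PUBLIC "-//Example Corp//DTD Memo//EN" "memo.dtd">
<memo>
<to>All staff</to>
<from>Documentation group</from>
<body><p>The definitions for this memo live in memo.dtd.</p></body>
</memo>
```

An HTML document that chose to state its Prolog rather than rely on the implied one would use the second form, for example with the line <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">, which names the HTML 2.0 definitions by public identifier alone.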

But there is a different class of software: software built to read and understand any Prolog, along with the definitions it either contains or points to, before it interprets the document itself. Those definitions are written in SGML, the Standard Generalized Markup Language, and software that understands this mechanism for creating your own markup or extending existing markup is SGML software.

Using SGML as a markup declaration language allows you to create any markup you want, and have it clearly understood by any other SGML software that encounters it, now or in the future. SGML software will then let you assign display or print characteristics to that markup through style sheets. Do you want your headings bold and centered? Which markup do you want to use to indicate hypertext links?

That's what this book is about. What are the simplest steps you can take to invent the markup that makes sense for your information? What are the tricks that others have used to achieve certain capabilities? What are the basic principles that will let you assign both markup and characteristics?

Stability, Usefulness and Extensibility

I began this essay by talking about two-thousand-year-old precepts of good architecture. I'd like to end by updating those in the context of the World Wide Web.

Firmness:

If Vitruvius had designed the Web and the language of its information structures, he would have felt no differently than Tim Berners-Lee, its actual inventor, on the subject: To provide a real service to online readers, it must be robust. To be treated as a viable commercial medium, it must offer stability.

In these early, experimental days, we're all willing to put up with difficulties, downtime and a confusing competition for capability, but before long everyone who wants to conduct business on the Web will insist on firmness.

Commodity:

Usefulness. The Web must do what you need it to do, within the normal bounds of reason. (You can't expect it to replace all other forms of communications, for example, but you can insist that it do a good job at being a somewhat friendly means to publish and consume linked, global, digital data in a variety of media.) It must accommodate your reasonable requirements.

Delight:

I don't quite do justice to Vitruvius' request for delight when I suggest that the Web and its information structures must be extensible, but it is the case that without a built-in framework for extensibility (and, as luck would have it, a social environment that thrives on experimentation), we have no possibility of discovering pockets of the Web that do surprise and delight us.

The reason I insist on the full value of SGML on the Web is that it meets Vitruvius' requirements: SGML, an ISO standard with a ten-year history of successful implementation, certainly offers firmness and stability. As a language that enables you to define the new information structures you need, it is accommodating, meeting the commodity/useful criterion. And because it was designed as a tool for extensibility, its support for invention and delight is limited only by your imagination.

Yuri Rubinsky

Toronto, Canada

November 1995

