SGML on the Web: Small Steps Beyond HTML, by Rubinsky and Maloney. Volume Preface.
Rubinsky, Yuri; Maloney, Murray. SGML on the Web: Small Steps Beyond HTML. Charles F. Goldfarb Series On Open Information Management. Upper Saddle River, NJ: Prentice Hall PTR [Professional Technical Reference], 1997. Extent: 528 pages, CDROM with Panorama Pro 2.0 and other software. ISBN: Paper (0-13-519984-0). Authors' affiliation: SoftQuad, Inc. See the bibliographic entry for other descriptions.
Firmness, Commodity and Delight
HyperText Markup Language (HTML) is the world's best
known application of Standard Generalized Markup Language (SGML),
the International Standard language for open information management.
But HTML taps only a small fraction of SGML's potential. This
book will show you how to go well beyond what you could do with
HTML. Naturally, a book whose subtitle is "Small Steps Beyond
HTML" has to begin by asking: "Do we need more than HTML?"
Let's think first about architecture. The Roman architect
Vitruvius, who lived at the time of Caesar Augustus, wrote a great
and influential work called the Ten Books on Architecture.
In those books, he expressed the opinion that architecture must
offer "Firmness, Commodity and Delight". Two thousand
years later, his advice still holds true for any form of architecture:
Whether you are designing buildings or creating electronic structures
for your information, you must take into account these characteristics
of good architecture.
In a speech given at the North American SGML conference
in 1989, architect Douglas MacLeod spoke about Vitruvius, and
he explained the three characteristics in this way: "It is
perhaps easier to understand these ideas in the context of buildings.
Firmness is a good structure that holds up a building under all
manner of conditions: during high winds, earthquakes, fires and
snowstorms. Commodity is what makes a building comfortable: things
are the right size, the heating and cooling systems work and you
don't have to climb too many stairs in the course of the day.
But Delight is what makes the building worth being in. Delight
is what makes the building more than just a shelter. It may be
an intellectual delight, a visual delight or even a delight to
be in to listen to music, but it brings something more to the
building than just functionality."
The World Wide Web is a form of architecture, an
architecture of information, and accordingly, it must meet the
criteria established by Vitruvius for good architecture. The World
Wide Web expresses its architecture through the structures it
gives us for expressing information, that is, through the HyperText Markup Language (HTML).
At a high level, an architecture of information is
the infrastructure we build to coordinate, take advantage of and
make a coherent system out of our new ideas, tools, networks and
capabilities. At a lower level, an architecture of information
is the set of electronic structures we develop to create, edit,
store, retrieve, manipulate, manage and make public our information
content. In the case of the World Wide Web, we have been given
a set of structures that includes paragraphs, titles, headings
of various sorts, lists, images, links from one document to another,
and so forth. But is this enough?
That question can be answered only by posing another:
Enough for what? It is clear that HTML has provided enough
capability for people to create tens of millions of documents.
It has provided a common language of markup that lets those documents:
- be read by any kind of computer
- be read by various pieces of software
- be displayed in a meaningful screen presentation
on all those computers
- be linked to one another
- be able to incorporate images, audio, video and
- be stored in databases and searched in rudimentary ways
This is a great deal of significant functionality
given that the only requirement is that the documents include
a little bit of markup (a few extra words with angle brackets around
them) that indicates paragraphs, titles, headings, and so forth.
So to get back to my second question: Is this enough? Does this
degree of functionality balance the overhead of using the markup?
Is this a cost-effective way of making information available?
As with any other business decision, the question
of whether HTML works for you comes down to the business case.
Another, separate decision needs to be made first:
Does the Web itself offer you something you cannot already get
for less cost?
- Markets: Does the Web reach your readers and
customers? Will it reach new ones?
- Speed: Does the Web reach them faster than other
means? Does speed of delivery matter to you at all?
- Depth: Does the Web reach enough of them to be
a worthwhile investment?
- Security and Payment Considerations: Can you
do business on the Web with confidence? Alternatively, can you
do business successfully on the Web without needing to collect
money on the Web?
- Pro-Activeness: Is there value in giving your
readers or customers a more active role in the acquisition of
your contents than they might otherwise get? (That is, does the
Web's interactiveness add value to your message? Does your data
lend itself to hypertext path-making?)
If you are satisfied with enough of the answers to
these questions, and you have decided that the Web is part of
your future, then you have an opportunity to move to the second
decision: Is HTML enough? Now you get to ask questions about capability:
- Display Capability: Is the appearance of a Web
page good enough? Does it suit both your readers and your contents?
Are there display distinctions you would like to make that HTML
doesn't give you today?
- Functional Capability: Do the modes and means
of interactivity work for your content and your readers? Could
your data benefit from additional types of hypertext linking (two-way
links, for example, or links from one part of a graphic to part
Extending Capabilities in a Stable Framework
Part of HTML's success comes from the simplicity
of the Web. Click! And a file appears in your browser or your
There's a reason for this immediacy: the software
knows what to expect of the file you're sending it. It expects:
- plain text characters with no control characters
or other unusual characters
- a specific set of markup, the stuff in angle
brackets, that tells the software that <TITLE>This is a title.</TITLE> and that <P>This is a paragraph.</P>
- a small, finite set of known ways to interact
with the markup and perform actions: traverse links, make headings
bold, and so forth.
The secret is "No surprises." The software knows
the file will be HTML (the .htm or .html extension
helps), or, to be more precise, assumes the file will be in HTML
and acts accordingly.
A growing number of Web software makers are inventing
their own additional markup. Netscape Navigator and Microsoft's
Internet Explorer are competing for market share with their proprietary
extensions to HTML. This is fine in their own software, where
they can pretend that their new markup is not a surprise. It does,
however, suggest that they want everyone who might read your documents
to have their software.
This book suggests a slightly different approach.
In a nutshell, I take the position that inventing new markup makes
sense, indeed offers valuable new capabilities to both creators
and consumers of information. But to be commercially viable in
your business, markup must maintain the "No surprises" principle.
As luck would have it, there is a standard way of
saying to software "This is the markup I use." To accomplish
this, we let a document contain a "preamble" or "prolog",
much like a book that defines the terms that it is going to use,
or a computer program that defines the functions it then uses.
The document Prolog can take one of two forms. It can contain either:
- definitions of new markup, right there in the document, or
- a pointer to a separate file (or even a set of
files, if you want to be fancy) that contains all the markup definitions.
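As a rough illustration, here are the two forms for a hypothetical document type called `memo`. The element names and the file name `memo.dtd` are invented for this sketch, and the two documents would normally live in separate files. In the first form, the markup definitions sit inside the square brackets of the `<!DOCTYPE ...>` declaration; in the second, the Prolog simply names the external file that holds them.

```sgml
<!-- Form 1: the definitions travel inside the document's own Prolog -->
<!DOCTYPE memo [
  <!ELEMENT memo  - - (title, p+)>
  <!ELEMENT title - - (#PCDATA)>
  <!ELEMENT p     - - (#PCDATA)>
]>
<memo>
<title>Quarterly results</title>
<p>Sales rose in every region.</p>
</memo>

<!-- Form 2 (a separate document): the Prolog only points to memo.dtd -->
<!DOCTYPE memo SYSTEM "memo.dtd">
<memo>
<title>Quarterly results</title>
<p>Sales rose in every region.</p>
</memo>
```

In the second form, `memo.dtd` would hold the same three `<!ELEMENT ...>` declarations that the first form carries inline, so many documents can share one set of definitions.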
This is, in fact, exactly what HTML does. It uses
an internationally standardized mechanism, SGML, to create its definitions.
The Web browsers that you use know about those definitions, more
or less. HTML counts on the fact that browsers know the definitions.
Accordingly, they don't use a Prolog: they assume the definitions
haven't changed, and the standard one is "implied".
But there is a different class of software: software built to read
and understand any Prolog (the Prolog and the definitions it either
contains or points to) before it interprets the document itself.
Those definitions are written in SGML, the Standard Generalized
Markup Language, and software that understands this mechanism for
creating your own markup or extending existing markup is SGML software.
Using SGML as a markup declaration language
allows you to create any markup you want, and have it clearly
understood by any other SGML software that encounters it, now
or in the future. SGML software will then let you assign display
or print characteristics to that markup through style sheets.
Do you want your headings bold and centered? Which markup do you
want to use to indicate hypertext links?
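For instance, someone publishing recipes might want element types that HTML does not have, including their own markup for links. The declarations below are a hypothetical sketch: the names `recipe`, `item`, `step`, `seealso`, `amount` and `target`, and the file `butter.sgm`, are invented for illustration. The declarations only say what the markup is and where it may appear; making `seealso` behave as a hypertext link, or choosing how each `step` is displayed, is still the job of the SGML software and its style sheet.

```sgml
<!DOCTYPE recipe [
  <!ELEMENT recipe  - - (title, item+, step+, seealso*)>
  <!ELEMENT title   - - (#PCDATA)>
  <!ELEMENT item    - - (#PCDATA)>
  <!ATTLIST item    amount CDATA #IMPLIED>
  <!ELEMENT step    - - (#PCDATA)>
  <!-- invented link markup: target names the document to link to -->
  <!ELEMENT seealso - - (#PCDATA)>
  <!ATTLIST seealso target CDATA #REQUIRED>
]>
<recipe>
<title>Bread</title>
<item amount="500 g">flour</item>
<item amount="10 g">salt</item>
<step>Mix the flour and salt with water.</step>
<step>Knead, rest and bake.</step>
<seealso target="butter.sgm">Making butter</seealso>
</recipe>
```

The names are kept to eight characters or fewer, which fits the default name length of SGML's reference concrete syntax; software whose SGML declaration allows longer names could use more descriptive ones.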
That's what this book is about. What are the simplest
steps you can take to invent the markup that makes sense for your
information? What are the tricks that others have used to achieve
certain capabilities? What are the basic principles that will
let you assign both markup and characteristics?
Stability, Usefulness and Extensibility
I began this essay by talking about two-thousand-year-old precepts
of good architecture. I'd like to end by updating those in the
context of the World Wide Web.
If Vitruvius had designed the Web and the language
of its information structures, he would have felt no differently
than Tim Berners-Lee, its actual inventor, on the subject: To
provide a real service to online readers, it must be robust. To
be treated as a viable commercial medium, it must offer stability.
In these early, experimental days, we're all willing
to put up with difficulties, downtime and a confusing competition
for capability, but before long everyone who wants to conduct
business on the Web will insist on firmness.
Usefulness. The Web must do what you need it to
do, within the normal bounds of reason. (You can't expect it to
replace all other forms of communication, for example, but you
can insist that it do a good job at being a somewhat friendly
means to publish and consume linked, global, digital data in a
variety of media.) It must accommodate your reasonable requirements.
I don't quite do justice to Vitruvius' request for
delight when I suggest that the Web and its information structures
must be extensible, but it is the case that without a built-in
framework for extensibility (and, as luck would have it, a social
environment that thrives on experimentation), we have no possibility
of discovering pockets of the Web that do surprise and delight us.
The reason I insist on the full value of SGML on
the Web is that it meets Vitruvius' requirements: SGML, an ISO
standard with a ten-year history of successful implementation,
certainly offers firmness and stability. As a language that enables
you to define the new information structures you need, it is accommodating,
meeting the commodity/useful criterion. And because it was designed
as a tool for extensibility, its support for invention and delight
is limited only by your imagination.