Eliot Kimber on HTML
Newsgroups: comp.text.sgml
Path: msuinfo!agate!howland.reston.ans.net!pipex!sunic!trane.uninett.no!nac.no!
ifi.uio.no!naggum.no!comp-text-sgml
Approved: erik@naggum.no
Date: 29 Nov 1994 02:33:49 UT
From: Erik Naggum <erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19941129.6427@naggum.no>
Subject: WWW, Mosaic, and Data Standards (Wired October 1994)
Lines: 109
Eliot was so kind as to send me his article to Wired. I thought it would
be good for you all to see on the newsgroup, and he agreed, so here it is:
To: rants@wired.com
From: Eliot Kimber <kimber@passage.com>
Subject: WWW, Mosaic, and Data Standards (Wired October 1994)
In his article on Mosaic, Gary Wolf writes:
"Berners-Lee and his colleagues faced the problem creating a unified
hypertext network ... in a diverse international environment. They came
up with a stunning solution. Rather than attempt to impose standards on
the hardware or software, they defined standards for the data."
While this is indeed a stunning solution, and Tim and his colleagues
deserve a great deal of credit and praise for using it, they certainly
did not invent it. The standard they chose, SGML, or Standard
Generalized Markup Language, has been an ISO standard since 1986 (ISO
8879). Its technical and intellectual precursor, GML, was developed in
the early 1970's and was part of the IBM DCF/Script text formatting
product starting in about 1982. Many large enterprises, not least of
which was IBM, recognized the tremendous benefits of data standards. For
example, IBM, normally touted as the second largest publisher in the
"free world", produces about 85 or 90% of its product documentation using
GML and probably almost as much of its internal documentation. IBM is
also in the process of moving from GML to SGML (it has millions of pages of
documentation in GML, which has been in use for over 10 years, so moving
to SGML will take some time). Most Fortune 500 companies use GML or SGML
for some part of their operation. For example, GM uses SGML for all of
its maintenance information, allowing it to derive a wide variety of
publications, including both print manuals and online maintenance support
systems, from a single database of information, and it has been doing so,
in one form or another, since the late 1970's. Data standards represent
significant cost savings and competitive advantage that industry was
quick to recognize and take advantage of.
One interesting aspect of the Web has been the influence of Mosaic in
particular on the HTML (hypertext markup language) standard that Tim and
his colleagues originally developed. In a true SGML system (which the
Web and Mosaic are not today), a particular document type, such as HTML,
is formally defined in a way that allows document processors to validate
a document to see if it meets the requirements of the document type. For
example, authors are not free to define new element types (tag names) in
a completely ad-hoc fashion (although authors can extend a document type
themselves). However, Mosaic, and most, if not all, of the other Web
browsers do not do this validation.
They take HTML documents as they get them. Elements they recognize they
process and elements they don't they ignore (at least Mosaic does).
However, this also means that browsers can unilaterally provide support
for elements that may not be in the official document type. This undoes
the very benefits of data standardization by allowing tool creators,
rather than data owners, to control the data standard. The lack of
validation has led to a degree of chaos in the Web because people assume
that if a document works under Mosaic it must be a well-formed
document, when in fact it may not be. The freeware HTML editor HoTMetaL,
from SoftQuad, Inc., made this clear when people tried to use it to edit
their documents. HoTMetaL is a true SGML application and therefore does
the proper validation, using the DTD defined by the Web team. People are
starting to understand that the purpose of validation is to ensure that
documents are well formed, which is necessary to ensure interchange and
interoperability.
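
To make the difference concrete, here is a minimal sketch in Python of
the two behaviors described above: checking a document's element types
against a declared set, versus processing what is recognized and
ignoring the rest. It is only an illustration, not a real SGML parser:
it reads no DTD and checks neither nesting nor attributes. The names
DECLARED_ELEMENTS, undeclared_elements, lenient_render, and the element
type "vendorx" are all hypothetical and invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical, heavily abbreviated stand-in for the element types a
# document type definition would declare. A real DTD also constrains
# nesting, ordering, and attributes, none of which is modeled here.
DECLARED_ELEMENTS = {"html", "head", "title", "body", "h1", "p", "a", "em"}


class ElementCollector(HTMLParser):
    """Record the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lower case.
        self.seen.append(tag)


def collect_elements(document):
    """Return the list of element type names used in the document."""
    collector = ElementCollector()
    collector.feed(document)
    collector.close()
    return collector.seen


def undeclared_elements(document, declared=DECLARED_ELEMENTS):
    """Validator-like check: list element types that are not declared."""
    return [tag for tag in collect_elements(document) if tag not in declared]


def lenient_render(document, supported):
    """Browser-like behavior: keep recognized elements, skip the rest."""
    return [tag for tag in collect_elements(document) if tag in supported]


# "vendorx" is a made-up element type standing in for a tool-specific extension.
sample = "<html><body><h1>Hi</h1><vendorx>New!</vendorx><p>Text</p></body></html>"

print(undeclared_elements(sample))                          # flagged by validation
print(lenient_render(sample, DECLARED_ELEMENTS))            # one tool ignores it
print(lenient_render(sample, DECLARED_ELEMENTS | {"vendorx"}))  # another supports it
```

The validating check reports the undeclared element, while each lenient
processor quietly produces a result, and the two results differ. That
silent divergence between tools is the interchange problem at issue.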
One reason Mosaic is so popular is that it defined support for new
elements or new combinations of elements that were not in the original
HTML definition, giving it a significant competitive advantage over other
browsers. This meant that if you created a document that worked with
Mosaic and used its unique features, it might not work, or not work as
expected, with other browsers. This of course erodes the value of data
standards in the first place. Rather than having the standard extended
by controllers of the standard, it was extended unilaterally by
Andreessen, without any sort of check or control. One of the benefits of
standards is that changes to them are usually made according to some
protocol that ensures at least a little bit of review and consensus by the
parties interested in the standard.
So while Mosaic is very cool and offers some very nifty functions, it has
at the same time served to undo to some degree the very intent of the
original Web designers by effectively wresting control of the data
standard from the owners of the standard (and the data that conforms to
it) and placing it in the hands of the developers of a specific product.
The chief purpose of SGML is to prevent exactly this sort of extortion on
the part of product vendors by keeping the control of the data format in
the hands of data owners.
This said, it should be stressed that the World Wide Web is very
important precisely because it does prove the basic premise of data
standards in general and SGML in particular, which is that by focusing on
data standards, you can enable exactly the sort of distributed access to
information that the Web provides. While there are certainly flaws in
HTML, the URL mechanism, and the various Web browsers, to a large degree
they don't matter (no matter how much SGML pedants like myself may carp
about them), because the point is proven. Tim and his team were right
and have built something wonderful. The job now is to refine the system,
which many people are involved in doing. They've built the Model T,
proved the point, and gotten a lot of people very excited, and for that
they've earned a place in history.
Cheers,
Eliot Kimber
#<Erik>
--
Oslo, Norway (1994-11-29) -- A thousand-year focus on Europe was broken
yesterday as Norway rejected the pending membership application to the
European Union. In a referendum the gatherers (women, children, fishermen
and farmers) feared they would fail to entice the hunters to remain
grounded, so instead voted to erect a strong fence around the country.
Support the Norwegian Hunter Liberation Front! This may be the last
uncensored message from Norway. Do not believe the rosy propaganda!