Eliot Kimber on HTML

Newsgroups: comp.text.sgml
Path: msuinfo!agate!howland.reston.ans.net!pipex!sunic!trane.uninett.no!nac.no!
Approved: erik@naggum.no
Date: 29 Nov 1994 02:33:49 UT
From: Erik Naggum <erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19941129.6427@naggum.no>
Subject: WWW, Mosaic, and Data Standards (Wired October 1994)
Lines: 109

Eliot was so kind as to send me his article to Wired.  I thought it would
be good for you all to see on the newsgroup, and he agreed, so here it is:

  To: rants@wired.com
  From: Eliot Kimber <kimber@passage.com>
  Subject: WWW, Mosaic, and Data Standards (Wired October 1994)

  In his article on Mosaic, Gary Wolf writes:

  "Berners-Lee and his colleagues faced the problem of creating a unified
  hypertext network ... in a diverse international environment.  They came
  up with a stunning solution.  Rather than attempt to impose standards on
  the hardware or software, they defined standards for the data."

  While this is indeed a stunning solution, and Tim and his colleagues
  deserve a great deal of credit and praise for using it, they certainly
  did not invent it.  The standard they chose, SGML, or Standard
  Generalized Markup Language, has been an ISO standard since 1986 (ISO
  8879).  Its technical and intellectual precursor, GML, was developed in
  the early 1970's and was part of the IBM DCF/Script text formatting
  product starting in about 1982.  Many large enterprises, not least of
  which was IBM, recognized the tremendous benefits of data standards.  For
  example, IBM, normally touted as the second largest publisher in the
  "free world", produces about 85 or 90% of its product documentation using
  GML and probably almost as much of its internal documentation.  IBM is
  also in the process of moving from GML to SGML (it has millions of pages of
  documentation in GML, which has been in use for over 10 years, so moving
  to SGML will take some time).  Most Fortune 500 companies use GML or SGML
  for some part of their operation.  For example, GM uses SGML for all of
  its maintenance information, allowing them to derive a wide variety of
  publications, including both print manuals and online maintenance support
  systems, from a single database of information, and has done so, in one
  form or another, since the late 1970's.  Data standards represent
  significant cost savings and competitive advantage that industry was
  quick to recognize and take advantage of.

  One interesting aspect of the Web has been the influence of Mosaic in
  particular on the HTML (hypertext markup language) standard that Tim and
  his colleagues originally developed.  In a true SGML system (which the
  Web and Mosaic are not today), the definition of a particular document
  type, such as HTML, is formally defined in a way that allows document
  processors to validate a document to see if it meets the requirements of
  the document type.  For example, authors are not free to define new
  element types (tag names) in a completely ad-hoc fashion (although
  authors can extend a document type themselves).  However, Mosaic, and
  most, if not all, of the other Web browsers do not do this validation.
  They take HTML documents as they get them.  Elements they recognize they
  process and elements they don't they ignore (at least Mosaic does).
  However, this also means that browsers can unilaterally provide support
  for elements that may not be in the official document type.  This undoes
  the very benefits of data standardization by allowing tool creators,
  rather than data owners, to control the data standard.  The lack of
  validation has led to a degree of chaos in the Web because people assume
  that if a document works under Mosaic it must be a well-formed
  document, when in fact it may not be.  The freeware HTML editor HoTMetaL,
  from SoftQuad, Inc., made this clear when people tried to use it to edit
  their documents.  HoTMetaL is a true SGML application and therefore does
  the proper validation, using the DTD defined by the Web team.  People are
  starting to understand the purpose of validation is to ensure that
  documents are well formed, which is necessary to ensure interchange and
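
  The difference between validating a document and simply taking it as it
  comes can be sketched in a few lines of Python.  This is only a rough
  illustration under stated assumptions: the DECLARED set is a hypothetical
  stand-in for the element types a DTD declares, "sparkle" is an invented
  element name, and only element names are checked, whereas a real SGML
  parser would also check nesting and attributes against the DTD.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the element types a DTD declares; a real
# DTD also constrains nesting and attributes, which this sketch ignores.
DECLARED = {"html", "head", "title", "body", "h1", "p", "a"}


class ElementCollector(HTMLParser):
    """Record the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def element_names(document):
    """Return the start-tag names found in the document text."""
    collector = ElementCollector()
    collector.feed(document)
    collector.close()
    return collector.seen


def validate(document):
    """Validating behaviour: report every undeclared element type."""
    return [name for name in element_names(document) if name not in DECLARED]


def permissive(document):
    """Browser-like behaviour: keep known elements, silently skip the rest."""
    return [name for name in element_names(document) if name in DECLARED]


# "sparkle" is an invented element that the DECLARED set does not contain.
SAMPLE = "<html><body><h1>Hi</h1><sparkle>New!</sparkle><p>Text</p></body></html>"
```

  Here validate(SAMPLE) reports the undeclared element, while
  permissive(SAMPLE) quietly drops it, so the author never learns that the
  document does not conform.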

  One reason Mosaic is so popular is that it defined support for new
  elements or new combinations of elements that were not in the original
  HTML definition, giving it a significant competitive advantage over other
  browsers.  This meant that if you created a document that worked with
  Mosaic and used its unique features, it might not work, or not work as
  expected, with other browsers.  This of course erodes the value of data
  standards in the first place.  Rather than having the standard extended
  by controllers of the standard, it was extended unilaterally by
  Andreessen, without any sort of check or control.  One of the benefits of
  standards is that changes to them are usually made according to some
  protocol that ensures at least a little bit of review and consensus by the
  parties interested in the standard.

  So while Mosaic is very cool and offers some very nifty functions, it has
  at the same time served to undo to some degree the very intent of the
  original Web designers by effectively wresting control of the data
  standard from the owners of the standard (and the data that conforms to
  it) and placing it in the hands of the developers of a specific product.
  The chief purpose of SGML is to prevent exactly this sort of extortion on
  the part of product vendors by keeping the control of the data format in
  the hands of data owners.

  This said, it should be stressed that the World Wide Web is very
  important precisely because it does prove the basic premise of data
  standards in general and SGML in particular, which is that by focusing on
  data standards, you can enable exactly the sort of distributed access to
  information that the Web provides.  While there are certainly flaws in
  HTML, the URL mechanism, and the various Web browsers, to a large degree
  they don't matter (no matter how much SGML pedants like myself may carp
  about them), because the point is proven.  Tim and his team were right
  and have built something wonderful.  The job now is to refine the system,
  which many people are involved in doing.  They've built the Model T,
  proved the point, and gotten a lot of people very excited, and for that
  they've earned a place in history.

  Eliot Kimber

