Header

From @UTARLVM1.UTA.EDU:OWNER-ETEXTCTR@RUTVM1.RUTGERS.EDU Thu Jan 13 16:48:44 1994
Return-Path: <@UTARLVM1.UTA.EDU:OWNER-ETEXTCTR@RUTVM1.RUTGERS.EDU>
Received: from UTARLVM1.UTA.EDU by utafll.uta.edu (4.1/25-eef)
	id AA22750; Thu, 13 Jan 94 16:48:40 CST
Message-Id: <9401132248.AA22750@utafll.uta.edu>
Received: from UTARLVM1.UTA.EDU by UTARLVM1.UTA.EDU (IBM VM SMTP V2R2)
   with BSMTP id 6927; Thu, 13 Jan 94 14:47:33 CST
Received: from UTARLVM1.UTA.EDU (NJE origin LISTSERV@UTARLVM1) by
 UTARLVM1.UTA.EDU (LMail V1.1d/1.7f) with BSMTP id 9461; Thu,
 13 Jan 1994 14:42:37 -0600
Date:         Thu, 13 Jan 1994 15:27:51 -0500
Reply-To: Discussion Group on Electronic Text Centers
              <ETEXTCTR%RUTVM1.bitnet@UTARLVM1.UTA.EDU>
Sender: Discussion Group on Electronic Text Centers
              <ETEXTCTR%RUTVM1.bitnet@UTARLVM1.UTA.EDU>
From: Annelies Hoogcarspel - CETH <HOOGCARSPEL%ZODIAC.bitnet@UTARLVM1.UTA.EDU>
Subject:      Re: SGML and COCOA
To: Multiple recipients of list ETEXTCTR <ETEXTCTR%RUTVM1.bitnet@UTARLVM1.UTA.EDU>
Status: R

Date: Thu, 13 Jan 1994 14:26:48 -0500
From: sjd@ebt.com
Subject: RE: SGML and COCOA

Text

Mark Leggott writes:

>Assuming it is true SGML with a published DTD, you could probably use it as is
>(or with minor modifications) with WorldWideWeb software such as Mosaic.
>WorldWideWeb software is accessible from just about any computer platform you
>can think of, and has excellent functionality.
>
>A simple filter could presumably be used to modify all texts in the
>collection.

This would be a highly desirable state of affairs, but actually Mosaic and
other World Wide Web viewers cannot read SGML except for one special case,
namely the HTML DTD.  And to make matters worse, the HTML DTD that has
been available online for months is invalid (this is another way you can tell
that the viewers do not contain SGML parsers).

Most HTML documents are also not valid SGML.  A subtle but deadly example
is that they seldom quote URL attributes:
   <a href=cs.zz.edu/u/zzz/myfile.html>

This may seem harmless enough, but it is a pernicious error in SGML:
pernicious because it does *not* produce an invalid SGML file, but rather
one whose structure is not what the user intended.  SGML has a
construct known as a "NET-enabling start tag", in which the first
slash character in a start-tag ends the tag.  Thus the example
above, read as SGML, has a short URL followed by some weird-looking
extra content, as if it had been:
     <a href="cs.zz.edu">u/zzz/myfile.html>

Mosaic will conveniently parse it as if it were quoted, but as soon as
you move to a program using an actual SGML parser, the link will break and
the content will suddenly look mysterious. Other examples are available.
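
To make the difference concrete, here is a rough Python sketch of the two
readings of the example tag (with the attribute written as href). It is only
an illustration of the simplified description above, not an SGML parser; the
function names and the regular expressions are my own assumptions, and
nothing else about SGML start-tags is modelled.

```python
import re

# The unquoted example tag from the text.
TAG = "<a href=cs.zz.edu/u/zzz/myfile.html>"


def lenient_split(tag):
    """Browser-style reading: an unquoted value runs up to whitespace or '>'."""
    m = re.match(r"<(\w+)\s+(\w+)=([^\s>]+)>(.*)", tag, re.S)
    if m is None:
        return None
    name, attr, value, rest = m.groups()
    return {"element": name, "attribute": attr, "value": value, "trailing": rest}


def net_aware_split(tag):
    """Simplified SGML-style reading: the first '/' inside the start-tag
    ends the tag, so everything after it is treated as content."""
    m = re.match(r"<(\w+)\s+(\w+)=([^\s/>]+)/(.*)", tag, re.S)
    if m is None:
        return None
    name, attr, value, rest = m.groups()
    return {"element": name, "attribute": attr, "value": value, "trailing": rest}


if __name__ == "__main__":
    print(lenient_split(TAG))    # value is the whole URL, nothing trailing
    print(net_aware_split(TAG))  # value stops at the first slash
```

With the lenient reading the link target is the whole URL; with the
NET-aware reading the target is only "cs.zz.edu" and the remainder of the
URL ends up as stray content.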

In addition to using only one DTD, Mosaic and its kin do not deal with
marked sections, inclusion and exclusion exceptions, OMITTAG minimization,
or even system entities, so all these must be eliminated from your data
before you can use those tools without data loss.
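
As a rough pre-flight check, one could scan a file for some of these
constructs before handing it to a Web viewer. The Python sketch below is
purely textual and the patterns are my own guesses at what is worth flagging:
it looks for marked sections, SYSTEM entity declarations, and exceptions in
element declarations. It cannot detect OMITTAG minimization, which requires
parsing against the DTD, and it will miss or misreport unusual input.

```python
import re

# Hypothetical textual patterns; these approximate the constructs, they do
# not parse SGML.
CHECKS = {
    "marked section": re.compile(r"<!\["),
    "system entity declaration": re.compile(r"<!ENTITY\s+[^>]*\bSYSTEM\b", re.I),
    "inclusion/exclusion exception": re.compile(r"<!ELEMENT\s[^>]*\s[+-]\(", re.I),
}


def flag_constructs(text):
    """Return a sorted list of labels for the constructs found in text."""
    return sorted(label for label, pattern in CHECKS.items() if pattern.search(text))


# A small made-up document that uses all three constructs.
SAMPLE = """<!DOCTYPE doc [
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ELEMENT doc - - (p)+ +(note)>
]>
<doc><![ IGNORE [ <p>draft text</p> ]]><p>&chap1;</p></doc>"""

if __name__ == "__main__":
    for label in flag_constructs(SAMPLE):
        print("found:", label)
```

Anything it flags would have to be resolved or removed before the data
could be used with those tools.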

Of course you could convert the data to HTML, but that would be a sad fate,
for HTML has precious few tags and cannot retain most useful distinctions;
it also means you have to create, synchronize, and maintain two versions of
your data, probably an even sadder fate, not to mention writing converters.

The Web has many nice features, but its SGML support is not strong. It is to
be hoped that these problems will be addressed, but only time will tell.

Steve DeRose
EBT, 1 Richmond Square
Providence RI 02906 USA
(401) 421-9550, fax -9551
sjd@ebt.com