
SGML: comp.text.sgml (CTS) Digest, Sample Issue

From Sat Mar 18 11:09:24 1995
Return-Path: <>
Received: from by (4.1/25-eef)
	id AA01034; Sat, 18 Mar 95 11:09:07 CST
Received: from by with SMTP
          id <AA17602> for <>; Sat, 18 Mar 1995 15:05:29 UT
Date: 18 Mar 1995 15:05:26 UT
From: "comp.text.sgml digest" <>
Message-Id: <>
Precedence: list
Subject: comp.text.sgml digest Vol 8 #26
Status: R

comp.text.sgml digest           1995-03-18              Volume 8 : Issue 26

The comp.text.sgml digest is volumed monthly with 10 articles per issue.
Articles from the newsgroup are collected from the University of Oslo,
Norway and NETCOM, San Jose, California.

Continued maintenance of the comp.text.sgml archive is made possible in
part by grants from: The SGML Users' Group, the SGML Users' Group Special
Interest Group for Hypertext and Multimedia (SIGhyper), SGML Open; and in
part by subscription fees from the mailing list.  For subscription info,
see the end of the digest.

(Ref) Art #             Author  Subject
----- -----  -----------------  -------------------------------------------
       8210  Andreas Björklind  SGML Contacts in Vietnam
       8211    Andre J. Emmell  Equations in SGML
 8209  8212         Chuck Till  Re: EBT DynaBase: are you a customer?
 8211  8213    Robert Lockwood  Re: Equations in SGML
 8195  8214        Erik Naggum  Re: PCDATA vs RCDATA
 8109  8215        Erik Naggum  Re: Proposed PDF FIPS report
 8191  8216        Joe English  Re: Help for HyTime
 8187  8217      Howard Kaikow  Re: Editing Massive Files
 8143  8218    Jacques Deseyne  Re: Intellitag
 8190  8219        Paul Grosso  Re: PCDATA vs RCDATA
----- -----  -----------------  -------------------------------------------


    Send subscription requests and delivery problem reports to
    <>.  Send articles (only) to
    <>.  The posting service is free.


Article: 8210 of comp.text.sgml
Newsgroups: comp.text.sgml
From: (Andreas Björklind)
Subject: SGML Contacts in Vietnam
Message-ID: <>
Organization: LIBLAB, IDA, Linköping University, Sweden
Date: Fri, 17 Mar 1995 12:24:56 GMT

I'm looking for SGML contacts in Vietnam with at least basic
SGML knowledge.

The reason? I may be going there (Hanoi) to teach basic SGML and I
would like to know who may be able to continue the long term SGML
teaching and consulting effort. "Legal documents" is the operative
term in the project, by the way.

Yours sincerely,
  Andreas Björklind

    Mr. Andreas Bjorklind, Laboratory for Library and Information
  Science (LIBLAB), Dept. of Comp. & Info. Sci., Linkoping University,
  S-581 83 Linkoping, Sweden. Tel. +46 13 28 19 69, Fax +46 13 14 22 31
             Internet:, AOL: Bjorklind


Article: 8211 of comp.text.sgml
From: "Andre J. Emmell" <>
Newsgroups: comp.text.sgml
Subject: Equations in SGML
Date: 16 Mar 1995 20:39:45 GMT
Organization: Global-X-Change
Message-ID: <3ka7mh$>

Does anyone know an elegant solution on handling of equations 
in SGML.  I am addressing a serious scientific organization
who wants to convert to SGML and to streamline their documentation
which contains a lot of equations.

Any good editors you have found? You may contact me directly if
you would like.

Thanks in advance

Andre J.Emmell


Article: 8212 of comp.text.sgml
From: (Chuck Till)
Newsgroups: comp.text.sgml
Subject: Re: EBT DynaBase: are you a customer?
Date: Fri, 17 Mar 1995 08:46:11 -0500
Organization: Northern Telecom
Message-ID: <charles.till-1703950846110001@>
References: (8209) <3ka2vu$ftn$>

In article <3ka2vu$ftn$>, Alan Burns
<74364.2774@CompuServe.COM> wrote:

> Is anyone else out there working with EBT, or simply awaiting the 
> release of DynaBase? Their release schedule has changed, and I would
> be interested in hearing from other organizations which are hoping 
> to use this product upon its release.
> We hope to use DynaBase as part of a fully SGML compliant repository
> for text and other media objects, for republishing purposes. All of 
> our objects are meant to have equal status, such that many new 
> publications will be created with combinations of various objects. 
> We do not want to pre-judge what those combinations may be.
> DynaBase looks promising, but we wonder about other solutions.

We are just beginning to evaluate DynaBase.

Chuck Till, Sr Mgr - Cust Info Systems Integration, Northern Telecom
RTP NC USA   +1 919 992 3225


Article: 8213 of comp.text.sgml
From: (Robert Lockwood)
Newsgroups: comp.text.sgml
Subject: Re: Equations in SGML
Date: 17 Mar 1995 15:23:13 GMT
Organization: Novell
Message-ID: <3kc9h1$>
References: (8211) <3ka7mh$>
X-Newsreader: WinVN 0.92.6+

In article <3ka7mh$>, "Andre J. Emmell" <> says:
>Does anyone know an elegant solution on handling of equations 
>in SGML.

ADEPT*Editor by ArborText supports equations
in SGML based on the AAP standard DTD. They have a nice equation
editor within their product that facilitates creation and
maintenance, and their publishing engine supports postscript output.

I do not know whether other products would support this markup 
directly on the publishing end, however.

Robert P. Lockwood


Article: 8214 of comp.text.sgml
From: Erik Naggum <>
Newsgroups: comp.text.sgml
Subject: Re: PCDATA vs RCDATA
Date: 18 Mar 1995 03:51:08 UT
Organization: Naggum Software; +47 2295 0313
Message-ID: <>
References: (8195) <3k8jnm$>

[kendall thomason shaw]

|   Can someone help me understand the difference between these two statements:
|   <!ELEMENT lmnop - - (#PCDATA)>
|   <!ELEMENT lmnop - - RCDATA>

Joe English has answered well, but a few minor points still need emphasis.

seen from a validation viewpoint, PCDATA (Parsed Character DATA) means that
non-markup characters (a.k.a. data characters) are valid in the content,
i.e., this is where SGML ceases to be concerned with the validity of the
structure of the document.

seen from an application viewpoint, PCDATA is the meat on the markup bones.

if SGML is getting a space shuttle safely into orbit and back, PCDATA is
the payload, the ultimate reason we're doing the exercise, but which is
nothing without the space shuttle (structure) to support it.

now, there are still a few hairy points to consider.  PCDATA is the kind of
payload where you are allowed to hook straps around it and weld it to the
shuttle and whatnot to keep it there -- i.e., markup within it will be
recognized by the SGML parser as "its business".  this applies to end-tags,
entity references, processing instructions, comments, the works.  RCDATA
(and CDATA) is the kind of payload that you aren't allowed to touch, and
you store it in a special container that protects it from shocks and such.
only the RCDATA (and CDATA) containers are broken.  Joe shows the proper
way to package frail goods in space: marked sections.  you never take those
RCDATA/CDATA containers with you on real flights.  they are there because
sometimes you have to do dangerous things, such as releasing that cord that
is, technically speaking, keeping you from discovering DS9 on your own.
even if RCDATA and CDATA were fixed, you would not want to use them,
because they look just like other boxes, and if they really are special,
you should always use special marking tape.  it's not sufficient to trust
the guys who stuff things into the payload bay to know that containers from
Frobozz, Inc, are fragile.  you mark it with "FRAGILE" all over the place.
SGML users and parsers can use the same redundancy to keep from clobbering
important data.

also inclusions, which is like packing a sledgehammer with your test tubes,
but you don't want to use inclusions.  people who use inclusions are likely
to pack their dry clothes where a broken thermos will do maximum damage.
in space, and in SGML, that's the difference between getting a regular
"welcome home" or your very own entry in the history books.

after having thought long and hard about advantages and disadvantages, I
think it is preferable to have elements whose content is (#PCDATA) and
_nothing_ else.  HyTime talks about pseudo-elements that, among a few other
things, are unnamed elements that contain only one piece of data.  instead
of an element containing data and sub-elements, it contains pseudo-elements
among the sub-elements.  thus, the "mixed content" metaphor breaks down,
and there will be counter-intuitive results, to compound the counter-
intuitive ways that #PCDATA interferes with parsing, especially in the
treatment of whitespace.  the ability to use #PCDATA in a content model is
only a (very) convenient short-hand, and it is sometimes necessary, but
should be viewed as a temporary hack.  (it is so convenient that you will
be taught several ways to abuse it in almost any book on SGML, but most of
these books only tell you what's possible, not what's good practice,
because the SGML community didn't have the experience to know the
difference.  this is changing.)

the greatest obstacle to communication
is the illusion that it has already taken place


Article: 8215 of comp.text.sgml
From: Erik Naggum <>
Newsgroups: comp.text.sgml
Subject: Re: Proposed PDF FIPS report
Date: 18 Mar 1995 04:17:23 UT
Organization: Naggum Software; +47 2295 0313
Message-ID: <>
References: (7916) <entrpoop.793664966@access3>
	    (8048) <3jf9fe$>
	    (8051) <>
	    (8068) <3ji141$>
	    (8109) <>

[M. James Bartley]

|   A PDF file is either a 7-bit ASCII file or a binary file. If it is a
|   7-bit ASCII file only the printable subset of the 7-bit ASCII code plus
|   space, tab and newline (return or linefeed) is used. If it is a binary
|   file, the entire 8-bit range of characters may be used.

[Gary Houston]

|   Er, 7-bit ASCII file _OR_ a binary file???  Doesn't the "bit" in 7-bit
|   imply binary?  This sort of confusion seems to be getting far too
|   common, now that it's even turning up in "reference manuals".

is this really a problem?  in very advanced set-theoretical terms, "ASCII"
is a subset of "binary" files.  it is a very important subset with a number
of special properties not shared by other binary files.  another subset is
"7-bit text", of which "ASCII" is a subset.  yet another subset is "8-bit
text", of which "7-bit text" is a subset.  you may note that "GIF file" is
also a subset of "binary file", and that all files are a subset of the
"finite bit string".  I'm not sure it would be useful or interesting to
call every data object, including programs, data files, operating systems,
computer graphics, WWW, astronomical observations from the space shuttle,
etc, "finite bit strings", even if they are.  I think you're the one to
introduce confusion, Gary.

|   And that non-ASCII option, does the statement "entire 8-bit range of
|   characters may be used" imply that only character sets that fit in 8
|   bits are supported?

no.  these aren't even theoretically interesting questions.

part of the fun with human languages is that we can establish precise
terminology for reference purposes, and then use a much more sloppy
language to get ideas across.  my current .signature is relevant.

the greatest obstacle to communication
is the illusion that it has already taken place


Article: 8216 of comp.text.sgml
From: (Joe English)
Newsgroups: comp.text.sgml
Subject: Re: Help for HyTime
Date: 17 Mar 1995 11:18:01 -0800
Organization: Call Really Late Dialup Internet Access
Message-ID: <3kcn99$>
References: (8079) <3jmr4s$>
	    (8191) <3kango$>

W. Eliot Kimber <> wrote:

>For URL addressing there are essentially two solutions:
>1. Use a URL as the system identifier of the document entity you want to
>   link to. I favor this approach.
>2. Define a notation-specific location address (notloc-form) and use it to
>   contain the URL. Steve DeRose and David Durand prefer this solution in
>   their HyTime book (my HyTime book prefers the other, as you might expect)

Which of these two solutions is most appropriate depends
on whether one sees the Web as an SGML application that
uses Internet protocols or as an Internet application that
uses SGML as a document format.

I think that, overwhelmingly, the Web is an Internet
application first and an SGML application second.
URLs are not *just* system identifiers for HTML browsers,
though they can be used that way; they are a fundamental
part of Internet applications, SGML-aware or not.
It's not unreasonable to reify them as attributes
or as a HyTime notloc form in HTML, since HTML is
a DTD specifically designed for Internet hypertext.

Panorama (I assume) will be an SGML application first
and an Internet application second.  The first solution--
using URLs as system identifiers-- has clear advantages
in this case.  But for the current generation of Web browsers,
which are not, and show no signs of wanting to be, general-
purpose SGML viewers, first-class URLs in HTML are appropriate.

Of course, the two solutions are not mutually exclusive.
Web browsers can *and should* resolve URLs in system identifiers;
(they *should* understand entity declarations to begin with;
unfortunately most don't).  But URL-valued attributes (<A HREF>,
<FIG SRC>, &c;) shouldn't be removed from HTML or even
deprecated, I feel.

--Joe English


Article: 8217 of comp.text.sgml
Newsgroups: comp.text.sgml
Subject: Re: Editing Massive Files
Message-ID: <>
Organization: MV Communications, Inc.
Date: Sat, 18 Mar 1995 09:04:46 GMT
References: (8119) <3k1njq$>
	    (8187) <3k7mdu$>

Is VEDIT freeware/shareware/commercial?
Where does it live on internet?

In article <3k7mdu$>,
Joe Shubitowski <> wrote:
>In article <3k1njq$>, (Bob Bagwill) writes:
>>P.W.A. Kools wrote:
>>: Does anybody out there know of an editor under DOS/Windows which will handle 
>>: large (20Mb plus) SGML files. Word doesn't, although it does allow you to read 
>>: the file. Any replies much appreciated.....
>>If you like (or don't mind :-) vi, DOS elvis stores text in a temporary file,
>>instead of memory.
>>Bob Bagwill <>
>I use VEDIT under DOS.  I have used it on text files as large as 400MB...
>Joe Shubitowski


Article: 8218 of comp.text.sgml
Newsgroups: comp.text.sgml
Subject: Re: Intellitag
Date: Fri, 17 Mar 95 15:38:50 CDT
Organization: EUnet Belgium, Leuven, Belgium
Message-ID: <3kc4q9$>
References: (8143) <3k3gsl$>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Newsreader: NEWTNews & Chameleon -- TCP/IP for MS Windows from NetManage

In article <MARTIN.95Mar9090730@hchworth>, <martin@hchworth> writes:
> Last year there was a brief discussion of the limitations of
> WordPerfect's Intellitag. At the time I didn't keep a copy of the
> conclusions, but now find I need to know them.
> Can somebody please post any experiences of Intellitag, reasons to
> buy, not buy etc? Alternatively mail them to me and I'll post a
> summary.

If you go to the home page of WordPerfect at the WWW, at

the first button can take you to three articles, including one white
paper on WordPerfect SGML Edition, which is going to be the new version
of IntelliTag.

We are working with a (pre-?) beta release now and it looks quite good.
Unfortunately, there is not a single byte of documentation at the moment,
so we are quite ignorant about the possibilities of templates and styles.
Our appreciation of IntelliTag 1.2 will come Real Soon Now !

Jacques Deseyne
Witbakkerstraat 34
B-9051 Sint-Denijs-Westrem, Belgium

E-mail :

"Nani, sed in humeris gigantium" ;-)


Article: 8219 of comp.text.sgml
Newsgroups: comp.text.sgml
Date: Sat, 18 Mar 95 11:43:16 GMT
From: Paul Grosso <>
Message-ID: <>
References: (8190) <3kaft0$>
Subject: Re: PCDATA vs RCDATA

[Joe English]

|   Don't ever use the second form (RCDATA); it's evil.

While Joe's cautions on RCDATA are worth taking careful note of, I wouldn't
go so far as to say the concept is evil (I reserve that comment for CDATA).

For DTDs that wish to have a "verbatim" sort of element in which things
such as SGML examples can be placed, I recommend an RCDATA element.  The
key thing to remember is that ETAGO strings (e.g., the "</" string) must be
"escaped" by writing the "<" as a character entity reference.  In fact, I
generally recommend the use of the character entity &lt; in place of "<"
whenever you want a "<" character in your text.  SGML-aware editors will
generally handle this automatically.
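
The escaping described above can be sketched in a few lines of Python.
This is only an illustration: the function name is hypothetical, it is not
part of any SGML tool, and it assumes the DTD declares the entities "lt"
and "amp" (for example through the ISOnum entity set).  Escaping "&" as
well goes slightly beyond the advice above; it is included because entity
references are still recognized in RCDATA.

```python
def escape_for_rcdata(text: str) -> str:
    """Escape text so it can be placed inside an RCDATA element.

    Hypothetical helper. Assumes the entities "amp" and "lt" are
    declared in the DTD in use.
    """
    # Escape "&" first so the "&" introduced by "&lt;" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


# An SGML example that is meant to appear verbatim in the document.
example = "<p>a paragraph ends with </p> and may contain &foo;</p>"
print(escape_for_rcdata(example))
```

After this step the text no longer contains a literal "</", so the RCDATA
element cannot be closed by its own content.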

RCDATA marked sections are also an option, but are much less likely to be
supported, especially in terms of composed output.  8879-wise, I'll agree
they are technically a good way to get your data parsed correctly, but
practically, I think the careful use of an RCDATA element (as described in
the previous paragraph) is the better way to go.

I will agree with Joe that CDATA elements are very dangerous because aside
from having all the pitfalls Joe mentions about RCDATA, there is no way to
"escape" problematic characters.  That is, even a helpful SGML-aware editor
can't do anything to help when the user enters the "</" string: escaping it
by using an entity reference is not an option.  The best that can be done
is to give an error message to inform the user that the next time the
document is parsed, their CDATA element will surprisingly end sooner than
they think and subsequent parsing errors will be almost guaranteed.
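
The check such an editor could run before giving that error message might
look roughly like the Python sketch below.  It is an assumption-laden
illustration, not the behavior of any particular product: it assumes the
reference concrete syntax, where "</" is recognized as an end-tag open
delimiter only when a letter follows it, and it ignores other cases such
as the empty end-tag "</>".  The function name is hypothetical.

```python
import re

# "</" followed by a name start character. Assumption: reference concrete
# syntax, in which name start characters are the ASCII letters only.
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def cdata_content_end(text: str) -> int:
    """Return the offset at which a parser would stop treating `text` as
    CDATA element content, or -1 if it contains no end-tag open delimiter.

    Hypothetical helper; `text` is what the user intends as the content.
    """
    match = ETAGO_IN_CONTEXT.search(text)
    return match.start() if match else -1


intended = "if a</b then the element ends early"
offset = cdata_content_end(intended)
if offset != -1:
    # Everything from this offset on would be parsed as markup, not data.
    print("warning: CDATA content would end at offset", offset)
```

An editor can only report the offset; unlike in the RCDATA case, there is
nothing it can substitute for the offending string.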


Paul Grosso
VP Research, ArborText, Inc.
Chief Technical Officer, SGML Open




Individual subscription is NOK 70.00 (~= USD 11.10) per volume.
Commercial subscription is NOK 175.00 (~= USD 27.75) per volume.
University subscribers are eligible for discounts.

Individual subscriptions must be paid by an individual, and are intended
for one person only.  Commercial subscriptions are intended to be shared
among more than one individual.  Aliases and sub-mailing lists must have
commercial subscriptions.

Semi-annual subscriptions cover six volumes at the price of 5.  Annual
subscriptions cover twelve volumes at the price of 10.  Annual and
semi-annual commercial subscriptions may be invoiced upon request.


Checks may be drawn on U.S. banks in U.S. funds, but please, no checks
above NOK 100.  VISA and MasterCard charges accepted.  SWIFT transfers are
accepted for charges above NOK 500.  Please write for details.

Please note that the currency of payment is Norwegian crowns (NOK).  The
NOK is worth approximately USD 0.16.

End of comp.text.sgml digest Vol 8 #26