Subdoc: Why I Demand It, an Update


Title: Re: Subdoc: Why I Demand It, an Update

Author: "W. Eliot Kimber" <eliot@isogen.com>

Date: 23 Jun 1997 16:55:32 GMT


  -----------------------------------------------------------------

Steve Cogorno <cogorno@netcom.com> wrote in article 
> We seriously looked at using SUBDOC to implement reusable SGML chunks. 
> Here's why we didn't.
> 
> 1) Many of the elements in our DTD are recursive.  The formatting depends
> heavily on where in the tree the object appears.  By the very meaning of
> SUBDOC, the processing of the current document stops, and the entity is
> parsed independently of the parent.  Therefore, there is no way to format
> the objects in a flexible, powerful way that takes advantage of the
> document tree.

The statement "there is no way to format" simply isn't true:

1. I wrote code in Perl, using NSGMLS, to create a single instance from a
tree of subdocs.  The result can be formatted normally.  No magic here.
For the project where we're using SUBDOC (rolled out for production use
today, by the way), we wrote an Omnimark process to do this (we already had
to have a process to produce new instances out of the authoring database to
feed the production processes).

2. JADE 0.8 supports the "sgml-parse" function.  Using this function, you
can apply a single style sheet to a compound document. [Unfortunately,
there is a design flaw in DSSSL such that there is no way to look up
through the SUBDOC reference such that elements within the subdoc can see
the context of the reference--James Clark has suggested several possible
solutions to this design flaw, but the problem can be solved with JADE in
the short term.]

3. I wrote a DSSSL spec, using the JADE SGML back end, that does what  my
Perl code does, namely produce a new instance from a compound document. 
This new instance can then be formatted normally.  This spec is available
from the ISOGEN Web site at "http://www.isogen.com/demos/dovalueref.html"
("value reference" being the new HyTime facility that reflects the
semantics of SUBDOC reference, among other things).  This spec can be used
with JADE by piping its output back into a second JADE process:

jade -d dovalueref.dsl -tsgml compound.sgm | jade -d doformatting.dsl -t rtf

Takes about twice as long to process, but that means 20 seconds instead of
10 seconds (or 40 instead of 20 for larger documents).

Given this JADE spec, any normal process, including tools like
Framemaker+SGML, Datalogics Composer, and ADEPT*Publisher can be used to
format compound documents at a minimum added cost, both in terms of money
and time.

The dovalueref.dsl spec is completely generic: it is insensitive to
document types as long as you've provided the necessary value reference
attributes in your DTD (samples and instructions are provided with the
spec).
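As a rough illustration of the single-instance idea from point 1 above, here
is a minimal sketch in Python (not the Perl, Omnimark, or DSSSL code
mentioned in this post).  It assumes the NSGMLS output format as I
understand it: "A" lines carry attributes for the next start tag, "(" and
")" open and close elements, "-" carries data, and "{" and "}" bracket a
subdocument entity.  The function name flatten_esis is hypothetical, and
dropping the subdocument's own document element (so its content lands
inside the referencing element) is a simplifying assumption.

```python
def flatten_esis(lines):
    """Build one tagged instance from NSGMLS-style output lines, inlining subdocs."""
    out = []
    attrs = []               # attribute lines arrive before their start tag
    depth = 0                # current element nesting depth
    roots = []               # depths of subdoc document elements being unwrapped
    at_subdoc_start = False  # True between "{" and the subdoc's first element
    for line in lines:
        code, rest = line[:1], line[1:]
        if code == "{":      # start of a subdocument entity
            at_subdoc_start = True
        elif code == "A":    # "Aname type value"; IMPLIED has no value
            name, kind, *value = rest.split(" ", 2)
            if kind != "IMPLIED":
                attrs.append(f' {name.lower()}="{value[0] if value else ""}"')
        elif code == "(":
            depth += 1
            if at_subdoc_start:
                # Assumption: the subdoc's document element repeats the
                # referencing element, so its tags (and attributes) are dropped.
                roots.append(depth)
                at_subdoc_start = False
            else:
                out.append(f"<{rest.lower()}{''.join(attrs)}>")
            attrs = []
        elif code == ")":
            if roots and roots[-1] == depth:
                roots.pop()  # matching end of an unwrapped subdoc root
            else:
                out.append(f"</{rest.lower()}>")
            depth -= 1
        elif code == "-":
            # Only the record-end escape is decoded; markup characters in
            # data are not re-escaped in this sketch.
            out.append(rest.replace("\\n", "\n"))
        # Other line types ("}", "C", "S", ...) carry nothing needed here.
    return "".join(out)
```

Feeding it the lines produced for a compound document should give a single
instance that any ordinary formatter can take; a production process would
also need to handle the remaining escapes and attribute types.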
 
> 2) We currently have an SGML authoring system very much like XML without
> a DTD.  Authors write in FrameMaker using a standard set of paragraph and
> character tags. We then use a conversion process to bring the Frame docs
> into SGML.  This has been very frustrating for authors because policy,
> not the tool dictates element usage.  It's hard for them to remember.

Because Frame gives you no easy way to implement the policy.  This is
not the fault of the policy, it is the fault of the tool.  All SGML is
about is policy: a DTD is nothing more than a set of policy statements
about document construction.  What you need are better tools for defining
and enforcing policies.  Architectures provide much of what you need by
letting you define general policies without constraining the details of
individual documents (see below).

> In our next-generation system, which is built around AdeptEditor, authors
> required that the authoring process be as guided as possible.  They do
> not want element usage based on policy, as they would have with SUBDOC.

The SUBDOC-based system we've built for our client using ADEPT*Editor
enforces the rigid SUBDOC use policies completely, making it nearly
impossible for authors to do the wrong thing.  Essentially, when they want
to add to their document, they insert the element type that's allowed by
the content model.  Our code then steps in and controls the process by
which they select or create a new subdocument to ensure that the policy is
enforced.  It couldn't be more guided.  The policy is a simple one (the
document element of the subdocument must be identical to the element type
of the referencing element), it's easy to check, and authors have no
ability to make errors.
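
The policy check itself is small.  The following is a minimal sketch in
Python, not the ADEPT*Editor code described above; the function name
subdoc_matches_reference is hypothetical.  It assumes the subdocument
begins with a DOCTYPE declaration (whose document type name is the document
element type) and that names are case-insensitive, as under the reference
concrete syntax.

```python
import re

# Document type name: the first token after "<!DOCTYPE".
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^\s\[>]+)", re.IGNORECASE)

def subdoc_matches_reference(referencing_gi, subdoc_text):
    """Return True if the subdoc's document element equals the referencing element type."""
    match = DOCTYPE_RE.search(subdoc_text)
    if match is None:
        return False  # no DOCTYPE declaration found: treat as a policy failure
    # Assumption: general names fold case, so compare case-insensitively.
    return match.group(1).lower() == referencing_gi.lower()
```

An editor hook could call this before accepting a subdocument reference,
for example subdoc_matches_reference("chapter", text_of_selected_subdoc).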

And again, *all* SGML usage is based on policy.  The only question is where
the policy is defined and enforced: in the DTD or in the tools.  It is
always the case that there are policies that cannot be expressed or
enforced by DTDs, so there will always be policies that must be enforced by
tools or left to authors to comply with voluntarily.
 
> 3) There is no way to verify that a document is "correct."  For example,
> we cannot use the parser to determine if an author used a <chapter>
> inside of a <list>.

Again, not true.  It's a simple matter to interrogate the subdoc to see if
it is appropriate for its use context. The easiest way to do this is to
generate a new single instance and validate that.  If the components don't
have compatible DTDs, derive a single architectural instance for the
architecture from which all the components are derived and validate that.
[You can't hope to do different-DTD subdoc without using architectures, so
there must be a common architecture from which the components are
derived--remember that architectures provide a means to define DTD-based
policies in a way that allows flexibility at the subdoc level.  The degree
of flexibility allowed is defined by the architectural meta-DTD by making
the architectural content models more or less loose.]  

[I started writing the following.  However, in testing my test case, I
discovered that JADE doesn't behave the way I expected it to with respect
to architectural processing.  I've mentioned this to James to see if my
expectations are the way JADE should behave.]

Again, JADE is [or rather, should be] capable of doing this:

jade -A commonarch -d dovalueref.dsl -tsgml compound.sgm | nsgmls -s -wall | more

The above command uses dovalueref.dsl to create a new instance, but that
instance represents the combination of the architectural instances for
the architecture "commonarch" (being, for this example, the common
architecture from which all the parts are derived).  This combined
architectural instance is then piped to NSGMLS, which validates the result.
 [I've added a trivial example of doing this to the materials for the
dovalueref DSSSL spec--I won't replicate it here.]

All that said, remember you have the same problem [elements used
inappropriately] anyway, just to a different degree: how does an author
know to use a procedure instead of an ordered list or a list of paragraphs
when all are allowed in a given context?  It's the same problem and has the
same solution: clear definition of policies, author education, and
extra-DTD semantic validation.

> 4) Our documentation is structured very highly; we do not need the
> ability to include different types of documents in the same SGML document.
> (The only need that SUBDOC addressed was ID/IDREF issues.)

I think you may have misunderstood my meaning.  The more structured your
documents are (by which I assume you mean "the more specific to the
information task at hand the element types are"), the more you need
subdocuments.  If you are using a single document type, every new
specialized element causes the master DTD to grow.  However, if you use
subdocuments, each component (a procedure, a reference section, a
conceptual bit) can have its own smaller, focused DTD.  

In addition, you may find you want to integrate information components
produced by different groups within your enterprise (perhaps even different
enterprises).  A subdoc-based infrastructure makes this possible in the
future, even if it is not a present requirement (because your authoring and
interchange community is, at the moment, relatively small).
 
> The decision to use or not use SUBDOC basically comes down to how you want
> to define policy: you can either make policy around element usage rules or
> around ID assignment. At best, it's 6 to a half dozen which policy you
> decide to implement.  We felt that the disadvantages of SUBDOC far
> outweighed the advantages.

I agree that SUBDOC use (like all SGML implementation decisions) comes down
to questions of how to express and enforce policies.  However, I don't
think it's just about ID assignment (although that's a compelling benefit
of SUBDOC).  As an armchair consultant in this case, it appears to me that
you have underrepresented the benefits and overrepresented the costs of a
subdoc-based approach.  However, as I don't know your detailed
requirements, I can't say that you didn't make the correct decision--you
may very well have.

-- 
<Address HyTime=bibloc homepage="http://www.drmacro.com">
W. Eliot Kimber, eliot@isogen.com
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 (fax) 
http://www.isogen.com (work)</Address>
"Rats in the morning, rats in the afternoon...if they don't go away, 
I'll be reducated soon..."   --Austin Lounge Lizards, "1984 Blues" 
(http://www.webcom.com/~yeolde/all/lllhome.html)