SGML: Subdoc: Why I Demand It, an Update


Title: Re: Subdoc: Why I Demand It, an Update
Author: "W. Eliot Kimber" <eliot@isogen.com>
Date: 23 Jun 1997 16:55:32 GMT
-----------------------------------------------------------------
Steve Cogorno <cogorno@netcom.com> wrote in article
> We seriously looked at using SUBDOC to implement reusable SGML chunks.
> Here's why we didn't.
>
> 1) Many of the elements in our DTD are recursive. The formatting depends
> heavily on where in the tree the object appears. By the very meaning of
> SUBDOC, the processing of the current document stops, and the entity is
> parsed independently of the parent. Therefore, there is no way to format the
> objects in a flexible, powerful way that takes advantage of the document
> tree.

The statement "there is no way to format" simply isn't true:

1. I wrote code in Perl, using NSGMLS, to create a single instance from
   a tree of subdocs. The result can be formatted normally. No magic
   here. For the project where we're using SUBDOC (rolled out for
   production use today, by the way), we wrote an OmniMark process to
   do this (we already had to have a process to produce new instances
   out of the authoring database to feed the production processes).

2. JADE 0.8 supports the "sgml-parse" function. Using this function,
   you can apply a single style sheet to a compound document.
   [Unfortunately, there is a design flaw in DSSSL: there is no way to
   look up through the SUBDOC reference, so elements within the subdoc
   cannot see the context of the reference. James Clark has suggested
   several possible solutions to this design flaw, but the problem can
   be solved with JADE in the short term.]

3. I wrote a DSSSL spec, using the JADE SGML back end, that does what
   my Perl code does, namely produce a new instance from a compound
   document. This new instance can then be formatted normally. This
   spec is available from the ISOGEN Web site at
   "http://www.isogen.com/demos/dovalueref.html" ("value reference"
   being the new HyTime facility that reflects the semantics of SUBDOC
   reference, among other things).
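
As a rough illustration of the flattening step described in points 1
and 3 (this is not the Perl, Omnimark, or DSSSL code mentioned above),
here is a minimal Python sketch. It works on a toy in-memory tree
rather than on parsed SGML. The Element class, its subdoc field, and
the docs lookup table are hypothetical names invented for this
example. The check that a subdocument's document element matches the
referencing element type is borrowed from the policy described later
in this message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Element:
    """Toy stand-in for a parsed element (hypothetical, not a parser API)."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)
    subdoc: Optional[str] = None  # hypothetical: key of a referenced subdocument


def flatten(root: Element, docs: Dict[str, Element],
            _active: Tuple[str, ...] = ()) -> Element:
    """Return a copy of root in which every subdoc reference is replaced
    by the (recursively flattened) document element of the subdocument."""
    if root.subdoc is not None:
        if root.subdoc in _active:
            raise ValueError(f"circular subdoc reference: {root.subdoc}")
        target = docs[root.subdoc]
        # Assumed policy: the referencing element type must equal the
        # document element type of the referenced subdocument.
        if target.name != root.name:
            raise ValueError(
                f"<{root.name}> cannot reference a <{target.name}> subdocument")
        return flatten(target, docs, _active + (root.subdoc,))
    children = [c if isinstance(c, str) else flatten(c, docs, _active)
                for c in root.children]
    return Element(root.name, children)


def serialize(el: Element) -> str:
    """Write the tree back out as a single tagged instance."""
    inner = "".join(c if isinstance(c, str) else serialize(c)
                    for c in el.children)
    return f"<{el.name}>{inner}</{el.name}>"


# One subdocument whose document element is <procedure>.
docs = {"proc1": Element("procedure", [Element("step", ["Open the valve."])])}

# A compound document that references it from a <procedure> element.
manual = Element("manual", [
    Element("title", ["Pump"]),
    Element("procedure", subdoc="proc1"),
])

print(serialize(flatten(manual, docs)))
# <manual><title>Pump</title><procedure><step>Open the valve.</step></procedure></manual>
```

The flattened result is an ordinary single instance, which is why it
can then be handed to any normal formatting or validation process.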

The dovalueref.dsl spec can be used with JADE by piping its output
back into a second JADE process:

    jade -d dovalueref.dsl -tsgml compound.sgm | jade -d doformatting.dsl -t rtf

This takes about twice as long to process, but that means 20 seconds
instead of 10 (or 40 instead of 20 for larger documents).

Given this JADE spec, any normal process, including tools like
FrameMaker+SGML, Datalogics Composer, and ADEPT*Publisher, can be used
to format compound documents at a minimum added cost, both in terms of
money and time. The dovalueref.dsl spec is completely generic: it is
insensitive to document types as long as you've provided the necessary
value reference attributes in your DTD (samples and instructions are
provided with the spec).

> 2) We currently have an SGML authoring system very much like XML without a
> DTD. Authors write in FrameMaker using a standard set of paragraph and
> character tags. We then use a conversion process to bring the Frame docs
> into SGML. This has been very frustrating for authors because policy, not
> the tool, dictates element usage. It's hard for them to remember.

That's because Frame gives you no easy way to implement the policy.
This is not the fault of the policy; it is the fault of the tool. SGML
is all about policy: a DTD is nothing more than a set of policy
statements about document construction. What you need are better tools
for defining and enforcing policies. Architectures provide much of what
you need by letting you define general policies without constraining
the details of individual documents (see below).

> In our next-generation system, which is built around AdeptEditor, authors
> required that the authoring process be as guided as possible. They do not
> want element usage based on policy, as they would have with SUBDOC.

The SUBDOC-based system we've built for our client using ADEPT*Editor
enforces the rigid SUBDOC use policies completely, making it nearly
impossible for authors to do the wrong thing.
Essentially, when they want to add to their document, they insert the
element type that's allowed by the content model. Our code then steps
in and controls the process by which they select or create a new
subdocument to ensure that the policy is enforced. It couldn't be more
guided. The policy is a simple one (the document element of the
subdocument must be identical to the element type of the referencing
element), it's easy to check, and authors have no ability to make
errors.

And again, *all* SGML usage is based on policy. The only question is
where the policy is defined and enforced: in the DTD or in the tools.
It is always the case that there are policies that cannot be expressed
or enforced by DTDs, so there will always be policies that must be
enforced by tools or left to authors to comply with voluntarily.

> 3) There is no way to verify that a document is "correct." For example,
> we cannot use the parser to determine if an author used a <chapter> inside
> of a <list>.

Again, not true. It's a simple matter to interrogate the subdoc to see
if it is appropriate for its use context. The easiest way to do this is
to generate a new single instance and validate that. If the components
don't have compatible DTDs, derive a single architectural instance for
the architecture from which all the components are derived and validate
that. [You can't hope to do different-DTD subdoc without using
architectures, so there must be a common architecture from which the
components are derived. Remember that architectures provide a means to
define DTD-based policies in a way that allows flexibility at the
subdoc level. The degree of flexibility allowed is defined by the
architectural meta-DTD by making the architectural content models more
or less loose.]

[I started writing the following. However, in testing my test case, I
discovered that JADE doesn't behave the way I expected it to with
respect to architectural processing.
I've mentioned this to James to see whether my expectations match the
way JADE should behave.]

Again, JADE is [or rather, should be] capable of doing this:

    jade -A commonarch -d dovalueref.dsl -tsgml compound.sgm | nsgmls -s -wall | more

The above command uses dovalueref.dsl to create a new instance, but
one that represents the combination of the architectural instances for
the architecture "commonarch" (being, for this example, the common
architecture from which all the parts are derived). This combined
architectural instance is then piped to NSGMLS, which validates the
result. [I've added a trivial example of doing this to the materials
for the dovalueref DSSSL spec--I won't replicate it here.]

All that said, remember you have the same problem [elements used
inappropriately] anyway, just to a different degree: how does an author
know to use a procedure instead of an ordered list or a list of
paragraphs when all are allowed in a given context? It's the same
problem and has the same solution: clear definition of policies, author
education, and extra-DTD semantic validation.

> 4) Our documentation is structured very highly; we do not need the ability
> to include different types of documents in the same SGML document. (The only
> need that SUBDOC addressed was ID/IDREF issues.)

I think you may have misunderstood my meaning. The more structured your
documents are (by which I assume you mean "the more specific to the
information task at hand the element types are"), the more you need
subdocuments. If you are using a single document type, every new
specialized element causes the master DTD to grow. However, if you use
subdocuments, each component (a procedure, a reference section, a
conceptual bit) can have its own smaller, focused DTD.

In addition, you may find you want to integrate information components
produced by different groups within your enterprise (perhaps even
different enterprises).
A subdoc-based infrastructure makes this possible in the future, even
if it is not a present requirement (because your authoring and
interchange community is, at the moment, relatively small).

> The decision to use or not use SUBDOC basically comes down to how you want to
> define policy: you can either make policy around element usage rules or
> around ID assignment. At best, it's 6 to a half dozen which policy you
> decide to implement. We felt that the disadvantages of SUBDOC far
> outweighed the advantages.

I agree that SUBDOC use (like all SGML implementation decisions) comes
down to questions of how to express and enforce policies. However, I
don't think it's just about ID assignment (although that's a compelling
benefit of SUBDOC).

As an armchair consultant in this case, it appears to me that you have
underrepresented the benefits and overrepresented the costs of a
subdoc-based approach. However, as I don't know your detailed
requirements, I can't say that you didn't make the correct
decision--you may very well have.

--
<Address HyTime=bibloc homepage="http://www.drmacro.com">
W. Eliot Kimber, eliot@isogen.com
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 (fax)
http://www.isogen.com (work)</Address>
"Rats in the morning, rats in the afternoon...if they don't go away,
I'll be reducated soon..."
--Austin Lounge Lizards, "1984 Blues"
(http://www.webcom.com/~yeolde/all/lllhome.html)