[Mirrored from: http://www.sgmlbelux.be/96/hermans.htm]
A discussion of practical experiences in using SGML and HyTime for publishing to different media (hardcopy, browser, HTML) using software such as Synex ViewPort and FrameMaker+SGML.
Keywords: SGML, HyTime, ViewPort
SD is a social secretariat which provides services and advice with respect to social law to 12,000 Belgian companies. One of its divisions organizes courses in social law. It was this course material that had to be published in an efficient and future-oriented way.
The course material had to be published to different media:
Additional requirements were:
Demonstration of the product as it is now.
It was our deeply felt conviction that the existing course material couldn't be used as it was. It was written in a sequential way, for the traditional book medium. This doesn't work online.
"Don't dump paper online" (William Horton).
What is needed for online use are independent information units which are self-contained, i.e. each gives a clear answer to a user's specific question. So we convinced the writers at SD to rewrite the complete course material using such an underlying organization of topics, modules, information units, and micro documents.
After this had been done, we were confident that we had good source material for an online product. But was it still possible to generate a good old-fashioned book from the same source?
A book needs other information (elements) than an online medium and vice versa. We solved this by including in our source file all elements we thought necessary for either version. In going to a specific medium (paper or online) we could then strip out the unneeded elements. This meant that the two versions would differ not only in layout but also in content.
We started with three DTDs:
The input DTD was enhanced with ICADD attributes for generating braille, voice synthesis, ... but those were rapidly removed, for two reasons:
Result: in a moment of frustration, we dropped the whole ICADD idea.
As you can guess, since a DTD continues to be improved, the maintenance of three separate DTDs quickly became a nightmare. The next phase was the integration of the three DTDs into one single DTD, using marked sections, which resulted in an unbelievably elegant construction. (Note)
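The paper does not reproduce the integrated DTD, so the following is only a sketch of how a marked-section switch works in general. It is written in Python and handles only flat (non-nested) marked sections controlled by a parameter entity. The parameter entity names `print` and `browse` and the element names `topic`, `title`, `block`, and `pagebreak` are assumptions made for illustration; only `doccont` appears in the text.

```python
import re

# A single DTD source with two medium-specific marked sections.
# In SGML, %print; and %browse; would be parameter entities whose
# replacement text is the keyword INCLUDE or IGNORE.
DTD = """
<!ELEMENT topic     - - (title, block+)>
<![ %browse; [
<!ELEMENT doccont   - O EMPTY>
]]>
<![ %print; [
<!ELEMENT pagebreak - O EMPTY>
]]>
"""

# Matches one flat marked section: <![ %name; [ ...body... ]]>
MARKED_SECTION = re.compile(r"<!\[\s*%(\w+);\s*\[(.*?)\]\]>\n?", re.DOTALL)

def resolve(dtd, switches):
    """Keep the body of INCLUDE sections and drop IGNORE sections."""
    def replace(match):
        name, body = match.group(1), match.group(2)
        if switches[name] == "INCLUDE":
            return body.strip() + "\n"
        return ""
    return MARKED_SECTION.sub(replace, dtd)

browse_dtd = resolve(DTD, {"browse": "INCLUDE", "print": "IGNORE"})
print_dtd = resolve(DTD, {"browse": "IGNORE", "print": "INCLUDE"})

print(browse_dtd)
print(print_dtd)
```

In a real SGML system the parser performs this resolution itself; flipping the two entity declarations at the top of the DTD is enough to obtain the variant for each medium from one maintained source.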
We use three kinds of information units:
Note that the content of an introductory topic can be either real content or only a navigational aid. This difference is important since real content is also used in the hardcopy version while navigational aids (only used in the digital version) are stripped away.
Each topic, indicated as such by a fixed attribute, is used in navigating to the next/previous screen (topic) in the digital version based on Synex ViewPort.
Every topic has:
Keywords are not used in the hardcopy version. They are used to generate the keyword-list used in the Windows Help-like topic search function.
Indexes are used in both the digital and the printed version.
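How a keyword list for the topic search could be derived from the topics can be sketched as follows. This is a minimal Python illustration that uses XML as a stand-in for SGML; the element names `course`, `topic`, `title`, and `keyword`, the `id` attribute, and the sample content are assumptions, not the actual Questor markup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical source: each topic carries its own keywords.
SOURCE = """
<course>
  <topic id="t1"><title>Notice periods</title>
    <keyword>dismissal</keyword><keyword>notice</keyword></topic>
  <topic id="t2"><title>Dismissal for serious cause</title>
    <keyword>dismissal</keyword></topic>
</course>
"""

def build_keyword_list(root):
    """Map each keyword to the IDs of the topics that declare it."""
    index = defaultdict(list)
    for topic in root.iter("topic"):
        for keyword in topic.findall("keyword"):
            index[keyword.text.strip().lower()].append(topic.get("id"))
    # Sort alphabetically, as a search dialog would list the keywords.
    return dict(sorted(index.items()))

keyword_list = build_keyword_list(ET.fromstring(SOURCE))
for keyword, topic_ids in keyword_list.items():
    print(keyword, "->", ", ".join(topic_ids))
```

Because the keywords live inside the topics, the list can be regenerated after every content update instead of being maintained by hand.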
We found out that we needed to have an additional mechanism for stripping away elements in the content itself.
We have for example the element doccont (document container) containing (a reference to) a "digital paper" version (e.g. in Common Ground format) of contracts and other official papers. It is clear that this is aimed at the digital version. In our OmniMark script for going to hardcopy we suppressed all those doccont elements. This worked fine as long as the doccont element had siblings inside a block. If the doccont was the one and only element inside a block, only the label of the block was printed. In other words, you had a title without content.
The solution was to add an attribute to certain elements indicating the target medium of those elements.
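The effect of such an attribute can be sketched in Python, again with XML standing in for SGML. The attribute name `medium`, its values `all`, `online`, and `print`, and the element names `block`, `label`, and `para` are hypothetical; only `doccont` comes from the text, and the real conversion was written in OmniMark.

```python
import xml.etree.ElementTree as ET

# The second block exists only for the online version, so the whole
# block (label included) carries the medium attribute, not just doccont.
SOURCE = """
<topic>
  <block medium="all"><label>Notice periods</label><para>Text for both media.</para></block>
  <block medium="online"><label>Model contract</label><doccont ref="contract.dp"/></block>
</topic>
"""

def filter_for_medium(root, target):
    """Remove every element whose 'medium' attribute excludes the target."""
    for parent in list(root.iter()):
        for child in list(parent):
            medium = child.get("medium", "all")  # unmarked elements go everywhere
            if medium not in ("all", target):
                parent.remove(child)
    return root

def labels(root):
    """Collect the block labels that survive the filtering."""
    return [label.text for label in root.iter("label")]

print_tree = filter_for_medium(ET.fromstring(SOURCE), "print")
online_tree = filter_for_medium(ET.fromstring(SOURCE), "online")

print(labels(print_tree))
print(labels(online_tree))
```

Marking the enclosing block rather than the doccont itself is what avoids the orphaned title: the label disappears together with the content it announces.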
The requirement that changes in content compared to the previous version were to be indicated was met by:
Most of the links in the SGML are hardcoded: clinks in HyTime-speak, based on unique identifiers (IDs). I would like to stress however once again the importance of adhering to a modular writing technique. Since every topic is about only one subject, an answer to one user's specific question, you can always link to a complete topic, except for ... three of those clinks, which link to a location ladder for referring to a table.
The linking between the index, which is a separate publication, and the book itself is done by using TEI extended pointers. In our experience those links are easier to edit than the rather elaborate HyTime syntax.
In the Synex ViewPort environment, this is implemented with independent linkfiles (webs) which use location ladders based on the HyTime constructs nameloc, treeloc, and dataloc.
Treelocs and datalocs are good instruments when you have to deal with static, non-changing data. They quickly become unmanageable, however, when the structure of your data changes frequently, which is the case for Questor. In the best case the user's note still appears inside the topic, albeit at the wrong place. In the worst case the user receives a "bad anchor location" message. We suspect that this problem will only become more frequent after each update.
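The fragility of positional addressing can be shown with a small Python sketch. It is a simplified model, not the HyTime or ViewPort implementation: `by_tree_location` imitates a treeloc by following 1-based child positions from the root, and `by_name` imitates a nameloc by looking up an ID. The documents, element names, and IDs are made up for illustration.

```python
import xml.etree.ElementTree as ET

def by_tree_location(root, path):
    """Treeloc-style lookup: path[0] is the root, the rest are child positions."""
    node = root
    for position in path[1:]:
        children = list(node)
        if position > len(children):
            raise LookupError("bad anchor location")
        node = children[position - 1]
    return node

def by_name(root, ident):
    """Nameloc-style lookup: find the element carrying the given ID."""
    for node in root.iter():
        if node.get("id") == ident:
            return node
    raise LookupError("bad anchor location")

# Version 1: the user annotates the second paragraph.
V1 = ET.fromstring('<topic id="t1"><para id="p1">Old rule.</para>'
                   '<para id="p2">Annotated rule.</para></topic>')
# Version 2: an update inserts a new paragraph at the start.
V2 = ET.fromstring('<topic id="t1"><para id="p0">New intro.</para>'
                   '<para id="p1">Old rule.</para>'
                   '<para id="p2">Annotated rule.</para></topic>')
# Version 3: an update removes the first paragraph.
V3 = ET.fromstring('<topic id="t1"><para id="p2">Annotated rule.</para></topic>')

note_path = [1, 2]  # root, then second child

print(by_tree_location(V1, note_path).text)  # the intended paragraph
print(by_tree_location(V2, note_path).text)  # note lands on the wrong paragraph
print(by_name(V2, "p2").text)                # ID lookup still finds the right one
try:
    by_tree_location(V3, note_path)
except LookupError as error:
    print(error)                             # the position no longer exists
```

The positional address silently drifts or breaks as soon as siblings are added or removed, while the ID-based address survives both updates as long as the ID itself is kept.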
For this purpose the nameloc, treeloc, and dataloc mechanism is perfect.
For SGML editing, including TEI extended pointers: Author/Editor of SoftQuad.
For editing independent HyTime-webs: a combination of Panorama Pro of SoftQuad and Windows Notepad.
For the conversions from the DTD for editing to the DTDs for printing and browsing: OmniMark from OmniMark Technologies (formerly Exoterica).
Additional scripts have been written for:
For the moment those files are simply dumped to the file system of the Web server. We are now working on a database-based implementation which will generate HTML on the fly.
Browsing of the digital version is done in a special Questor viewer based on the Synex ViewPort technology, which in our opinion and experience is a very good product. The Synex people are also very responsive and competent to work with.
The only minor points which need to be mentioned are:
We used FrameMaker+SGML as formatting engine, which was at that moment still very new. Note that we didn't do any SGML editing in FrameMaker+SGML; we used it only as a batch formatting engine.
This meant developing an EDD, Frame's format for describing structure and formatting rules, whose development is itself guided by its own EDD (very well done). In addition, we had to add a few read rules to map SGML elements to special Frame objects such as tables and graphics.
To summarize our experiences: although FrameMaker+SGML frustrated us a lot, it is a very promising product in which a lot of things are very well considered. The main problem, however, is the non-existent technical support.
We are searching for better mechanisms for information reuse at a finer level of granularity. We have already implemented the HyTime conloc attribute in the DTD for this purpose, but it has not been used until now. We are also evaluating whether SGML repositories can be of any help.
At present the content of Questor consists of general social law. But every sector (e.g. the automotive industry) has its own rules and exceptions to the general rules. Because of market demand we need to supply this sector-specific information.
One possibility is to use marked sections, but since Belgium has more than one hundred sectors, and each sector most of the time has completely different rules for workers and employees, such a solution would hardly be manageable.
We now plan to make different publications linked to each other by independent HyTime links. We hope to avoid the dynamic content problem by including in our DTD a special anchor element with a required ID, so that we only have to use nameloc.
We believe that we have built a product which couldn't have been built without using SGML and HyTime.
SGML helped us in strictly separating content from formatting. The advantage of this approach is that you can use the formatting rules which are best adapted to the characteristics of the final output medium. SGML also helped us in generating the content best suited to the final output medium. Thanks to HyTime we have the incredible power of independent linking, so we can offer precisely those links that are targeted towards a specific audience.
On the other hand, we are now confronted with severe problems. Those problems are mainly a result of the use of HyTime. HyTime's syntax is a disaster, editing independent HyTime links with the current SGML editing tools is a masochistic activity, and managing those links afterwards quickly becomes a nightmare.
Acknowledgments for help during this integration go to Dan Connolly and Joe English.