[Mirrored from: http://www.textscience.com/w6paper.html. Please see the canonical URL.]
The following paper was presented at the SGML '94 Poster Session. There is an accompanying poster which describes various strategies for mapping Word 6 styles to SGML.
Many organizations are considering using Word 6 for SGML authoring now that Microsoft's SGML Author is on the immediate horizon. We'd like to offer some reflections on our decision to use Word 6 and our experience with conversion to SGML with respect to our overall philosophy of incremental, evolutionary project engineering.
If Word 6 could be used to produce SGML, it would be a very attractive choice for our project for at least one negative and one positive reason. The negative reason is that SGML-savvy writers and editors are rare. It is difficult to ask in-house editors, reluctant even to move from WordPerfect to Word, to learn the very different skill of using an SGML editor. It is impossible to require SGML submissions from authors who, if nothing else, cannot be expected to have the necessary software. We can, however, ask our staff and authors to use styles which can be turned into basic SGML. As they become adept at this, they are also mastering some of the thinking which goes into using SGML. The positive reason is that we believe Word is a great document authoring tool, reflecting the effort poured into its development.
We found that Word 6 can be used for authoring SGML through the use of styles. We have described some useful techniques in our poster and we have pointed out some of the limitations and pitfalls. Bottom line: You can, in principle, do most of the things you might need to do if you can be flexible in the design of both your stylesheets and DTD to ensure that they work together. Stylesheets can be simplified (lessening the burden on authors) if the conversion scheme includes regular expression recognition, variables, and control statements. We wrote an RTF (Rich Text Format, i.e., Word's ASCII format) conversion engine using the freeware program Perl; the engine executes conversion scripts that are themselves written in Perl.
We have not had the opportunity to try SGML Author yet, but we believe that it will be useful without entirely supplanting our use of Perl. The main advantage of SGML Author is its ability to create the SGML file and parse it within Word, with a feedback loop for correcting errors. Correcting errors may not always be easy, since some errors may make sense according to the DTD but not make sense to an author working with styles. Moreover, SGML Author is a GUI and is not designed to have the expressive power of a language like Perl, as Microsoft has emphatically stated. We therefore envision using SGML Author in a two-stage conversion. The first stage involves using SGML Author to create SGML conforming to a DTD expressly designed to simplify the author's work of applying styles and correcting conversion errors. The second stage would be a DTD-to-DTD conversion where the second DTD is derived from the first, but with a fuller breakdown of elements, clearer demarcation of structure, and rearrangement of structure where required. Any of the language-based SGML conversion tools would be well suited to this task.
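As a rough illustration of how regular expression recognition, variables, and control statements can lighten the stylesheet, consider the sketch below. It is not the Perl engine described above: it is written in Python for brevity, it assumes the RTF has already been reduced to (style name, text) pairs, and the style names ("Heading", "Body Text", "Case Cite") and element names (sec, title, para, cite) are hypothetical. The author applies a single heading style; the converter recognizes the section number with a regular expression and uses a depth variable to open and close nested sections.

```python
import re

# Hypothetical style-to-element map; these names are illustrative only
# and are not taken from the paper or its poster.
STYLE_TO_ELEMENT = {
    "Body Text": "para",
    "Case Cite": "cite",
}

# Regular expression recognition: a leading section number such as "1.2".
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


def escape_sgml(text):
    """Escape the characters that would otherwise be read as markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def convert(paragraphs):
    """Turn a list of (style_name, text) pairs into a list of SGML lines."""
    out = []
    depth = 0  # variable: number of <sec> elements currently open
    for style, text in paragraphs:
        text = escape_sgml(text)
        if style == "Heading":
            match = NUMBERED.match(text)
            if match:
                number, title = match.group(1), match.group(2)
            else:
                number, title = "", text
            # "1" is level 1, "1.1" is level 2, and so on.
            level = number.count(".") + 1
            # Control statements: close sections at the same or a deeper level.
            while depth >= level:
                out.append("</sec>")
                depth -= 1
            # Open any skipped intermediate levels so the nesting stays valid.
            while depth < level - 1:
                out.append("<sec>")
                depth += 1
            out.append(f'<sec num="{number}">' if number else "<sec>")
            depth += 1
            out.append(f"<title>{title}</title>")
        else:
            # Unknown styles fall back to a plain paragraph.
            element = STYLE_TO_ELEMENT.get(style, "para")
            out.append(f"<{element}>{text}</{element}>")
    # Close whatever is still open at the end of the document.
    while depth > 0:
        out.append("</sec>")
        depth -= 1
    return out


if __name__ == "__main__":
    sample = [
        ("Heading", "1 Scope"),
        ("Body Text", "A & B"),
        ("Heading", "1.1 Terms"),
        ("Case Cite", "Smith v. Jones"),
        ("Heading", "2 Notes"),
    ]
    print("\n".join(convert(sample)))
```

Without the regular expression and the depth variable, the author would need a separate style for every heading level, and the section containers could not be generated at all, since Word styles mark paragraphs rather than nested structure.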
The above scenario assumes that the target DTDs will still be based on a document architecture, that is, the implicit structure of the Word document being converted. A document architecture is less useful than an information repository architecture, based on the division of knowledge of the domain represented, as the source of multiple, non-paper-based applications. Nonetheless, today, we are working with document architectures. The spirit of our legacy data haunts the first stage of our project and we are, in consequence, still using models originating in paper-based production. There are good reasons for going only this far at first, rather than aiming for the ultimate repository architecture in one fell swoop. First, SGML itself (the standard, the tools, and the applications) is changing as we speak. Second, our work in the field of law is closely tied to the flow of information from the courts and legislatures, as well as other legal information providers. We hypothesize that wider acceptance of SGML in our field, and the resulting increased interoperability of electronic legal information resources, will have a broad and unifying effect on law databases. Finally, our self-knowledge has grown in the course of targeting and reaching the first stage, which better enables us to architect the next stage.
When we move away from a document architecture to an information repository architecture, will we have to leave Word by the wayside? After all, Word is designed to create visually appealing documents, not to write fields into text databases. There are three arguments militating against this seemingly obvious conclusion. First, because Word is one of the most widely used text editors, with a plethora of features designed to make editing easy, it is an appropriate choice for writing SGML-tagged documents. This is the position taken by Nice Technologies with their SGML Tag Wizard product. The second argument is that as long as human beings write text, visually appealing documents will be the most appropriate vehicle for the expression of ideas and the information repository architecture will forever remain a holy grail. The third argument is that improvements to the structured information capture abilities of WYSIWYG editors will make them viable tools for authoring into information architectures. It is on this third point that we'd like to throw out a few ideas on some things that Microsoft could do with Word and also some things we could do with SGML to make it more WYSIWYG-friendly.
We are very heartened by the appearance of SGML Author and we hope that it will spur creative thinking and debate on the topic of WYSIWYG editing and SGML. We would love to see our investment in Word today pay off when we move our project to the next stage and the next level DTDs. If this hope proves vain we will retool, doing so when we are in a good position to ensure a solid return on our investment in new technology.