[Archive copy mirrored from the URL: http://www.mulberrytech.com/papers/transfrm.htm; see this canonical version of the document.]
SGML'96 Free SGML Transformation Tools demonstration using SGMLSpm
The criteria for selecting an SGML transformation processing tool are discussed, and the details and SGML-processing features of several free SGML transformation tools are listed. The presentation will conclude with a discussion of good and bad points found in developing a sample application using several of these tools.
Introduction
SGML Tools
SGML Transformation Tools
Factors for Evaluating SGML Transformation Tools
Free or Commercial?
Comfort Factors
Operating System
Scripting Language
Processing Model
Free SGML Transformation Tools
Tool language features
Element Location and Navigation
Sample Application
Components of the Transformation
CoST
SENG
SGMLC
SGMLS.pm
There are many SGML tools and SGML-aware programs available today, and excellent overviews of what is available have been given at this conference this year and for the last several years. There is also a growing number of free SGML software tools available, and a review of free SGML tools is the subject of another paper at this conference. This paper discusses free SGML transformation tools--free software for transforming an SGML instance into something else, be that another SGML instance or a file in some other format.
"SGML tools" in this paper refers to configurable SGML-aware software. The Oxford English Dictionary defines "tool", in part, as an "implement for working on something". The scope of "SGML tools" is software implements for working on SGML. The range of SGML tools covers software for working on DTDs, SGML Declarations, instances or complete SGML documents.
"SGML transformation tools" are, therefore, software implements for transforming SGML documents or instances into something else. Typical examples of "something else" are modified SGML instances, new instances, and the instance's content and/or structure expressed in another file format. These descriptions are necessarily vague because the scope of what can be done with SGML transformation tools is quite large.
There are many interdependent factors to consider when evaluating SGML transformation tools, and, apart from the question of price, these apply to commercial tools as well as to free tools.
One major consideration in evaluating SGML transformation tools, or any tools, is whether to use a free tool or a commercial tool. So far SGML tools have not fallen into the middle ground of shareware. The price, however, is only one factor to consider alongside other factors such as the product's features, scripting language, available operating systems, documentation, and support.
Free tools have the primary advantage that they are free. The reasons why they are free may be that they are really free and are given to you, including source code, provided you abide by their copyright agreement; they are "lite" versions of a fully-featured commercial program; or, as in the case of MetaMorphosis, they are available as free software on one operating system and as commercial software on other operating systems.
The disadvantages of free software are that the documentation may be limited, as may be the support. Much of the free SGML software is provided because of the goodwill of individuals working in their spare time. If so, the ongoing support and improvement of the software is likewise done in their spare time, and is dependent on their continuing to have both the commitment and the spare time to work on the software. Free or "lite" versions of commercial software often have comprehensive documentation, but the support and continued improvement of the software also depends on the goodwill and commitment of the vendor, and it is unrealistic to expect that all vendors will support their free software to the same extent as they support their commercial software. In practice, SGML users have been well served by a range of free SGML software with good support from their volunteer authors.
Not surprisingly, the advantages and disadvantages of commercial software are almost exactly the reverse of those of free software. The advantages of commercial software are that it is usually well documented and that the software vendor provides technical support. The disadvantages are that it costs money to buy and, in many cases, it costs more money for upgrades and for ongoing technical support after an initial period.
When selecting free software, price is no object, so the selection can be based solely on the features of the available software. However, a factor to consider other than the technical merits of a piece of software is how well it fits into your existing pattern of work. If, in your work, you use a single operating system or a single scripting language or you exclusively process documents in batches, then a tool that does 95% of what you want and that fits in well with the rest of your system may be a better choice than, for example, a tool that does more than you would ever need but runs only on an operating system you'd otherwise never buy. The software may be free, but it still takes time to set up and use, and whether you're using it in your day job or in your leisure time, time is a valuable commodity.
The operating system or systems that you currently use will narrow your selection of free tools since, as noted previously, you are unlikely to buy a new operating system just to run a free tool. There is, however, at least one free SGML transformation tool available for each of MacOS, DOS, Windows 3.x, Windows 95/NT, Linux, and Unix in general.
A tool's scripting language can be a consideration. If you have experience with a language, and have a body of existing code in that language, then your best choice may be a tool that uses that same language, if such a tool is available.
If your present arrangement doesn't pull you towards a particular scripting language, examine the tools available to you. First, check that the SGML-specific functions cover all of the events you are interested in--for example, some tools do not treat record ends as SGML events, although that will not be significant for all applications. Second, check how well the scripting language suits your purposes. The advantage of an SGML transformation tool over a general-purpose scripting language such as unaugmented Perl or Tcl is that you don't have to write the functions for identifying element starts and ends, keeping track of where you are in the document, etc. The tool's scripting language should, however, make it easy to perform actions based upon the current position in the document, and it should provide easy-to-use functions for operations such as file I/O and regular expression matching.
Above all else, the scripting language should let you write programs that you can maintain so that you can come back to a program six months after you last touched it and understand how it works. All of the languages for the free SGML translation tools allow you to insert comments, but the syntax of the language should, if you put some effort into writing maintainable code, make for programs you can maintain and reuse.
Another factor to consider is the processing model used by the transformation tool. All of the tools being considered work on parsed documents, and they build their representation of the document from the information received from the parser. The models used for representing the documents can be divided into event-driven and tree-based.
In the event-driven model, processing begins at the beginning of the document and proceeds linearly to the end of the document. The transformation tool responds to the "landmarks"--element starts and ends, external entity references, processing instructions, etc.--it sees as the processing goes from start to finish, and it keeps track of what it has seen so portions of your script can execute actions at these landmark events, including actions dependent on previous events. You can, for example, perform different processing for paragraphs inside tables and paragraphs inside lists because the tool tracks what elements are open at any point. You can do some conditional processing based upon "sibling" elements since you can often tell if an element is the first element within its containing element, but you cannot tell if you are the last element because with this processing model you cannot look ahead to see what is coming next.
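The event-driven model described above can be sketched in a few lines. This is only an illustration in Python, which is not the language of any of the tools discussed; the event tuples, the element names, and the `transform` function are assumptions made for the example, not any tool's API. The sketch keeps a stack of open elements so that a rule for PARA can look back at its context, but it has no way to look ahead.

```python
# Hypothetical event stream; a real tool would receive these from the parser.
EVENTS = [
    ("start", "TABLE"), ("start", "PARA"), ("data", "cell text"),
    ("end", "PARA"), ("end", "TABLE"),
    ("start", "LIST"), ("start", "PARA"), ("data", "item text"),
    ("end", "PARA"), ("end", "LIST"),
]


def transform(events):
    """Walk the events once, tracking the open elements on a stack."""
    open_elements = []  # ancestors of the current position, outermost first
    output = []
    for kind, value in events:
        if kind == "start":
            if value == "PARA":
                # Context-dependent rule: look back at what is currently open.
                output.append("<td>" if "TABLE" in open_elements else "<li>")
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
            if value == "PARA":
                output.append("</td>" if "TABLE" in open_elements else "</li>")
        else:  # character data
            output.append(value)
    return "".join(output)


print(transform(EVENTS))
```

Each event is handled once and then forgotten, apart from whatever the script chooses to remember, which is why a rule can know what came before but not what comes next.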
In the tree-based model, the transformation tool also traverses the document from start to end, but it builds a map of the landmark events as it goes, and when it is complete, the information in the map is available to your script. Because the structure of the SGML document is a base element containing other elements that themselves contain elements until the elements are either EMPTY or they contain data characters, the map of the document looks like a tree, with the base element as its root, the successive levels of contained elements being branches from that root, and the leaves of the tree being the EMPTY elements and the data characters. Because this information is available when your script is executing, you are able to do conditional processing based upon elements that come after the current element as well as what came before, and you can search the entire tree for elements matching a criterion.
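As a rough illustration of the tree-based model, the following Python sketch builds a tiny tree by hand and then asks the two kinds of question that the event-driven model cannot answer: what follows the current element, and where matching elements occur anywhere in the tree. The `Node` class and its method names are hypothetical and are not taken from any of the tools.

```python
class Node:
    """One element in the document tree (a hypothetical minimal structure)."""

    def __init__(self, gi, children=None, text=""):
        self.gi = gi
        self.text = text
        self.parent = None
        self.children = children or []
        for child in self.children:
            child.parent = self

    def right_siblings(self):
        """Elements that follow this one inside the same parent."""
        if self.parent is None:
            return []
        siblings = self.parent.children
        return siblings[siblings.index(self) + 1:]

    def find_all(self, gi):
        """Search this subtree for every element with the given GI."""
        found = [self] if self.gi == gi else []
        for child in self.children:
            found.extend(child.find_all(gi))
        return found


first = Node("PARA", text="one")
last = Node("PARA", text="two")
section = Node("SECTION", [Node("TITLE", text="Introduction"), first, last])

# Looking ahead: an element is the last in its parent if nothing follows it.
is_last = not last.right_siblings()
```

The cost of this convenience is that the whole tree has to be held in memory before any rule runs.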
A hybrid approach used by tools such as CoST allows sequential traversal much like the event-driven model but starting from any point in the tree and working "downwards" while maintaining the ability to make queries about any part of the tree.
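One way to picture the hybrid approach is a tree that can be queried for a node and then replayed from that node as a sequence of events. The Python sketch below is an assumption-laden illustration, not CoST's implementation: the nested-tuple document, the `BODY` element name, and the `find` and `walk` functions are all invented for the example.

```python
# A document as nested (GI, content) tuples; strings are character data.
DOC = ("GCAPAPER", [
    ("FRONT", [("TITLE", ["Free SGML Transformation Tools"])]),
    ("BODY", [("SECTION", [("TITLE", ["Introduction"]), ("PARA", ["..."])])]),
])


def find(node, gi):
    """Query: the first element in the subtree with the given GI, or None."""
    if node[0] == gi:
        return node
    for item in node[1]:
        if not isinstance(item, str):
            hit = find(item, gi)
            if hit is not None:
                return hit
    return None


def walk(node, handler):
    """Replay a subtree, depth first, as start/data/end events."""
    gi, content = node
    handler("start", gi)
    for item in content:
        if isinstance(item, str):
            handler("data", item)
        else:
            walk(item, handler)
    handler("end", gi)


events = []
walk(find(DOC, "SECTION"), lambda kind, value: events.append((kind, value)))
```

Here the query selects where processing starts, and the event-style rules take over from that point "downwards".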
The remainder of this paper lists properties of several of the available free SGML transformation tools. The coverage is limited to transformation tools that parse an SGML instance and, using the information available from the parser, provide a mechanism to transform and manipulate the instance to produce output in another form, which could be a modified SGML instance or some portion of the instance's content expressed in another file format.
Information on the range of free SGML transformation tools can be found as part of Robin Cover's SGML bibliography at http://www.sil.org/sgml/publicSW.html and in Steve Pepper's "Whirlwind Guide to SGML Tools" at http://www.falch.no/~pepper/sgmltool/.
The rows in the table are:
Version | The current version of the tool. |
Operating system | The operating system or systems for which the tool is available. |
Source or executable | Whether the tool is available as source code, as an executable file, or as both source code and an executable. SGMLS.pm, for example, is available only as source code because it is a Perl script; other tools are available only as executables because the tool may be free but the technology used to develop it is not; and some are available as both source and executable because the source is freely distributable but executables are provided for common platforms so you can get started without having to compile the source. |
Scripting language | The language on which the transformation scripts are based. Most of these tools provide specialized functions or language constructs for manipulating the SGML built upon the facilities of a general-purpose programming language. |
Processing model | Whether the tool uses an event-driven or a tree-traversal model for processing the SGML instance. As explained previously, in the event-driven model, the information from the parser is processed sequentially, and the transformation tool responds to events in the SGML stream; and in the tree-traversal model, the information from the parser is used to construct a representation of the instance, and the program has the full information available to it when processing the transformation script. |
Command line or GUI | Whether the tool runs from the command line, from a graphical user interface (GUI), or from either. Invoking tools from the command line may be more cryptic than using a GUI, but tools with a command line interface are easier to incorporate into scripts, Makefiles, and batch processing systems. |
Tool | CoST | MetaMorphosis | SENG | SGMLC | SGMLS.pm | STIL | TclYasp |
---|---|---|---|---|---|---|---|
Version | 2.0 | 2.1 | 1B | 1.0.a | 1.03 | 1.0 | 1.0 |
Operating system | DOS, Windows 3.1, Windows 95, Windows NT, Unix | Linux (Commercial version available for Windows and other Unix) | Windows 95, Windows NT, Solaris | Windows 3.1, Windows 95, Windows NT | DOS, MacOS, Windows 3.1, Windows 95, Windows NT, Unix | DOS, Unix | MacOS, Unix |
Source or executable | Both | Executable | Executable | Executable | Source | Source | Both |
Scripting language | Tcl | Proprietary | Scheme | C | Perl | CLISP | Tcl |
Processing model | Tree and event-driven | Tree | Event | Event | Event | Event | Event |
Command line or GUI | Command line | Command line | Command line | GUI | Command line | Command line | Command line (GUI under MacOS) |
A summary of the SGML "events" handled by the transformation tool languages is shown in the following table.
The rows in the table are:
Element | The tool script has a single rule for processing a (possibly qualified) GI. Note that within the rule, the tool language may allow or require specification of separate actions for the start and the end tags of the element, in which case the tool also has a check mark in the following row. |
Separate element start and end | The tool language allows or requires separate specification of actions for the start and end tags of an element. |
Default element rule | The tool allows specification of a default element rule that is processed when no other rules apply. |
Attribute | The tool language allows specification of processing of attributes or based upon attribute values. |
SDATA entity | The tool language allows specification of processing of SDATA entities, either processing based upon the entity name or processing based upon the SDATA replacement text. |
Non-SGML external entity | The tool language allows specification of processing of non-SGML external entities. This may include resolving of public identifiers to system identifiers. |
Document root or start/end | The tool language allows specification of actions for either the document root (for tree-based processing) or the start and end of the document (for event-driven processing). |
Subdoc | The tool language allows specification for actions to be taken when a subdocument is part of the SGML instance. |
Processing instruction | The tool language allows specification of actions based upon processing instructions in the SGML instance. |
Record end | The tool handles record ends as either separate events or separate tree-nodes, in contrast to tools where record ends are included without special processing as part of the character data of an element. |
Notation | The tool language allows specification of processing based upon the notation of an element or entity. |
Character data | The tool language allows specification of processing for the character data content of elements, in contrast to tools where the character data is implicitly handled as part of the element processing. |
Tool | CoST | MetaMorphosis | SENG | SGMLC | SGMLS.pm | STIL | TclYasp |
---|---|---|---|---|---|---|---|
Element | |||||||
Separate element start and end | |||||||
Default element rule | |||||||
Attribute | |||||||
SDATA entity | |||||||
Non-SGML external entity | |||||||
Document root or start/end | |||||||
Subdoc | |||||||
Processing instruction | |||||||
Record end | |||||||
Notation | |||||||
Character data | |||||||
A summary of the facilities provided by the transformation tools for locating an element within its context and for locating other elements relative to the "active" element is shown in the following table.
The rows in the table are:
Parent | The tool is able to return the GI of the parent element, to navigate to the parent element, or to match a given GI to the parent element when qualifying element rules. |
Ancestor | The tool is able to return the GIs of ancestors of the current element, to navigate to a specified ancestor element, or to match a given GI to an ancestor element when qualifying element rules. |
Children | The tool is able to return the GIs of children of the current element, to navigate to the child elements, or to match a given GI to a child element when qualifying element rules. |
Left sibling | The tool is able to return the GIs of left siblings of the current element, to navigate to the left sibling elements, to count the number of left siblings, or to match a given GI to a left sibling element when qualifying element rules. |
Right sibling | The tool is able to return the GIs of right siblings of the current element, to navigate to the right sibling elements, to count the number of right siblings, or to match a given GI to a right sibling element when qualifying element rules. |
Tool | CoST | MetaMorphosis | SENG | SGMLC | SGMLS.pm | STIL | TclYasp |
---|---|---|---|---|---|---|---|
Parent | |||||||
Ancestor | |||||||
Children | |||||||
Left sibling | |||||||
Right sibling | |||||||
A sample application of converting SGML'96 papers marked up using the GCAPAPER DTD to HTML has been completed for the following tools:
All of the tools are capable of straight translation of tag names, so this discussion will concentrate on six aspects of the transformation that are more than simple one-to-one translations.
The main title of the SGML'96 paper appears twice in the HTML document: once in the <title> element within the <head> element so the title appears in the banner of the window, and once as the contents of a <h1> element at the start of the displayed document. For example:

    <GCAPAPER>
    <FRONT>
    <TITLE>Free SGML Transformation Tools</TITLE>

would be transformed into:

    <html>
    <head><title>Free SGML Transformation Tools</title></head>
    <body bgcolor="#FFFFFF">
    ...
    <h1><a name="top">Free SGML Transformation Tools</a></h1></p>

In the SGML, the authors' names and contact information appear immediately after the main title, but the authors' biographies, with another copy of the authors' names, appear within the <biography> element after the abstract. In the HTML, the authors' names, titles, and affiliations appear after the main title, and they are linked to the authors' names, full contact information, and biographies appearing at the end of the document. This transformation, therefore, requires inserting part of an author's information at the head of the document, saving all of the information for output at the end, locating the biography for the author, and outputting the biography with the rest of the author's information. In addition, a link is made between the minimal author information at the beginning of the document and the full information at the end. For example, this author information and biography:

    <AUTHOR PRIME="1">
    <FNAME>Tony</FNAME><SURNAME>Graham</SURNAME>
    <TITLE>Consultant</TITLE>
    <ADDRESS>
    <AFFIL>Mulberry Technologies, Inc.</AFFIL>
    ...
    </ADDRESS>
    </AUTHOR>
    <ABSTRACT>...</ABSTRACT>
    <BIOGRAPHY>
    <BIO>
    <FNAME>Tony</FNAME><SURNAME>Graham</SURNAME>
    <PARA>Tony Graham has ...</PARA>
    </BIO>
    </BIOGRAPHY>

is transformed at the start of the document to:

    <address>
    <b><a href="#Tony-Graham">Tony Graham</a></b>, Consultant, Mulberry Technologies, Inc.
    </address>

and at the end of the document to:

    <h4>Author Information</h4>
    <p><b><a name="Tony-Graham">Tony Graham</a></b>, Consultant, Mulberry Technologies, Inc.<br>
    6010 Executive Boulevard, Suite 608, Rockville MD 20852 U.S.A.<br>
    Phone: (301) 231 6930, E-mail: info@mulberrytech.com, WWW: http://www.mulberrytech.com</p>
    <p>Tony Graham has been ...</p>

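The link between the short author entry and the full entry depends only on both ends deriving the same anchor name from the author's name. A minimal Python sketch of that idea follows; the function names are hypothetical, Python is used purely for illustration, and the bold markup for the primary author is left out to keep the example short.

```python
def author_id(first_name, surname):
    """Build the anchor name, replacing spaces with hyphens."""
    return "{}-{}".format(first_name.replace(" ", "-"),
                          surname.replace(" ", "-"))


def author_link(first_name, surname, title, affiliation):
    """Short entry for the head of the document, linking to the full entry."""
    parts = ['<a href="#{}">{} {}</a>'.format(
        author_id(first_name, surname), first_name, surname)]
    if title:
        parts.append(title)
    parts.append(affiliation)
    return "<address>" + ", ".join(parts) + "</address>"


def author_target(first_name, surname):
    """Anchor for the full entry at the end of the document."""
    return '<b><a name="{}">{} {}</a></b>'.format(
        author_id(first_name, surname), first_name, surname)
```

Because both functions call `author_id`, the `href` at the head of the document and the `name` at the end cannot drift apart.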
As a navigational aid, the <section> and <subsec1> titles are duplicated and gathered in a table of contents at the head of the document. Each entry in the table of contents is linked to the occurrence of the title in the body of the document, and each occurrence is linked to the copy in the table of contents. The user can, therefore, easily jump from any <section> or <subsec1> title to the table of contents, and jump from the table of contents to any <section> or <subsec1> within the document. For example, this <section> title:

    <SECTION>
    <TITLE>Introduction</TITLE>

is transformed to an <h2> element at the corresponding point in the HTML document:

    <h2><a name="sec1" href="#tocsec1">Introduction</a></h2>

and it is duplicated as an entry in the table of contents:

    <b><a name="tocsec1" href="#sec1">Introduction</a></b><br>

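The two-way linking shown above comes down to giving each title a number and using it in a pair of anchors that point at each other. The following Python sketch illustrates this for section titles only; the function name is hypothetical and the code is not taken from any of the tools.

```python
def link_titles(titles):
    """Return (toc_lines, body_lines) whose anchors point at each other."""
    toc, body = [], []
    for number, title in enumerate(titles, start=1):
        # Table of contents entry: named tocsecN, links to secN.
        toc.append('<b><a name="tocsec{0}" href="#sec{0}">{1}</a></b><br>'
                   .format(number, title))
        # Heading in the body: named secN, links back to tocsecN.
        body.append('<h2><a name="sec{0}" href="#tocsec{0}">{1}</a></h2>'
                    .format(number, title))
    return toc, body


toc_lines, body_lines = link_titles(["Introduction", "SGML Tools"])
```

The hard part for the tools is not generating the anchors but emitting the table of contents before the body, which an event-driven tool can only do by buffering output or by making a second pass.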
A couple of paragraphs of legalese are read from an external file and inserted at the end of the HTML document. This exercises the transformation tool's interaction with the external system.
This is the conversion of the pre-formatted text element in the GCA DTD to the pre-formatted text element in HTML, and is a simple translation of tag names, but the tools varied in their ability to retain the record ends within the <VERBATIM> elements. In addition, some SGML'96 papers use RCDATA marked sections within <VERBATIM> elements for blocks of SGML, and tools also varied in their handling of characters such as "<" and "&" occurring in the marked sections.
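What a correct conversion has to do with such content can be sketched as follows: keep every record end, and escape the characters that HTML would otherwise treat as markup. This is a Python illustration only; `verbatim_to_pre` is a hypothetical name, and `html.escape` is from the Python standard library rather than from any of the tools.

```python
from html import escape


def verbatim_to_pre(text):
    """Wrap verbatim text in <pre>, escaping markup but keeping line ends."""
    # quote=False escapes only &, < and >, which is all that element
    # content needs; newlines pass through unchanged.
    return "<pre>" + escape(text, quote=False) + "</pre>"


sample = "<TITLE>A & B</TITLE>\nline two"
print(verbatim_to_pre(sample))
```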
The <GRAPHIC> element in the GCA DTD identifies the external graphic file using an external entity, but the HTML <img> element identifies the external graphic file by a direct reference. In addition, the GCA DTD supports a limited set of graphic formats that does not include the GIF or JPEG formats in common use on the World Wide Web. The transformation script extracts the file reference from the declaration for the external entity, then attempts to find a file with the same basename and either a .gif or .jpg extension.
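The file lookup can be sketched independently of any SGML tool. The Python below assumes that the system identifier has already been obtained from the entity declaration and that the Web-ready graphics live in a single directory; `find_web_graphic` and its parameters are hypothetical names chosen for the example.

```python
import os


def find_web_graphic(system_id, fig_dir):
    """Return the path of a .gif or .jpg with the same basename, or None."""
    # Drop any directory part and the original extension.
    rootname = os.path.splitext(os.path.basename(system_id))[0]
    for extension in (".gif", ".jpg"):  # GIF is preferred when both exist
        candidate = os.path.join(fig_dir, rootname + extension)
        if os.path.exists(candidate):
            return candidate
    return None
```

Returning None when neither file exists leaves the caller to decide whether to omit the image or report the missing file.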
This was comparatively simple. The script normally outputs character data in the document, which accounts for one instance of the title, and CoST has a [content] operator that returns the contents of the current (or specified) element, which accounts for the other instance of the title.
The applicable code is shown below. One instance of the title is output as part of the start tag processing, and the other is handled by other rules that match the character data occurring between the title start and end tags.

    {element TITLE in FRONT} {
        START {
            output "<head><title>"
            output [content]
            output "</title></head>"
            nl
            output "<body bgcolor=\"#FFFFFF\">"
            nl
            output "<p><small><a href=\"#disclaimer\">SGML'96 Free SGML Transformation Tools</a> demonstration using CoST</small></p>"
            nl
            output "<h1><img src=\"$gFigDir/dots.gif\" align=\"bottom\"><a name=\"top\">"
        }
        END {
            output "</a></h1>"
            nl
        }
    }

Output of character data was suppressed during the AUTHOR element:

    {textnode within AUTHOR} {
        CDATA {}
        SDATA {}
        RE {}
    }

and the author information was found using CoST's query mechanism and output as part of the AUTHOR start tag processing:

    {element AUTHOR} {
        START {
            set Firstname [join [query* child element FNAME subtree textnode content]]
            set Surname [join [query* child element SURNAME subtree textnode content]]
            set Title [join [query* child element TITLE subtree textnode content]]
            set Affil [join [query* child element ADDRESS child element AFFIL subtree textnode content]]
            regsub -all { } $Firstname {-} NewFirstname
            regsub -all { } $Surname {-} NewSurname
            setprop AuthorID "$NewFirstname-$NewSurname"
            output "<address>"
            if {[query? withattval PRIME 1]} {
                output "<b>"
            }
            output "<a href=\"#$NewFirstname-$NewSurname\">"
            output "$Firstname $Surname"
            output "</a>"
            if {[query? withattval PRIME 1]} {
                output "</b>"
            }
            if {$Title != ""} {
                output ", $Title"
            }
            output ", $Affil</address>"
            nl
        }
    }

Inserting the full author information at the end of the document was done with a query in the GCAPAPER end tag processing:

    withNode docroot child element GCAPAPER child element FRONT child element BIOGRAPHY {process AuthorInfo}

The query, although executed during the GCAPAPER end tag processing, located the BIOGRAPHY element and executed the rules in the "AuthorInfo" specification on the subtree beginning at BIOGRAPHY. Those rules included queries that located the author information and executed another specification, whose rules inserted the author information into the biography information output at the end of the paper. This sounds complex, but it was accomplished with queries and rules rather than by storing the author and biography information in programmer-defined data structures and regurgitating it at the appropriate point.
To insert the table of contents after the abstract, a query located the document root and executed the rules in the "MakeToC" specification:

    {element ABSTRACT} {
        END {
            output "<p>"
            withNode docroot {process MakeToC}
            output "</p>"
            nl
        }
    }

The rules in the "MakeToC" specification matched only the titles in the SECTION and SUBSEC1 elements, so only the titles are output as the table of contents.

    specification MakeToC {
        {element TITLE in SECTION} {
            START {
                incr SectionNum
                setprop SectionNum $SectionNum
                output "<b><a name=\"tocsec$SectionNum\" href=\"#sec$SectionNum\">"
            }
            END {
                output "</a></b><br>"
                nl
            }
        }
        {element TITLE in SUBSEC1} {
            START {
                incr SectionNum
                setprop SectionNum $SectionNum
                output "   <a name=\"tocsec$SectionNum\" href=\"#sec$SectionNum\">"
            }
            END {
                output "</a><br>"
                nl
            }
        }
        {textnode within TITLE in SECTION} {
            CDATA { output [ sgmlescape [content]] }
            SDATA { output [ EntityMap [content]] }
            RE {nl}
        }
        {textnode within TITLE in SUBSEC1} {
            CDATA { output [ sgmlescape [content]] }
            SDATA { output [ EntityMap [content]] }
            RE {nl}
        }
    }

At the appropriate point, the disclaimer file is opened then echoed to the output line by line:

    set Disclaimer [open "$gFigDir/disclaim.htm"]
    while {[gets $Disclaimer Line] >= 0} {
        output $Line
        nl
    }

This is a straightforward tag translation.
A query locates the system identifier of the entity referenced by the GRAPHIC element, and the Tcl file rootname operator returns the system identifier without its extension. Using the Tcl file exists operator, the program tests for existence of equivalent GIF and JPEG files and inserts the first of these that it identifies as the file referenced by the img element in the HTML.

    {element GRAPHIC} {
        START {
            global gFigDir
            set Rootname [file rootname [query entity [query attval FIGNAME] sysid]]
            output "<br>"
            nl
            output "<img align=\"middle\""
            if {[file exists "$gFigDir/$Rootname.gif"]} {
                output " src=\"$gFigDir/$Rootname.gif\""
            } elseif {[file exists "$gFigDir/$Rootname.jpg"]} {
                # Reference the JPEG file that was actually found
                output " src=\"$gFigDir/$Rootname.jpg\""
            }
            output ">"
            nl
        }
    }

SENG has only ever been beta software. Because it was never really finished, it does not support processing of entities, and because it doesn't support entities, the SENG version of the transformation script was never finished.
Copernican Solutions is currently working on a DSSSL engine written in Scheme on top of Java, and while it will also have the name SENG, it will do far more than the current SENG.
SENG does not give direct access to an element's character data; the only access is by declaring in the start tag a variable to contain the character data, then doing something with the variable in the end tag processing. Element rules in SENG cannot be qualified by ancestor elements, so a large part of the code for some elements' start and end tags is cond statements testing string comparisons against the element's parent's GI to determine what action to take.
The code for the <title> element, therefore, is as follows:

    <TITLE> {
      (define title (string))
      (let ((parent (cadr context)))
        (cond
          ((string=? parent "SECTION")
           (display "<h2><a name=\"sec%d(SECTION-COUNT)\" href=\"#tocsec%d(SECTION-COUNT)\">"))
          ((string=? parent "SUBSEC1")
           (display "<h3><a name=\"sec%d(SECTION-COUNT)\" href=\"#tocsec%d(SECTION-COUNT)\">"))
          ((string=? parent "SUBSEC2")
           (display "<h4>"))
          ((string=? parent "SUBSEC3")
           (display "<h5>"))
          ((string=? parent "AUTHOR")
           (display ", "))))
    }
    </TITLE> {
      (let ((parent (parent)))
        (cond
          ((string=? parent "FRONT")
           (begin
             (display "<head><title>")
             (display title)
             (display "</title></head>")
             (newline)
             (display "<body bgcolor=\"#FFFFFF\">")
             ;;; Output title again next to fancy image
             (display "<p><small><a href=\"#disclaimer\">SGML'96 Free SGML Transformation Tools</a> demonstration using Seng</small></p>")
             (newline)
             (display "<h1><img src=\"../image/dots.gif\" align=\"bottom\"><a name=\"top\">")
             (display title)
             (display "</a></h1>")
             (newline)))
          ((string=? parent "SECTION")
           (begin
             (display title)
             (display "</a></h2>")))
          ((string=? parent "SUBSEC1")
           (begin
             (display title)
             (display "</a></h3>")))
          ((string=? parent "SUBSEC2")
           (begin
             (display title)
             (display "</h4>")))
          ((string=? parent "SUBSEC3")
           (begin
             (display title)
             (display "</h5>")))
          ((string=? parent "AUTHOR")
           (begin
             (display title)))))
      (newline)
    }

Outputting the title twice is, therefore, comparatively simple since the title text had to be saved to a string: the string is just output (displayed) twice.
This was not implemented.
This was not implemented.
This was not implemented.
This was started, but because SENG does not process entities, it "swallowed" all of the &lt; entities in the <VERBATIM> elements, and the effort was discontinued.
This was not implemented since SENG does not support entities.
Like SENG, SGMLC does not give direct access to an element's character data content, but instead it is necessary to declare a buffer then redirect the SGMLC output to that buffer while the element's content is being processed. This is quite a powerful feature when output redirections are nested, but it requires the script writer to explicitly declare, redirect output to, end the redirection, and close each buffer, and it is not always possible to write element rules without knowing what buffers are currently open, so consequently a change in the buffers may require a change to multiple element rules.
Duplicating the title, therefore, required redirecting the output to a buffer then closing the buffer and outputting the buffer contents twice:

    element TITLE when (elt(1) == "FRONT")
    start {
        gTitleBufferPtr = open("gTitleBuffer", 2, "w");
        if (gTitleBufferPtr == "-1") {
            message("Couldn't open global variable gTitleBuffer", 0);
        }
        gPreviousStream = redirect(0, gTitleBufferPtr);
    }
    end {
        redirect(0, gPreviousStream);
        // Close the global variable that now contains the title
        close(gTitleBufferPtr);
        0 << "<head><title>" << gTitleBuffer << "</title></head>\n" << "<body bgcolor=\"#FFFFFF\">\n";
        0 << "<p><small><a href=\"#disclaimer\">SGML'96 Free SGML Transformation Tools</a> demonstration using SGMLC-Lite</small></p>\n";
        0 << "<h1><img src=\"../image/dots.gif\" align=\"bottom\"><a name=\"top\">";
        0 << gTitleBuffer;
        0 << "</a></h1>\n";
    }

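The bookkeeping that buffer redirection requires becomes lighter if the previous stream is kept on a stack instead of in a named global variable, so that redirections nest without each rule knowing which buffers are open. The Python sketch below illustrates that design choice; it is not how SGMLC works, and the `Output` class and its methods are hypothetical.

```python
import io


class Output:
    """The current output stream plus a stack of the streams it replaced."""

    def __init__(self):
        self.main = io.StringIO()
        self.current = self.main
        self._previous = []

    def redirect(self):
        """Start capturing output in a new buffer."""
        self._previous.append(self.current)
        self.current = io.StringIO()

    def restore(self):
        """Stop capturing; return the captured text and resume the old stream."""
        captured = self.current.getvalue()
        self.current = self._previous.pop()
        return captured

    def write(self, text):
        self.current.write(text)


out = Output()
out.redirect()                       # TITLE start tag
out.write("Free SGML Transformation Tools")
title = out.restore()                # TITLE end tag
out.write("<head><title>" + title + "</title></head>\n")
out.write("<h1>" + title + "</h1>\n")
```

With a stack, a rule only has to pair its own `redirect` with its own `restore`, whatever other redirections are active around it.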
As part of the <AUTHOR> element processing, a buffer was opened in the <AUTHOR> start tag processing and closed in the <AUTHOR> end tag processing, and the contents of the buffer were attached to the <FRONT> element using the SGMLC map construct, keyed to the author's first name and surname. The map construct behaves like pseudo-attributes or an associative array attached to an element. A map comprises pairs of a key and a value.

    element AUTHOR
    start {
        0 << "<address>";
        // Open a global variable for all the author details we will collect
        gAuthorPtr = open("gAuthorDetails", 2, "w");
        if (gAuthorPtr == "-1") {
            message("Couldn't open global variable gAuthorDetails", 0);
        }
        // Put a paragraph start tag to the author details
        gAuthorPtr << "<p>";
    }
    end {
        0 << "</address>\n";
        // End the paragraph of the collected author details
        gAuthorPtr << "</p>\n";
        // Close the author details in case we need to close it before we use
        // it, and so the next open can wipe out the current contents.
        close(gAuthorPtr);
        // Put the author details we collected into a map on the FRONT element
        // since it's also an ancestor of BIO, which is where we use the
        // author information.
        setatt(anc("FRONT"), "AuthorDetails", printf("%s-%s", gFirstName, gSurname), gAuthorDetails);
    }

    element FNAME when (elt(1) == "AUTHOR")
    start {
        gFirstNamePtr = open("gFirstName", 2, "w");
        if (gFirstNamePtr == "-1") {
            message("Couldn't open global variable gFirstName", 0);
        }
        gPreviousStream = redirect(0, gFirstNamePtr);
    }
    end {
        redirect(0, gPreviousStream);
        // Close the global variable that now contains the first name
        close(gFirstNamePtr);
    }

    element SURNAME when (elt(1) == "AUTHOR")
    start {
        gSurnamePtr = open("gSurname", 2, "w");
        if (gSurnamePtr == "-1") {
            message("Couldn't open global variable gSurname", 0);
        }
        gPreviousStream = redirect(0, gSurnamePtr);
    }
    end {
        redirect(0, gPreviousStream);
        // Close the global variable that now contains the surname
        close(gSurnamePtr);
        if (exists(anc("AUTHOR"), "PRIME") && att(anc("AUTHOR"), "PRIME") == "1") {
            0 << "<b>";
        }
        0 << "<a href=\"#" << gFirstName << "-" << gSurname << "\">";
        0 << gFirstName << " " << gSurname << "</a>";
        if (exists(anc("AUTHOR"), "PRIME") && att(anc("AUTHOR"), "PRIME") == "1") {
            0 << "</b>";
        }
        // Do it all again because we're going to save the author details
        // and output them when we get to the biography.
        gAuthorPtr << "<b><a name=\"" << gFirstName << "-" << gSurname << "\">";
        gAuthorPtr << gFirstName << " " << gSurname;
        gAuthorPtr << "</a></b>";
    }

The biographical information is redirected to a buffer that will only be output as part of the <GCAPAPER> end tag processing, but as part of processing the biographies, the author information saved in a map attached to the <FRONT> element is also directed to the buffer.
element BIOGRAPHY start {
    gBiographyPtr = open("gBiography", 2, "w");
    if (gBiographyPtr == "-1") {
        message("Couldn't open global variable gBiography", 0);
    }
    gPreBiographyStream = redirect(0, gBiographyPtr);
    0 << "<hr noshade>\n";
    0 << "<h4>Author Information</h4>\n";
} end {
    // Sort what's left in the AuthorDetails map
    if (maplen(anc("FRONT"), "AuthorDetails") > 0) {
        sort(anc("FRONT"), "AuthorDetails");
        while(MapCount < maplen(anc("FRONT"), "AuthorDetails")) {
            0 << att(anc("FRONT"), "AuthorDetails", key(anc("FRONT"), "AuthorDetails", MapCount++));
            0 << "<hr noshade width=\"30%\" align=\"center\">\n";
        }
    }
    redirect(0, gPreBiographyStream);
    // Close the global variable that now contains the biographical info
    close(gBiographyPtr);
}
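The idea of a map attached to an element can be pictured with a short sketch. It is written in Python purely for illustration and is not SGMLC code: the `Element` class, its method names, and the second author "Ann Author" are hypothetical, and sorting by key is an assumption, since the listing above does not show what SGMLC's sort orders by.

```python
class Element:
    """Hypothetical element that carries named maps of key/value pairs."""

    def __init__(self, name):
        self.name = name
        self.maps = {}  # map name -> {key: value}

    def set_entry(self, map_name, key, value):
        # Create the named map on first use, then store the pair.
        self.maps.setdefault(map_name, {})[key] = value

    def values_sorted_by_key(self, map_name):
        # Assumption: entries are emitted in key order.
        entries = self.maps.get(map_name, {})
        return [entries[key] for key in sorted(entries)]


front = Element("FRONT")
# Keys follow the "FirstName-Surname" pattern used in the listing above.
front.set_entry("AuthorDetails", "Tony-Graham",
                '<p><b><a name="Tony-Graham">Tony Graham</a></b></p>\n')
front.set_entry("AuthorDetails", "Ann-Author",
                '<p><b><a name="Ann-Author">Ann Author</a></b></p>\n')

SEPARATOR = '<hr noshade width="30%" align="center">\n'
biography = "".join(details + SEPARATOR
                    for details in front.values_sorted_by_key("AuthorDetails"))
print(biography)
```

Keeping the details on an ancestor element means any later rule that can reach that ancestor can also reach the saved details.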
The character data in the titles of <SECTION> and <SUBSEC1> elements is captured using the familiar technique of redirecting the output to a buffer. Because the output for the body of the document is itself being redirected to a buffer, the text for the Table of Contents entry is output directly, and the text for the title in the body of the document is output to the body buffer. The buffer containing the body of the document is written to the output as part of the <GCAPAPER> end tag processing, so the Table of Contents is output before the text in the body of the document.
element TITLE when (elt(1) == "SECTION") start {
    gTitleBufferPtr = open("gTitleBuffer", 2, "w");
    if (gTitleBufferPtr == "-1") {
        message("Couldn't open global variable gTitleBuffer", 0);
    }
    gPreviousStream = redirect(0, gTitleBufferPtr);
} end {
    redirect(0, gPreviousStream);
    // Close the global variable that now contains the title
    close(gTitleBufferPtr);
    0 << "<h2><a name=\"sec" << gSectionCount << "\" href=\"#tocsec" << gSectionCount << "\">" << gTitleBuffer << "</a></h2>\n";
    if (gSectionCount > 1) {
        gRealPtr << "<br>\n";
    }
    gRealPtr << "<b><a name=\"tocsec" << gSectionCount << "\" href=\"#sec" << gSectionCount << "\">" << gTitleBuffer << "</a></b>";
    gSectionCount++;
}
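The effect of holding the body in a buffer while Table of Contents entries go straight to the output can be sketched as follows. This is an illustration in Python, not SGMLC: `real_output`, `body_buffer`, and `emit_section_title` are hypothetical names, and the two section titles are sample data.

```python
import io

real_output = io.StringIO()  # stands in for the real output stream
body_buffer = io.StringIO()  # body text is held here until the document ends


def emit_section_title(title, count):
    # The heading goes to the body buffer and links back to its ToC entry.
    body_buffer.write(
        f'<h2><a name="sec{count}" href="#tocsec{count}">{title}</a></h2>\n')
    # The ToC entry goes directly to the real output.
    if count > 1:
        real_output.write("<br>\n")
    real_output.write(
        f'<b><a name="tocsec{count}" href="#sec{count}">{title}</a></b>')


for number, title in enumerate(["Introduction", "Sample Application"], start=1):
    emit_section_title(title, number)

# End of document: the buffered body is flushed after the complete ToC.
real_output.write("\n" + body_buffer.getvalue())
html = real_output.getvalue()
print(html)
```

Because every Table of Contents entry has already been written by the time the buffer is flushed, a single pass over the document is enough.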
The disclaimer file is opened and output line-by-line. SGMLC does not recognize the end of file condition, so the only way to stop the loop outputting the disclaimer lines is to break out of the loop if the line matches an empty string.
disclaimer = open("c:\\projects\\transfrm\\data\\disclaim.htm", 0, "r");
if (disclaimer == "-1") {
    message("Couldn't open disclaimer.htm", 0);
}
while (1) {
    disclaimer >> line;
    if (line == "") {
        break;
    }
    0 << line;
}
close(disclaimer);
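For comparison, the same "stop when the read returns an empty string" loop is sketched here in Python, for illustration only. `copy_lines` is a hypothetical helper, and an in-memory stream stands in for the disclaimer file. In Python a blank line is read as "\n" rather than "", so it does not end the loop; whether SGMLC's read operator keeps the line terminator is not stated here.

```python
import io


def copy_lines(source, sink):
    """Copy lines from source to sink until a read returns an empty string."""
    while True:
        line = source.readline()
        if line == "":  # readline() returns "" only at end of input
            break
        sink.write(line)


# Sample data standing in for the disclaimer file, including a blank line.
source = io.StringIO("<hr>\n\n<p>Disclaimer text</p>\n")
sink = io.StringIO()
copy_lines(source, sink)
print(sink.getvalue())
```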
This is a simple translation of the tags. Unfortunately, in an entirely undocumented transformation, SGMLC translated each of the "<" characters in the RCDATA marked sections in the <VERBATIM> elements into "<<" so every "<" character had to be changed to "&lt;".
element VERBATIM start {
    0 << "<pre>";
} end {
    0 << "</pre>\n";
}
Extracting the basename of the graphic entity's filename is comparatively simple, but the only way to test for a file's existence was to attempt to open it.
element GRAPHIC start {
    FigName = att(0, "FIGNAME");
    Basename = tok(FigName, ".");
    0 << "<br>\n<img align=\"middle\"";
    FilePtr = open(printf("..\\data\\%s.gif", Basename), 0, "r");
    if (FilePtr != "-1") {
        close(FilePtr);
        0 << " src=\"" << printf("../image/%s.gif", Basename) << "\"";
    } else {
        FilePtr = open(printf("..\\data\\%s.jpg", Basename), 0, "r");
        if (FilePtr != "-1") {
            close(FilePtr);
            0 << " src=\"" << printf("../image/%s.jpg", Basename) << "\"";
        }
    }
    0 << "><br>\n";
}
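The "take the basename, then try to open each candidate file" approach can be sketched as below. This is Python for illustration only, not SGMLC: `find_image` is a hypothetical helper, the temporary directory and the file name `dots.eps` are sample data, and splitting at the first "." is an assumption about what `tok` returns.

```python
import os
import tempfile


def find_image(fig_name, directory, extensions=(".gif", ".jpg")):
    """Return the first basename+extension that can be opened, else None."""
    basename = fig_name.split(".", 1)[0]  # text before the first "."
    for extension in extensions:
        candidate = os.path.join(directory, basename + extension)
        try:
            # Attempting to open the file is the existence test.
            with open(candidate, "rb"):
                return basename + extension
        except OSError:
            continue
    return None


# Sample data: a directory that holds only a JPEG version of the figure.
image_dir = tempfile.mkdtemp()
with open(os.path.join(image_dir, "dots.jpg"), "wb") as handle:
    handle.write(b"")

found = find_image("dots.eps", image_dir)
missing = find_image("absent.eps", image_dir)
print(found, missing)
```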
As with SENG, it is not possible to qualify an element rule in SGMLS.pm, so a large part of the processing for some elements is a succession of if statements testing which element contains the current element. Because each test checks only that the current element is contained by another element, the order of the if statements is important, so the test for containment by <SECTION> comes before the test for containment by <SUBSEC1>, etc. For processing the main title, the start tag handler saves the contents of the <TITLE> element to a string, which the end tag handler pops from the output stack as the local variable $lTitle. $lTitle is used in multiple output statements to generate the duplicate titles:
sgml('<TITLE>', sub {
    my $lElement = shift;
    if ($lElement->in(FRONT)) {
        push_output 'string';
    } elsif ($lElement->in(SECTION)) {
        push_output 'string';
    } elsif ($lElement->in(SUBSEC1)) {
        push_output 'string';
    } elsif ($lElement->in(SUBSEC2)) {
        output "<h4>\n";
    } elsif ($lElement->in(SUBSEC3)) {
        output "<h5>\n";
    } elsif ($lElement->in(AUTHOR)) {
        push_output 'string';
    }
});

sgml('</TITLE>', sub {
    my $lElement = shift;
    if ($lElement->in(FRONT)) {
        my $lTitle = pop_output;
        output "<head><title>", $lTitle, "</title></head>\n";
        output "<body bgcolor=\"#FFFFFF\">\n";
        # Output title again next to fancy image
        output "<p><small><a href=\"#disclaimer\">SGML'96 Free SGML Transformation Tools</a> demonstration using SGMLSpm</small></p>\n";
        output "<h1><img src=\"../image/dots.gif\" align=\"bottom\"><a name=\"top\">", $lTitle, "</a></h1>\n";
    } elsif ($lElement->in(SECTION)) {
        my $lTitle = pop_output;
        my $lCurrentToC = $Refs->get('ToC');
        output "<h2><a name=\"sec", $gSectionCount, "\" href=\"#tocsec", $gSectionCount, "\">", $lTitle, "</a></h2>\n";
This uses the SGMLS.pm Refs package to save information between the two passes of processing. In the first pass, information is put into the reference manager, and in the second pass, get operations retrieve that information from the reference manager.
For each of the elements within the <AUTHOR> element, the element's contents are appended to a scalar variable $gAuthorInfo, and as part of the <AUTHOR> end tag processing, the collected information (wrapped in an anchor as $gBioInfo) is saved as a reference keyed on the author's name.
sgml('<AUTHOR>', sub {
    # Reset per-author variables.
    $gAuthorInfo = '';
    $gAuthorId = '';
    $gBioInfo = '';
    output "<address>";
});

sgml('</AUTHOR>', sub {
    # Output the string of author information that we've been collecting
    output "</address>\n";
    # Save the full author info for the biography
    if ($Refs->get($gAuthorId) eq '') {
        $Refs->put($gAuthorId, $gBioInfo);
    }
});
For each author, the first name and surname are concatenated and used as the key to get the corresponding author information from the references system.
sgml('</SURNAME>', sub {
    my $lElement = shift;
    my $lSurname = pop_output;
    if ($lElement->in(AUTHOR)) {
        $gAuthorInfo .= " " . $lSurname;
        $gAuthorId .= "-" . $lSurname;
        $gBioInfo .= " " . $lSurname;
        # Trim $gAuthorId and replace any spaces with hyphens
        $gAuthorId =~ s/^\s+//g;
        $gAuthorId =~ s/\s+$//g;
        $gAuthorId =~ s/\s+/-/g;
        # Surround the bio info with an anchor
        $gBioInfo = "<b><a name=\"" . $gAuthorId . "\">" . $gAuthorInfo . "</a></b>";
        # Now output the author info surrounded by another anchor
        output "<a href=\"#", $gAuthorId, "\">", $gAuthorInfo, "</a>";
    } elsif ($lElement->in(BIO)) {
        $gAuthorId .= "-" . $lSurname;
        # Trim $gAuthorId and replace any spaces with hyphens
        $gAuthorId =~ s/^\s+//g;
        $gAuthorId =~ s/\s+$//g;
        $gAuthorId =~ s/\s+/-/g;
        if ($Refs->get($gAuthorId) ne '') {
            output "<p>", $Refs->get($gAuthorId), "</p>\n";
        }
    }
});
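The key normalisation in the listing above (trim the ends, then turn each run of whitespace into a hyphen) can be checked in isolation with a small sketch. It is written in Python for illustration: `author_id` is a hypothetical helper, and joining the first name and surname with a hyphen before normalising is an assumption, since the first-name handler is not shown.

```python
import re


def author_id(first_name, surname):
    """Build an anchor-safe key from an author's first name and surname."""
    raw = first_name + "-" + surname
    # Trim leading and trailing whitespace, then replace each remaining
    # run of whitespace with a single hyphen.
    return re.sub(r"\s+", "-", raw.strip())


print(author_id(" Tony", "Graham "))
print(author_id("Mary Ann", "Smith"))
```

Applying the same normalisation where the key is stored and where it is looked up is what lets the biography find the saved author information.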
Generating the Table of Contents again used the References manager. During the first pass of the program, the text for the Table of Contents is built up as each title is processed and saved as a reference:
    } elsif ($lElement->in(SECTION)) {
        my $lTitle = pop_output;
        my $lCurrentToC = $Refs->get('ToC');
        output "<h2><a name=\"sec", $gSectionCount, "\" href=\"#tocsec", $gSectionCount, "\">", $lTitle, "</a></h2>\n";
        if ($Refs->get("sec" . $gSectionCount) eq '') {
            if ($lCurrentToC ne '') {
                $lCurrentToC .= "<br>\n";
            }
            $lCurrentToC .= "<b><a name=\"tocsec" . $gSectionCount . "\" href=\"#sec" . $gSectionCount . "\">" . $lTitle . "</a></b>";
            $Refs->put('ToC', $lCurrentToC);
            $Refs->put('sec' . $gSectionCount, $lTitle);
        }
        $gSectionCount++;
then in the second pass, the saved table of contents is output after the abstract:
sgml('</ABSTRACT>', sub {
    # Output a paragraph containing the Table of Contents.
    # The ToC should be empty the first time we run this, but we
    # build the ToC during the first run, so the second time this
    # script is run, we output the complete Table of Contents in the
    # right place.
    output "<p>";
    output($Refs->get('ToC'));
    output "</p>\n";
});
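The two-pass pattern can be sketched with a small persistent store. The sketch is in Python for illustration only and does not reproduce the SGMLS.pm Refs interface: the `Refs` class here, its JSON file, `run_pass`, and the section titles are all hypothetical.

```python
import json
import os
import tempfile


class Refs:
    """Hypothetical key/value store that survives between runs."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as handle:
                self.data = json.load(handle)

    def get(self, key):
        return self.data.get(key, "")  # "" means "not seen yet"

    def put(self, key, value):
        self.data[key] = value

    def save(self):
        with open(self.path, "w") as handle:
            json.dump(self.data, handle)


def run_pass(refs_path, titles):
    refs = Refs(refs_path)
    # The ToC is output first; it is empty until a previous pass has built it.
    output = ["<p>" + refs.get("ToC") + "</p>\n"]
    for count, title in enumerate(titles, start=1):
        output.append(f'<h2><a name="sec{count}">{title}</a></h2>\n')
        if refs.get(f"sec{count}") == "":  # only add each section once
            toc = refs.get("ToC")
            if toc != "":
                toc += "<br>\n"
            toc += f'<a href="#sec{count}">{title}</a>'
            refs.put("ToC", toc)
            refs.put(f"sec{count}", title)
    refs.save()
    return "".join(output)


refs_file = os.path.join(tempfile.mkdtemp(), "refs.json")
titles = ["Introduction", "Sample Application"]
first = run_pass(refs_file, titles)
second = run_pass(refs_file, titles)
print(second)
```

The guard on the per-section key keeps the second run from appending the same entries to the saved Table of Contents again.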
The disclaimer file is opened with Perl's open function, and a while loop outputs each line.
open(DISCLAIMER, "../image/disclaim.htm") || warn "Could not open disclaimer file\n";
while (<DISCLAIMER>) {
    output $_;
}
The contents are saved to a string, then any "<" characters are converted to "&lt;" before the element is output.
sgml('<VERBATIM>', sub {
    push_output 'string';
});

sgml('</VERBATIM>', sub {
    my $lVerbatim = pop_output;
    # Escape each "<" so that it displays literally inside <pre>
    $lVerbatim =~ s/</\&lt;/g;
    output "<pre>", $lVerbatim, "</pre>\n";
});
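The capture-then-escape step amounts to one substitution over the saved string, sketched here in Python for illustration. `verbatim_to_pre` is a hypothetical helper and the sample markup is invented.

```python
def verbatim_to_pre(text):
    """Wrap captured verbatim text in <pre>, escaping each "<"."""
    return "<pre>" + text.replace("<", "&lt;") + "</pre>\n"


sample = '<TITLE>Free SGML Tools</TITLE>'
print(verbatim_to_pre(sample))
```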
The system identifier corresponding to the graphic entity is located using a function provided by the SGMLS.pm package, then the file with the .gif or .jpg extension is found using Perl's -f operator that tests for a file's existence:
sgml('<GRAPHIC>', sub {
    my $lElement = shift;
    my $lSysID = $lElement->attribute('FIGNAME')->value->sysid;
    my $lRootname = $lSysID;
    $lRootname =~ s/\.[^.]*$//;
    output "<br>\n";
    output "<img align=\"middle\"";
    if (-f "../image/$lRootname.gif") {
        output " src=\"../image/$lRootname.gif\"";
    } elsif (-f "../image/$lRootname.jpg") {
        output " src=\"../image/$lRootname.jpg\"";
    }
    output ">\n<br>";
});
Tony Graham, Consultant, Mulberry Technologies, Inc.
6010 Executive Boulevard, Suite 608, Rockville MD 20852 U.S.A.
Phone: (301) 231 6930, E-mail: info@mulberrytech.com, WWW: http://www.mulberrytech.com
Tony Graham has been working with SGML for over five years. He has worked as an Editor and a Document Analyst with Uniscope, Inc. in Tokyo, Japan for four years, and as an SGML Consultant with ATLIS Consulting Group, and he is currently a Consultant with Mulberry Technologies, Inc., an SGML Consultancy specializing in training and design. Tony has designed, built, and tested DTDs and SGML applications for clients in the academic publishing, aerospace, automotive, database publishing, electronic component, photocopier, and software industries, and the languages used in these systems have been English, Japanese, Chinese, and Korean. In addition, his contributions have been incorporated into the DocBook, J2008, and Pinnacles SGML application standards.
Tony is also a qualified Electrical Engineer, and he has written programs in everything from FORTRAN on mainframes to custom languages on embedded microcontrollers. Within this range are included SGML processing programs in Perl, Tcl, C, and Scheme.
This page has been produced using a program developed for the "Free SGML Transformation Tools" paper presented at the SGML'96 conference. The program was produced in the hope that it will be useful, but is without any warranty; even the implied warranty of merchantability or fitness for a particular purpose.
The program is copyright 1996 Tony Graham.
The text comprising the contents of this HTML page may be subject to the copyright of its copyright holders. The authors, owners, and copyright holders of the program take no responsibility for the output of the program or for the uses to which the program may be put.