[Mirrored from: http://www.oclc.org/fred/docs/translations/intro.html]

Introduction to Translating Tagged Text via the SGML Document Grammar Builder Engine

Keith Shafer
[email protected]

Roger Thompson
[email protected]
OCLC Online Computer Library Center, Inc.
6565 Frantz Road, Dublin, Ohio 43017-3395

Abstract: While the Standard Generalized Markup Language (SGML) promises freedom from proprietary data formats, it is still difficult to translate arbitrary SGML data to other formats. To address the SGML translation needs at OCLC, we have added translation capabilities to the SGML Document Grammar Builder programming engine. Several systems incorporate this programming engine, including our Fred interpreter. In this paper, we describe the general translation capabilities of this programming engine and relate it to Fred.

1.0 Introduction

The Standard Generalized Markup Language (SGML) is a meta-language for writing Document Type Definitions (DTD) [ISO8879]. A DTD describes how a document conforming to it should be marked up: the structural tags that may occur in the document, the ordering of the tags, and a host of other features. Simply put, a DTD describes a class of tagged documents in a vendor-independent way. At OCLC, we have several tagged data sources for our reference databases. This data is not always pure SGML since, by definition, SGML requires a DTD and some of the tagged data does not strictly adhere to a DTD.

Regardless, the tagged text that OCLC receives must be manipulated into multiple formats to enable printing, database building, and multiple displays. For example, OCLC's Electronic Journals Online (EJO) provides a typeset quality display of journal articles via the Guidon document viewer [Keyhani] [Hickey92] [Hickey94]. While Guidon displays documents processed via TeX [Knuth], EJO accepts source documents marked up via SGML. Thus, these source documents must be translated from SGML to TeX. Some of these same source documents must also be translated into the HyperText Markup Language (HTML) to be made available via the World Wide Web (WWW). Accordingly, translating tagged text into multiple output formats is of primary interest to OCLC.

A few general translation tools are available, but most force users to map into a predefined DTD (which may be difficult or impossible to do) or do not offer sufficient options to meet the translation needs at OCLC. For instance, while there is now an international standard for mathematical markup, ISO 12083 [ISO12083], there are no systems that produce formatted documents from the complete standard. Since OCLC receives 12083 mathematical markup that must be translated to TeX, we added translation capabilities to the SGML Document Grammar Builder project.

The SGML Document Grammar Builder project is an ongoing research effort at OCLC Online Computer Library Center, Inc. studying the manipulation of tagged text [Shafer94a]. This project has produced a C++ library, the Grammar Builder Engine (GB-Engine), whose objects can be used to automatically create reduced structural representations of tagged text (DTDs), translate tagged text, automate database creation, and automate interface design -- all from sample tagged text. While the GB-Engine is embedded in a number of systems, Fred is currently the most popular.

Fred is an extended Tcl/Tk interpreter. Tcl is a complete string command language with variables, strings, lists, etc., and Tk is an X-based toolkit [Ouster]. As a result, Fred is a complete interpreter/shell that has access to the GB-Engine objects and can be used to easily build X interfaces. It is important to note that Fred's translation functionality is actually provided by the GB-Engine and is not restricted to Fred. For instance, we have ported the translation portion of the GB-Engine to the PC and are considering Perl [Wall] and/or Scheme [Abelson] as an alternative to the Tcl portion of Fred. Fred is currently being used for a number of translation tasks at OCLC including 12083 mathematical markup conversion to TeX for EJO, conversion of EJO documents to HTML [Weibel], and the Internet Cataloging research project [Vizine]. Free access to a Fred server is available via the WWW [Shafer94b]. In the remainder of this paper, we present the basic GB-Engine translation capabilities.

2.0 SGML Essentials

To understand GB-Engine translation, one must have at least a superficial knowledge of SGML. Specifically,

how document structure is marked up,
how tag attributes are specified, and
how general entities are used.

Those familiar with these concepts may want to skip to the GB-Engine Translation Process section below.

2.1 Tagged Document Structure

SGML is used to guide the markup of document structure with tags that are clearly distinguishable from document content. Generally speaking, the beginning of a structure in an SGML document is marked by a start tag and the end of the structure is marked by an end tag. A start tag is the character "<", followed by the tag name, followed by a ">" (e.g., <author>). An end tag is a "<", followed by a "/", followed by the tag name, followed by a ">" (e.g., </author>). For example, the following text and markup in a document means that the structure author has the content John Smith:

        <author>John Smith</author>

GB-Engine objects recognize traditional SGML tags in text to build a structural representation of the document for translation. This is done by matching start and end tags. During this process, the GB-Engine is able to determine when a start/end tag match resides inside of another start/end tag match. In this way, the GB-Engine can build a tree-structured directory of the document. For instance, author and title reside inside of article in the following:

        <article>
        <title>A Short Document</title>
        <author>John Smith</author>
        ...
        </article>

The GB-Engine builds a corresponding internal tree representation of the article that looks something like:

               article
              /   |   \
             /    |    \
        title   author  ...
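The tag matching described above can be sketched with a stack of open structures. The following Python is an illustration only, not the GB-Engine's C++ implementation; the names TOKEN and build_tree are hypothetical, and the sketch simply rejects missing or mismatched tags instead of trying to recover from them.

```python
import re

# A start tag, an end tag, or a run of content between tags.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>|([^<]+)")

def build_tree(text):
    """Return a nested (name, children) tree; content is kept as str leaves."""
    root = ("#root", [])
    stack = [root]                      # open structures, innermost last
    for m in TOKEN.finditer(text):
        closing, name, content = m.group(1), m.group(2), m.group(3)
        if content is not None:
            if content.strip():         # drop non-tagged white space
                stack[-1][1].append(content.strip())
        elif not closing:               # start tag: open a new structure
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:                           # end tag: must match the innermost open tag
            if stack[-1][0] != name:
                raise ValueError(f"unmatched </{name}>")
            stack.pop()
    if len(stack) != 1:
        raise ValueError(f"missing end tag for <{stack[-1][0]}>")
    return root

doc = "<article><title>A Short Document</title><author>John Smith</author></article>"
tree = build_tree(doc)
article = tree[1][0]
print(article[0], [child[0] for child in article[1]])
# prints: article ['title', 'author']
```

Because an end tag always closes the innermost open structure, nesting falls out of the stack discipline: title and author are appended as children of article while article is still open.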

We would like to note that during the course of discovering document structure, the GB-Engine can also handle certain missing tags. Without a DTD, however, it is impossible to know how to handle all missing tags. (Recall that we often see tagged text without a corresponding DTD.) Consider the following markup:

        <foo>...
        <name>John Smith
        <extension>5555</extension>

Should </name> be inserted before <extension>? We would say yes, but this is probably because we see the tag name, think of the semantic idea of a name, read John Smith, and think, "Hey, John Smith is a name". But, where should </foo> go? Before <name>, before <extension>, or after </extension>? Since we have no semantic understanding of foo, it is impossible for us as readers to determine where </foo> should go. Likewise, it is impossible for the GB-Engine objects to determine where all missing tags should go without outside intervention.

2.2 SGML Attributes

GB-Engine objects check and use SGML attribute values in many aspects of translation. Essentially, SGML tag attributes can be thought of as values that are associated with a given structural start tag. For example, the tag author in the following has the attribute primary with the value of yes.

        <author primary=yes>John Smith</author>

This markup shows that John Smith is the primary author. It does not indicate what an application should do with this additional information. For instance, an application might make this name appear first, or extract it for some abstracting service.

2.3 SGML Entities

The GB-Engine also provides SGML entity support. SGML entities are like textual variables and pointers. Basically, an SGML entity reference begins with an ampersand and ends with a semi-colon. For instance, a document could contain the entity reference "&food;" which might be replaced by the word "pizza" in the translated text. The GB-Engine lets users specify entity substitutions, leaving entities with unspecified substitutions as they appeared in the original text.

3.0 The GB-Engine Translation Process

The initial goal of GB-Engine translation was to build a system that provided for easy manipulation of tagged documents by translating, replacing, moving, or removing tags and their corresponding sub-structures. To accomplish this, GB-Engine translation requires three things:

a tagged text to translate,
a translation script describing the desired transformation, and
an optional entity translation table.

3.1 Tagged Text

The GB-Engine first processes the tagged document to extract the tags and discover the underlying structure of the tagged text as described in the section on tagged document structure above. As noted earlier, the GB-Engine searches for start/end tags and matches these tags to build a tag structure (or document structure) that reflects the structure of the original tagged text.

3.2 Translation Script

The GB-Engine translation process is an interpreted process where the translation script is the user-supplied program of desired transformations. Every translation script is made up of translation statements. Each translation statement is composed of two parts, a condition and a block of actions:

        if (condition)            { actions }

Translation conditions can be combined using the standard Boolean operators and can be parenthesized for grouping and readability. Translation actions can be nested and include sub-blocks of conditions and actions. Conditions are commonly enclosed in parentheses ()'s and action blocks are commonly enclosed in braces {}'s.

Given a good tagged document structure and a translation script, the GB-Engine applies the complete translation script to each tag in the document structure in succession by performing a depth-first traversal of the document structure. (This tag traversal corresponds to the natural reading order of the document.) That is, each tag is checked against each statement condition in the translation script. If a statement condition evaluates to TRUE for a tag, the corresponding actions are applied to that tag. Thus, multiple translation statements may be applied to a single tag and a single translation statement may be applied to multiple tags.

The translation process has no effect on tags that have no conditions that evaluate to TRUE for them in the translation script. Accordingly, a null translation script will reproduce the original document -- the only difference being that some non-tagged white space will be removed. (Many people add white space like carriage returns, tabs, and spaces to tagged documents to make them easier to read. In most cases, this white space is not part of the document structure because it is not tagged. Since the translation process allows for text movement, we do not attempt to retain non-tagged white space in the translated text. For that matter, we have no way of knowing where the non-tagged white space should go and arbitrary insertion of such non-tagged white space may produce invalid translation results.)
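The traversal just described can be sketched in Python as follows. This is a hedged illustration rather than the GB-Engine's code: the tree is a nested (name, children) tuple, each statement is a hypothetical (condition, action) pair of functions, and concatenating the output of several statements that fire on the same tag is an assumption of the sketch.

```python
def emit(kind, name, statements, out):
    """Apply every statement to one tag; keep the original tag if none fires."""
    original = f"<{name}>" if kind == "start" else f"</{name}>"
    fired = [action(kind, name) for condition, action in statements
             if condition(kind, name)]
    out.append("".join(fired) if fired else original)

def translate(tree, statements):
    """Depth-first walk: start tag, children in reading order, then end tag."""
    out = []
    def walk(node):
        name, children = node
        emit("start", name, statements, out)
        for child in children:
            if isinstance(child, str):
                out.append(child)       # content is copied through
            else:
                walk(child)
        emit("end", name, statements, out)
    walk(tree)
    return "".join(out)

statements = [
    # Start_Tag(title)  Literal("TITLE: ")
    (lambda kind, name: kind == "start" and name == "title",
     lambda kind, name: "TITLE: "),
    # End_Tag()  Literal(\n)
    (lambda kind, name: kind == "end",
     lambda kind, name: "\n"),
]
tree = ("article", [("title", ["A Short Document"]), ("author", ["John Smith"])])
result = translate(tree, statements)
print(result)
```

The start tags of article and author match no condition, so they appear unchanged in the result, while the single End_Tag() statement is applied to all three end tags.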

3.2.1 Translation Script Syntax and Nesting

The syntax of a translation script is much like C, except that our parsing algorithm is very forgiving. For instance, conditions need not be preceded by an "if" unless they are part of an "if...elseif...else" construct. Furthermore, conditions need not be parenthesized unless multiple conditions are combined using Boolean operators, actions need not be surrounded by braces unless actions are nested, actions need not be followed by a semicolon, and all condition/action names are case insensitive. All of the following are equivalent:

        if (condition1)    { action1; }
        if (condition1)      action1;
        if  condition1       action1;
            condition1       action1

Because parentheses and braces are optional, GB-Engine translation scripts differ from C in one notable way. C assumes that the action following a condition is a single statement unless braces are used. GB-Engine translation scripts assume that the actions following a condition belong to that condition until the next condition is found, since every translation statement requires a condition. This only causes concern when nested actions are desired. To see this, consider the following translation script:

        if (condition1)      action1; 
                             if (condition2)   action2;
                             action3;

Based on indentation, it appears as if the writer of the script expects the script to be processed as:

        if (condition1)    { action1; 
                             if (condition2) { action2; }
                             action3; }

But, because of the missing {}'s, the script is actually processed as:

        if (condition1)    { action1; }
        if (condition2)    { action2;
                             action3; }

Note that the expected nesting based on indentation is very different from the actual interpretation. To show this, the following table presents the expected actions to be performed based on indentation against the actual actions performed. Note that the results differ whenever only one of the conditions evaluates to TRUE.

         Value of      Value of     Expected     Actual
        condition1    condition2    Actions      Actions
        ------------------------------------------------
           TRUE          TRUE       1, 2, 3      1, 2, 3
           TRUE         FALSE       1,    3      1
          FALSE          TRUE         none          2, 3
          FALSE         FALSE         none        none

So, if nested conditions within actions are required, braces must be used. Finally, if the first non-white space string on a line in a translation script or entity translation table is "#" or "//", the line is considered a comment and ignored.
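The expected/actual table above can be reproduced by writing the two groupings out explicitly. The Python below is only a model of the two interpretations (the function names expected and actual are ours); it does not parse translation scripts.

```python
from itertools import product

def expected(c1, c2):
    """Nesting suggested by the indentation (as if braces were present)."""
    actions = []
    if c1:
        actions.append(1)
        if c2:
            actions.append(2)
        actions.append(3)
    return actions

def actual(c1, c2):
    """Grouping applied when the braces are missing."""
    actions = []
    if c1:
        actions.append(1)
    if c2:
        actions.extend([2, 3])
    return actions

for c1, c2 in product([True, False], repeat=2):
    print(c1, c2, expected(c1, c2), actual(c1, c2))
```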

3.2.2 Translation Script Parameters

Most translation conditions/actions require parameters, which are treated much like character strings in C and other programming environments. Accordingly, backslashes can be used to escape characters to get tabs, spaces, newlines, parentheses, and quotes into parameter values. For instance, \t in a parameter value will be interpreted as a tab and \n in a parameter value will be interpreted as a newline. If \t is really desired in a parameter value and not a tab, it must be input as \\t. This will cause the backslash itself to be escaped. If the desired parameter value includes spaces, the parameter must be quoted using single or double quotes, or the spaces must be escaped with backslashes. The following are equivalent:

        if  Start_Tag(author)  Literal(AUTHOR\ :);
        if  Start_Tag(author)  Literal("AUTHOR :");

The "Start_Tag" condition checks to see if the current tag is an author start tag (e.g., <author>) and the "Literal" action replaces the original start tag text with the literal string "AUTHOR :" in the translated output text.

Quotes themselves can be escaped by a backslash to make them part of the parameter value, or single quotes can be used when a double quote is desired in the text or vice versa:

        if  (condition1)    Literal('He said, "...');
        if  (condition2)    Literal('He said, \'...');
        if  (condition3)    Literal("He said, \"...");
        if  (condition4)    Literal("He said, '...");
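The quoting and escaping rules above can be sketched as a small decoder for a single raw parameter. This Python is an assumption-laden illustration, not the GB-Engine parser: the name parse_param is hypothetical, only the \t and \n escapes named in the text are given special meaning, and the sketch does not split a parameter list into separate parameters.

```python
ESCAPES = {"t": "\t", "n": "\n"}

def parse_param(raw):
    """Decode one raw parameter: quotes group text, backslash escapes the next character."""
    out = []
    quote = None                    # the quote character currently open, if any
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            out.append(ESCAPES.get(nxt, nxt))   # \t, \n, or the character itself
            i += 2
            continue
        if quote is None and ch in "'\"":
            quote = ch              # opening quote
        elif ch == quote:
            quote = None            # matching closing quote
        else:
            out.append(ch)
        i += 1
    return "".join(out)

print(repr(parse_param(r"AUTHOR\ :")))      # 'AUTHOR :'
print(repr(parse_param('"AUTHOR :"')))      # 'AUTHOR :'
print(parse_param("'He said, \"...'"))      # He said, "...
print(parse_param("'He said, \\'...'"))     # He said, '...
```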

Several conditions/actions assume that an empty parameter can take on any value. For instance, "Start_Tag" takes one parameter, the name of the start tag desired. With an empty or zero-length parameter, "Start_Tag" will match any start tag. Thus, the following script will replace all start tags with the string dog:

        if  Start_Tag()     Literal(dog);

3.3 Entity Translation Table

The entity translation table provides for textual substitutions based on SGML entities. GB-Engine translation only supports general entities that begin with an ampersand "&" and end with a semi-colon ";". The entity translation table has a very simple, line-oriented format where single or double quotes are used to signify the entity and its substitution:

        "entity"       "substitution"

So, our "&food;" to "pizza" entity example from above might appear in the entity translation table as:

        "food"         "pizza"

Note that the "&" and ";" normally surrounding an entity reference do not appear in the entity translation table. The entity translation table can use backslashes with the same interpretation as described in the translation script parameter discussion above. Entity substitution is the last thing to occur during the translation process so the translation script can add entities to the text.
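As a rough sketch of how such a table might be read and applied, the Python below parses the two quoted fields on each line and substitutes only the entities that appear in the table. The function names are hypothetical, and shlex.split follows POSIX shell quoting rules, which only approximate the backslash handling described above.

```python
import re
import shlex

def load_entity_table(text):
    """Parse lines of the form "entity" "substitution", skipping comments."""
    table = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "//")):
            continue                        # blank line or comment
        entity, substitution = shlex.split(stripped)
        table[entity] = substitution
    return table

def substitute_entities(text, table):
    """Replace &name; when name is in the table; leave other references untouched."""
    return re.sub(r"&([^;&\s]+);",
                  lambda m: table.get(m.group(1), m.group(0)),
                  text)

table = load_entity_table('''
# entity        substitution
"food"          "pizza"
''')
print(substitute_entities("I like &food; &amp; &drink;.", table))
# prints: I like pizza &amp; &drink;.
```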

3.4 The Three Translation Parts Combined

To recap this section, here is a simple, complete example with the three translation parts and the resulting translation output:

    SAMPLE TAGGED TEXT:
        <author>John &g; Smith</author>

    TRANSLATION SCRIPT:
        if  Start_Tag()    Literal('*[');
        if  End_Tag  ()    Literal(']*');

    ENTITY TRANSLATION TABLE:
        "g"                "George"

    TRANSLATION OUTPUT:
        *[John George Smith]*

4.0 Translation Script Examples

We now present a number of examples to demonstrate some GB-Engine translation conditions and actions. These examples are not intended to be a complete demonstration of the GB-Engine translation capabilities. A complete listing of GB-Engine translation conditions and actions and a short explanation of each is available via the WWW [Shafer94c] with additional documentation and samples available under Fred's translation page [Shafer94d]. Since the use of the entity translation table is rather straightforward, we will focus on translation scripts in the following. Finally, any line numbers in the following examples are for reference only and are not part of the script syntax.

4.1 Translation Conditions

There are several types of conditions including checks on tag names, tag attributes, variables, and the document structure. Many of these conditions can scan the document structure from the current tag looking for tags that match the desired conditions. In this section, we will demonstrate some of these conditions by presenting several different ways to translate tagged font information into TeX equivalents.

4.1.1 Tag Names

Many translations can be accomplished by simple conditions based on tag names alone. For example, when translating ISO 12083 mathematical markup to TeX there is a simple mapping from "<bold>...</bold>" to "{\bf ...}" based solely on tag names:

    (1) if      Start_Tag(bold)    Literal("{\bf ");
    (2) if      End_Tag  (bold)    Literal("}");

Line 1 matches any bold start tag and puts the literal text string "{\bf " in the translation output text in place of the original start tag. Line 2 matches any bold end tag and puts the literal text string "}" in the translation output text in place of the original end tag. Note that in this script, the condition on line 2 is tested even if the condition on line 1 is true. Since translation scripts also support an "if...elseif...else" construct, the above might have been better written as:

    (1) if      Start_Tag(bold)    Literal("{\bf ");
    (2) elseif  End_Tag  (bold)    Literal("}");

4.1.2 Tag Attributes

Aside from tag names, tag attributes are the other most useful aspect of SGML tags. For instance, assume that font markup relies on attribute values instead of tag names. Then markup like "<font style=bold>" could be translated by a script like:

    (1) if  Match(style, bold) {
    (2)     if      Start_Tag(font)    Literal("{\bf ");
    (3)     elseif  End_Tag  (font)    Literal("}"); }

The condition "Match" is used to see if the current tag has the attribute "style" with the value "bold". In GB-Engine translation, all end tags have access to the attribute/value pairs of their corresponding start tag. Accordingly, line 3 can put out the closing brace only when necessary. Without this capability, it would be difficult to really use attribute values in translation.
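One way to give end tags access to the attributes of their start tags is to keep the attributes of every open structure on a stack. The Python below sketches that idea for the bold font example; it is not the GB-Engine's mechanism, the names are hypothetical, and it assumes well-nested tags with unquoted attribute values.

```python
import re

TAG = re.compile(r"<(/?)(\w+)([^>]*)>")
ATTR = re.compile(r"(\w+)\s*=\s*(\w+)")

def translate_fonts(text):
    open_attrs = []                      # attributes of the currently open start tags

    def replace(m):
        closing, name, rest = m.groups()
        if closing:
            attrs = open_attrs.pop()     # attributes recorded for the matching start tag
        else:
            attrs = dict(ATTR.findall(rest))
            open_attrs.append(attrs)
        if name == "font" and attrs.get("style") == "bold":
            return "}" if closing else "{\\bf "
        return m.group(0)                # no statement matched: tag is left as-is

    return TAG.sub(replace, text)

print(translate_fonts("<font style=bold>x</font> <font style=italic>y</font>"))
# prints: {\bf x} <font style=italic>y</font>
```

The closing brace is written only for the end tag whose start tag carried style=bold, which is the behavior lines 1 to 3 of the script rely on.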

Now assume that font markup used actual font names as attribute values like "<font name = Adobe-Helvetica>". A script like the following could extract the value of the attribute and write it in the translation output:

        if  (Start_Tag(font) && Att_Present(name))   
             GetVal(name, "\fontname{%s}");

The translation condition "Att_Present" checks to see if the attribute name is present. The translation action "GetVal" gets the value of the attribute name off of the current tag. This value is placed in the translation output via the C printf-like format string specified in the second parameter of "GetVal". This parameter must contain a "%s". In this case, the value of the attribute name, Adobe-Helvetica, is placed in the format string to produce this output:

        \fontname{Adobe-Helvetica}

Conditions and actions that get and test the value of attributes are used extensively in GB-Engine translation.

4.1.3 Structural Conditions

More sophisticated translation can be done by looking at where a tag occurs in the document structure. For instance, in translating ISO 12083 mathematical markup to TeX, the radical structure has two possible outputs depending on the number of immediate sub-structures. If it has one sub-structure, meaning that it is a square root, the appropriate output is "\sqrt". If it has two sub-structures, meaning that it is a radical with an explicit radix, the output is "\root". A portion of a translation script to handle radicals might look like:

    (1) if  Start_Tag(radical) && Count_Children(1)   Literal("\sqrt");
    (2) if  Start_Tag(radical) && Count_Children(2)   Literal("\root");

While the above script checks the exact number of radical children using the translation condition "Count_Children()", there are several other conditions that perform non-exact numerical comparisons on variables. We would like to note that the actual script to handle 12083 mathematical radicals is much longer due to several optional tagging mechanisms. For instance, a radical may or may not have a radix, so we have to check to see if a radical has a radix child in several places.

There are several other structural conditions that can be used to check the name and attributes of the tags surrounding the current tag during translation. For instance, the condition

        Check(name, att, cmp, val, pos, scan)

can be used to determine if the current tag has a name neighbor that has the attribute att with the value val to pos direction of itself. The value of att is compared to val as designated by cmp. cmp may be one of (<, <=, ==, >=, >, lt, lte, eq, gt, gte). When cmp is one of (<, <=, ==, >=, >), the values are treated as numbers. When cmp is one of (lt, lte, eq, gt, gte), the values are treated as strings. pos can be one of (here, left, right, up, down). If pos is here, then the check is merely applied to the current tag itself. If scan is 0, then the check is performed only on the immediate neighbor in the pos direction. Otherwise, the check continues until a match is found or all pos neighbors have been checked. If name is empty, any tag name will do (i.e., the name is not used to restrict the search). Likewise, if att is empty, the attribute is not used to restrict the search. For example, the following will check to see if the current tag has an ancestor named foo with an id attribute integer value greater than or equal to 10:

        Check(foo, id, >=, 10, up, 1)

As a matter of convenience, "Check" has a wide variety of aliases with more descriptive names based on default values for some of the parameters.
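The parameters of "Check" can be modeled in Python as below. This is a sketch under stated assumptions, not the GB-Engine's implementation: the Node class and helper names are hypothetical, "up" is read as ancestors, "left" and "right" as peers under the same parent (nearest first), and the "down" direction is omitted because the text does not say whether it covers children only or all descendants.

```python
import operator

NUMERIC = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
           ">=": operator.ge, ">": operator.gt}
STRING = {"lt": operator.lt, "lte": operator.le, "eq": operator.eq,
          "gte": operator.ge, "gt": operator.gt}

class Node:
    def __init__(self, name, attrs=None, parent=None):
        self.name, self.attrs, self.parent = name, attrs or {}, parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def candidates(node, pos):
    """Yield the neighbors of node in the pos direction, nearest first."""
    if pos == "here":
        yield node
    elif pos == "up":
        while node.parent is not None:
            node = node.parent
            yield node
    elif pos in ("left", "right"):
        if node.parent is not None:
            peers = node.parent.children
            i = peers.index(node)
            yield from (reversed(peers[:i]) if pos == "left" else peers[i + 1:])
    else:
        raise ValueError(f"direction not modeled in this sketch: {pos}")

def matches(node, name, att, cmp, val):
    if name and node.name != name:
        return False                    # an empty name matches any tag
    if not att:
        return True                     # an empty att does not restrict the search
    if att not in node.attrs:
        return False
    if cmp in NUMERIC:                  # symbolic operators compare numbers
        try:
            return NUMERIC[cmp](float(node.attrs[att]), float(val))
        except ValueError:
            return False
    return STRING[cmp](node.attrs[att], str(val))   # word operators compare strings

def check(current, name, att, cmp, val, pos, scan):
    for neighbor in candidates(current, pos):
        if matches(neighbor, name, att, cmp, val):
            return True
        if not scan:
            break                       # scan == 0: only the immediate neighbor counts
    return False

root = Node("foo", {"id": "12"})
section = Node("section", parent=root)
para = Node("para", parent=section)
print(check(para, "foo", "id", ">=", 10, "up", 1))   # True
print(check(para, "foo", "id", ">=", 10, "up", 0))   # False
```

With scan set to 0 the search stops at section, the immediate ancestor, so the foo ancestor is never examined.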

4.2 Translation Actions

The GB-Engine supports many different translation actions including text addition, text removal, text movement, and text sorting. In this section we briefly introduce several classes of translation actions. In doing so, we will often rely on the presentation of sample tagged text, a translation script, and the resulting translation output to convey the results of translation actions.

4.2.1 Basic Actions

Two basic actions have already been presented: "Literal" and "GetVal". Most of the other basic actions provide straightforward access to simple tag information. Most of these work like "GetVal" in that the user must specify a printf-like string to control how the requested value will be placed in the translation output. For example, the following shows several ways to access a tag name.

    SAMPLE TAGGED TEXT:
        <author primary=yes>John Smith</author>

    TRANSLATION SCRIPT:
        if Start_Tag(author) {
            Get_Tag          ("Get_Tag...........: %s\n");
            Get_Tag_Text     ("Get_Tag_Text......: %s\n");
            Get_Tag_Name_Only("Get_Tag_Name_Only.: %s\n");
        }

    TRANSLATION OUTPUT:
        Get_Tag...........: <author primary=yes>
        Get_Tag_Text......: author primary=yes
        Get_Tag_Name_Only.: author
        John Smith</author>

The translation action "Get_Tag" returns the complete tag as it appeared in the original, including attributes and <>'s. The translation action "Get_Tag_Text" returns the complete tag as it appeared in the original, including attributes, but without <>'s. The translation action "Get_Tag_Name_Only" returns just the tag name.

It should be noted that there is a generalized version of "GetVal" that can be used to get attribute values from tags, "Compare_Get_Value". The first six parameters to "Compare_Get_Value" are just the six parameters to the condition "Check" explained in the section on structural conditions above. The last parameter, string, is like the printf format string parameter to "GetVal".

        Compare_Get_Value(name, att, cmp, val, pos, scan, string)

4.2.2 Variable Actions

There are several translation variable actions that can get and set the value of string and integer variables. If a string variable is referenced before it is set, it is considered to be a zero-length string. If an integer is accessed before its value is set, its initial value is automatically set to zero. For instance, the following script increments the variable x by one on every line start tag and outputs the value using a format string that works like a C printf format.

    SAMPLE TAGGED TEXT:
        <line>Hi,</line>
        <line>and</line>
        <line>bye.</line>

    TRANSLATION SCRIPT:
        Start_Tag(line) { Increment (x, 1);
                          Format_Int(x, "%d: "); } 
        End_Tag  ()       Literal(\n);

    TRANSLATION OUTPUT:
        1: Hi,
        2: and
        3: bye.
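The behavior of the script above can be mimicked in Python with a dictionary whose integer entries default to zero. This is only a model of the example (the names TOKEN and number_lines are hypothetical), with the script's two statements hard-coded instead of interpreted.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")

def number_lines(text):
    ints = defaultdict(int)         # integer variables start at zero when first used
    out = []
    for m in TOKEN.finditer(text):
        closing, name, content = m.groups()
        if content is not None:
            if content.strip():     # non-tagged white space is dropped
                out.append(content)
        elif closing:
            out.append("\n")                    # End_Tag()  Literal(\n)
        elif name == "line":
            ints["x"] += 1                      # Increment(x, 1)
            out.append("%d: " % ints["x"])      # Format_Int(x, "%d: ")
        else:
            out.append(m.group(0))              # other start tags pass through
    return "".join(out)

sample = "<line>Hi,</line>\n<line>and</line>\n<line>bye.</line>"
print(number_lines(sample), end="")
```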

4.2.3 Remove Actions

In complex documents, it is not unusual to have to remove sections of the document for different output formats or post processing. This removal of text is not the same as replacing a tag with an empty string. Removal actually removes the specified tag and any structure it contains. Consider the following:

    SAMPLE TAGGED TEXT:
        <title>A book</title>
        <author><first>John</first><last>Smith</last></author>
        <year>1995</year>

    TRANSLATION SCRIPT:
        if      Start_Tag(author)    Remove ();
        elseif  End_Tag  ()          Get_Tag(%s\n);

    TRANSLATION OUTPUT:
        <title>A book</title>
        <year>1995</year>
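The difference between removing a structure and replacing its tags with an empty string can be sketched on a small tree. The Python below is illustrative only; the render function and its removed and blanked parameters are hypothetical names, not GB-Engine actions.

```python
def render(node, removed=(), blanked=()):
    """Serialize a (name, children) tree, removing or blanking the named tags."""
    name, children = node
    if name in removed:
        return ""                       # the tag and everything inside it go
    inner = "".join(child if isinstance(child, str)
                    else render(child, removed, blanked)
                    for child in children)
    if name in blanked:
        return inner                    # only the tag text itself is dropped
    return f"<{name}>{inner}</{name}>"

author = ("author", [("first", ["John"]), ("last", ["Smith"])])
print(repr(render(author, removed={"author"})))   # ''
print(render(author, blanked={"author"}))         # <first>John</first><last>Smith</last>
```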

While many translation tasks can be performed in a stateless manner, we have found that some require multiple steps. As part of the EJO HTML offering, ISO 12083 mathematical markup had to be translated to TeX, removed from the text, and processed by TeX to produce a renderable image; a pointer to the resulting image then had to be inserted in the HTML text. Accordingly, the GB-Engine provides several ways to create and associate keys with removed sections of text so they can be easily accessed after translation.

4.2.4 Move Actions

The ability to arbitrarily restructure the output is one of the most powerful GB-Engine translation capabilities. Consider the following.

    SAMPLE TAGGED TEXT:
        <b>2</b>
        <a>1</a>
        <c>3</c>
        <e>5</e>
        <d>4</d>

    TRANSLATION SCRIPT:
        if      Start_Tag(b)                Move_Right;
        if      Start_Tag(d)                Move_Left;
        if      Start_Tag()                 Literal('*');
        elseif  End_Tag  ()                 Literal('');

    TRANSLATION OUTPUT:
        *1*2*3*4*5

The "Move_Right" and "Move_Left" actions move tag b and tag d into place. This hard-coded movement is seldom used, but it does make for a simple example. However, note the preceding asterisk in the translation output. It would be tempting to try to remove this asterisk using "Left_Peer", a translation condition that checks to see if the current tag has a left peer.

    SAMPLE TAGGED TEXT:
        (same as previous example)

    TRANSLATION SCRIPT:
        if      Start_Tag(b)                Move_Right;
        if      Start_Tag(d)                Move_Left;
        if      Start_Tag() && Left_Peer    Literal('*');
        else                                Literal('');

    TRANSLATION OUTPUT:
        *12*3*4*5

This output still has the preceding asterisk, but now there is also no asterisk between 1 and 2. We presented this example to show that most conditions check the original document structure, not the developing translation structure. Since tag a had a left peer in the original structure the preceding asterisk was output again and since the b tag did not have a left peer in the original structure no asterisk was placed between 1 and 2.
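Both outputs can be reproduced with a small model in which the left-peer test is answered from the original order while the output follows the moved order. The Python below is a sketch under assumptions: "Move_Right" and "Move_Left" are modeled as swapping a tag with its immediate neighbor, and the moves are applied in one step before output rather than during traversal.

```python
def translate(peers, use_left_peer):
    """peers: list of (tag, content) pairs in original order under one parent."""
    # Answered once, from the original structure, before anything moves.
    had_left_peer = {tag: i > 0 for i, (tag, _) in enumerate(peers)}
    order = list(peers)

    def shift(tag, step):
        i = next(k for k, (t, _) in enumerate(order) if t == tag)
        j = i + step
        if 0 <= j < len(order):
            order[i], order[j] = order[j], order[i]

    shift("b", +1)      # Start_Tag(b)  Move_Right
    shift("d", -1)      # Start_Tag(d)  Move_Left

    out = []
    for tag, content in order:
        star = had_left_peer[tag] if use_left_peer else True
        out.append(("*" if star else "") + content)
    return "".join(out)

peers = [("b", "2"), ("a", "1"), ("c", "3"), ("e", "5"), ("d", "4")]
print(translate(peers, use_left_peer=False))   # *1*2*3*4*5
print(translate(peers, use_left_peer=True))    # *12*3*4*5
```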

On the other hand, some actions do use the translation structure. For example, imagine that one wants to accumulate all tables in a document in sorted order before a bibliography that resides directly off of the document's root. The following transformation script does just that:

        Start_Tag(table)    Move_Root;  Move_Before(bibliography);

The action "Move_Root" is used to move the table up to the root of the document. Then the action "Move_Before" is used to move the table before the bibliography. This type of arbitrary movement is both powerful and simple to express in GB-Engine translation. Several other movements are available, including some based on the original document structure and some based on the developing translation output structure.

4.2.5 Sort Actions

Sometimes a document has structures that need to be sorted. To accommodate this, the GB-Engine allows groups of tags to be sorted by tag name or attribute value. For instance, the following sorts all of the structure under test by tag name.

    SAMPLE TAGGED TEXT:
        <test>
            <b num=4>b</>
            <a num=5>a</>
            <c num=3>c</>
            <e num=1>e</>
            <d num=2>d</>
        </test>

    TRANSLATION SCRIPT:
        if  End_Tag(test)    Sort_Kids_By_Name(1);
        if  Any_Tag()        Literal('');

    TRANSLATION OUTPUT:
        abcde

The parameter to "Sort_Kids_By_Name" tells GB-Engine whether or not to defer the sort until after all non-sort movement has been done since non-sort moves can cause other tags to be introduced into the structure that is to be sorted. If the value of defer is 0, the sort is done immediately. If the value of defer is 1, the sort is deferred until all non-sort moves have been performed. Most sorts are deferred.

Now assume that we wanted to sort the same input by the attribute num.

    SAMPLE TAGGED TEXT:
        (same as previous example)

    TRANSLATION SCRIPT:
        if  End_Tag(test)    Sort_Kids_By_Attribute(num,1);
        if  Any_Tag()        Literal('');

    TRANSLATION OUTPUT:
        edcba

The second parameter to "Sort_Kids_By_Attribute" is the defer parameter like that explained for "Sort_Kids_By_Name". Note that in the last two scripts, the sort was initiated on an end tag. We did this to show that the GB-Engine translation can handle the minimal end tag "</>" and does not require end tag names to be entered directly.
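The two orderings shown above can be checked with an ordinary sort over the children of test. In this Python sketch each child is a hypothetical (name, attributes) pair, the num values are compared as strings (an assumption; the text does not say whether attribute sorts are numeric), and the defer parameter is not modeled.

```python
kids = [("b", {"num": "4"}), ("a", {"num": "5"}), ("c", {"num": "3"}),
        ("e", {"num": "1"}), ("d", {"num": "2"})]

by_name = sorted(kids, key=lambda kid: kid[0])          # like Sort_Kids_By_Name
by_num = sorted(kids, key=lambda kid: kid[1]["num"])    # like Sort_Kids_By_Attribute(num, ...)

print("".join(name for name, _ in by_name))   # abcde
print("".join(name for name, _ in by_num))    # edcba
```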

5.0 Future Work

GB-Engine translation provides a means whereby tagged documents can be transformed into arbitrary output formats. This ability further realizes SGML's promise of freedom from proprietary data formats. GB-Engine translation services are freely available via a WWW Fred server [Shafer94b].

Writing GB-Engine translation scripts is, for better or worse, a lot like writing C programs. The process is simplified in that the GB-Engine directly supports the traversal of the document to be translated and directly supports several complex conditions and actions needed for SGML translations. In this paper, we specifically presented GB-Engine translation conditions based on tag names, tag attributes, and document structure as well as basic actions, variable actions, remove actions, move actions, and sort actions. We are continuing to expand GB-Engine translation capabilities based on suggestions from our translation users. For example, we expect to add direct checking/manipulation of data content and conditions that can check the developing translation structure.

It is interesting to note that we wrote this paper in SGML and have been using the GB-Engine via Fred to simultaneously translate the single SGML source to ASCII, HTML, and TeX (PostScript).

References

[Abelson]
Harold Abelson and Gerald Jay Sussman, with Julie Sussman. Structure and Interpretation of Computer Programs. The MIT Press, 1985.

[Hickey92]
Thomas B. Hickey and Terry Noreault. The Development of a Graphical User Interface for The Online Journal of Current Clinical Trials. The Public-Access Computer Systems Review, 3(2):4-12, 1992.

[Hickey94]
Thomas B. Hickey. Reference Client Software Design. In Annual Review of OCLC Research July 1992-June 1993, pages 37-39, 1994.

[ISO12083]
Electronic Manuscript Preparation and Markup. ANSI/NISO/ISO 12083, 1994.

[ISO8879]
Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML). International Organization for Standardization. Ref. No. ISO 8879:1986, 1986.

[Keyhani]
Andrea Keyhani. The Online Journal of Current Clinical Trials: An Innovation in Electronic Journal Publishing. Database, pages 14-23, February 1993.

[Knuth]
Donald E. Knuth. The TeXbook. Addison-Wesley Publishing Company, 1984.

[Ouster]
John K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley Publishing Company, 1994.

[Shafer94a]
Keith Shafer. SGML Grammar Structure. In Annual Review of OCLC Research July 1992-June 1993, pages 39-40, 1994.

[Shafer94b]
Keith Shafer. Fred: The SGML Grammar Builder. Fred's WWW home page. Accessible at URL:http://www.oclc.org/fred/, 1994.

[Shafer94c]
Keith Shafer. Quick Translation Reference for Fred. Accessible at URL:http://www.oclc.org/fred/docs/help/quick.html, 1994.

[Shafer94d]
Keith Shafer. Fred Translation Information. Fred's WWW translation home page. Accessible at URL:http://www.oclc.org/fred/docs/translations/, 1994.

[Vizine]
Diane Vizine-Goetz, Jean Godby, and Mark Bendig. Spectrum: A Web-Based Tool for Describing Electronic Resources. To be presented at the Third International World-Wide Web Conference. Darmstadt, Germany, 1995.

[Wall]
Larry Wall and Randal L. Schwartz. Programming Perl. O'Reilly & Associates, Inc., 1992.

[Weibel]
Stuart Weibel, Eric Miller, Ralph LeVan, and Jean Godby. An Architecture for Scholarly Publishing on the World Wide Web. In Proceedings from the Second International WWW Conference: Mosaic and the Web, pages 739-748, 1994.