Integrating Guidon with the World Wide Web

Project Manager: Thomas B. Hickey, Chief Scientist


Abstract

Guidon is OCLC's graphical interface to Electronic Journals Online. The World Wide Web (WWW) is a system that OCLC and many other institutions use to offer services over the Internet. We have extended a prototype version of Guidon so that it can display HyperText Markup Language (HTML) documents and function as a WWW browser. In addition to letting Guidon view Web pages, HTML offers several powerful features that could be integrated into OCLC's electronic journal services, such as a flexible forms capability.


Guidon has been the primary interface to OCLC's Electronic Journals Online (EJO) service since its introduction in 1992. It is a Microsoft Windows graphical interface that offers features suited to the electronic publication of scholarly material, such as full-text retrieval and the display of mathematics, tables, line drawings, and photographs. Guidon operates remotely using the Z39.50 Information Retrieval Protocol, either over dial-up lines or over the Internet. Although the material displayed on the screen is stored in the database in SGML (Standard Generalized Markup Language), the paragraphs are preformatted using the TeX typesetting system. This approach gives publishers a high degree of control over how articles look on users' screens. However, because Guidon does not have a built-in SGML display system, it cannot display SGML databases other than OCLC's.

The World Wide Web (WWW) uses a form of SGML called HyperText Markup Language (HTML) to code text for display and to link online documents together. To say the Web is becoming popular dramatically understates the WWW phenomenon. HTML and the WWW are becoming not only the most popular way to publish information on the Internet, but nearly the only way new information is being supplied. Both WWW and EJO services are delivered to users using programs called clients. These clients normally run on the user's personal computer and communicate with a remote server that retrieves and sends the information needed by the user. Guidon is the main EJO client, and Mosaic is the best known of the WWW clients. Given the rapidly expanding resources on the Web, it became apparent that OCLC should investigate offering access to WWW services from within Guidon.

What HTML Offers Guidon

HTML is in some ways a distortion of SGML, or at least of SGML's original purpose. SGML offers a way to mark up document structure: to show, for example, what the title is, where paragraphs are, and what the headings are. Structural markup makes it possible to separate the specification of how text appears from the description of what the text "is." A separate style sheet describes how the information is actually displayed. A simple example of this is to mark text as emphasized rather than italic, and to have the style sheet specify that emphasized text be displayed with an italic font. Text is, however, complicated enough that devising generic markup for most items is not so clear-cut. Almost all applications of SGML allow some mixture of layout and logical markup.
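The separation of logical markup from appearance can be illustrated with a minimal sketch. It is written in Python for brevity and is not drawn from Guidon; the element names, the style properties, and the functions resolve_style and render are all hypothetical.

```python
# Hypothetical style sheet: maps logical element names to display properties.
STYLE_SHEET = {
    "title": {"font_style": "bold", "size": 18},
    "heading": {"font_style": "bold", "size": 14},
    "emphasis": {"font_style": "italic", "size": 12},
    "paragraph": {"font_style": "roman", "size": 12},
}

# Fallback used for any element the style sheet does not mention.
DEFAULT_STYLE = {"font_style": "roman", "size": 12}


def resolve_style(element, style_sheet=STYLE_SHEET):
    """Return the display properties for a logical element name."""
    return style_sheet.get(element, DEFAULT_STYLE)


def render(marked_up_text, style_sheet=STYLE_SHEET):
    """Pair each (element, text) run with its resolved display style."""
    return [(text, resolve_style(element, style_sheet))
            for element, text in marked_up_text]


# The document says only that "is" is emphasized, not that it is italic.
document = [
    ("paragraph", "Structural markup says what text "),
    ("emphasis", "is"),
    ("paragraph", "."),
]
runs = render(document)

# Changing the style sheet changes the appearance without touching the document.
bold_emphasis = dict(STYLE_SHEET, emphasis={"font_style": "bold", "size": 12})
restyled = render(document, bold_emphasis)
```

The document itself never mentions a font. Only the style sheet does, so the same marked-up text can be displayed in more than one way.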

HTML takes a pragmatic view of markup. It requires the minimum needed for simple text formatting and graphics inclusion. This simplicity is augmented with quite powerful maps, forms, and hypertext capabilities. Maps are graphics that allow the server to tell where the user is pointing when a mouse button is clicked. Forms include buttons that users can push and text entry fields that users can type text into. Coupled with hypertext links, these capabilities allow HTML to offer a rich interface to a wide variety of material in a format that novices can learn easily. HTML has struck an excellent balance between simplicity and functionality.
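To make the forms capability concrete, the sketch below shows one common way a browser can hand the contents of a filled-in form to a server: the field values are encoded into the query part of a URL. It uses Python's standard urllib.parse module; the server address, the field names, and the function build_form_request are assumptions made for illustration.

```python
from urllib.parse import urlencode, parse_qs


def build_form_request(action_url, fields):
    """Append the form's field values to the action URL as a query string."""
    return action_url + "?" + urlencode(fields)


# Hypothetical search form with two text entry fields.
fields = {"author": "Hickey", "title": "Guidon and the Web"}
url = build_form_request("http://example.org/search", fields)

# The server can recover the field values from the query string.
decoded = parse_qs(url.split("?", 1)[1])
```

Spaces in the field values are encoded as plus signs, so the server receives a single unambiguous string and decodes it back into named fields.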

These features could be important to Guidon in two ways. First, HTML support would offer access to WWW services. This is becoming increasingly important, especially since OCLC plans to offer a number of services through the WWW, such as FirstSearch. If Guidon could interact with these services, it would be much easier for users to integrate bibliographic searching via FirstSearch or some other system with Guidon's access to high-quality journal articles.

Second, HTML support would add the possibility of using HTML pages integrated with EJO journals. HTML offers a familiar and easy way to construct pages that would help users browse journals. Further, the HTML language could be used by publishers to offer interactive services to EJO users that Guidon does not presently supply.

Our Approach

We explored alternative C++ compilers and class libraries for Guidon to make it easier to support the Apple Macintosh environment. To do this, we constructed a "skeleton" of the Guidon application to see how well the interface would port. Our experience with this porting effort was positive, and this skeleton became the platform for experiments in HTML support within Guidon.

Success in this type of exploration is often a matter of restricting the scope of effort. To exploit HTML, we decided not to try to implement all features of Mosaic. Instead we developed basic browser capabilities step by step. At first we were happy simply to display a basic HTML file. The next step was to connect to an HTTP (HyperText Transfer Protocol) server to retrieve other HTML files. We decided not to support other common communications protocols, such as Gopher, FTP, and Telnet. We wrote a simple HTML parser and created C++ classes to support the display of the text on screen.
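The two basic steps described above, parsing an HTML file and requesting another file from an HTTP server, can be sketched roughly as follows. This is an illustration in Python using the standard html.parser module, not the C++ parser and classes written for the prototype. The class SimpleHTMLParser, the function build_get_request, and the sample page are hypothetical, and the request is only formatted, not sent.

```python
from html.parser import HTMLParser


class SimpleHTMLParser(HTMLParser):
    """Collect visible text and hypertext link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_runs = []  # text to lay out on screen, in document order
        self.links = []      # href targets the user could follow

    def handle_starttag(self, tag, attrs):
        # Record the target of each anchor so it can be retrieved later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only runs that contain something other than white space.
        stripped = data.strip()
        if stripped:
            self.text_runs.append(stripped)


def build_get_request(host, path):
    """Format a bare HTTP/1.0 GET request for one document."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


page = ('<html><body><h1>Journals</h1>'
        '<p>See the <a href="/ejo/index.html">index</a>.</p></body></html>')
parser = SimpleHTMLParser()
parser.feed(page)
parser.close()

# Following the first link means asking the server for that path.
request = build_get_request("example.org", parser.links[0])
```

A real browser would also need to handle headings, lists, inline images, and malformed markup. The sketch shows only the minimum: text to display and links to follow.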

We made changes in the skeleton Guidon client to support an additional publication style of HTML, with its own set of toolbar icons. Using the Microsoft Foundation C++ classes as the basis, it was easy to add full printing support, including previewing. We restricted graphics support to GIF files, borrowing code from a public domain GIF translator. Gradually, as we uncovered performance problems, we optimized the parser and window interaction to reduce the number of parses and GIF decompressions to a minimum.
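One general way to keep the number of parses and GIF decompressions to a minimum is to remember each decoded result and reuse it when the same item is needed again, for example when a window is redrawn. The Python sketch below illustrates that idea only. It does not describe how the prototype was actually optimized; DecodeCache and fake_gif_decode are hypothetical names, and the decoder is a stand-in that does no real decompression.

```python
class DecodeCache:
    """Remember decoded results by key so each source is decoded only once."""

    def __init__(self, decode):
        self._decode = decode  # the expensive function being avoided
        self._cache = {}
        self.decode_count = 0  # how many times the expensive work really ran

    def get(self, key, raw_bytes):
        # Decode on first use only; later requests reuse the stored result.
        if key not in self._cache:
            self._cache[key] = self._decode(raw_bytes)
            self.decode_count += 1
        return self._cache[key]


def fake_gif_decode(raw_bytes):
    # Stand-in for a real GIF decompressor: it only reports the byte count.
    return {"size": len(raw_bytes)}


cache = DecodeCache(fake_gif_decode)
image_url = "http://example.org/logo.gif"
first = cache.get(image_url, b"GIF89a-placeholder")
second = cache.get(image_url, b"GIF89a-placeholder")  # served from the cache
```

The same structure could hold parsed documents keyed by URL, so that scrolling or repainting a page would not trigger a fresh parse.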

Having an HTML client that we have written is an advantage when exploring new HTML features, such as split screens with toolbars that can be maintained across several documents. Because the code is completely under our control and still fairly simple, it is easy to modify for experimentation.

Problems

Many of the features of HTML supported in Guidon are only prototypes of what could be done. Our URL (Uniform Resource Locator) parsing is rudimentary, and our support of graphics and connections to HTTP servers is simplistic. Fortunately, robust source code is available for all of these functions, some of which would probably need to be incorporated into Guidon if an HTML capability is added to the production system.
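As an indication of what more complete URL handling involves, the sketch below resolves a relative link against the current page, rejects schemes other than HTTP (matching the decision not to support Gopher, FTP, or Telnet), and extracts the host, port, and path needed to open a connection. It relies on Python's standard urllib.parse module; the function parse_http_url and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit, urljoin


def parse_http_url(url, base=None):
    """Return (host, port, path) for an HTTP URL, resolving it against base."""
    if base is not None:
        # Relative links are interpreted relative to the current document.
        url = urljoin(base, url)
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Port 80 is the default when the URL does not name one.
    return parts.hostname, parts.port or 80, path


absolute = parse_http_url("http://example.org:8080/a/b.html")
relative = parse_http_url("../c.html", base="http://example.org/a/b/d.html")
```

Calling parse_http_url with a Gopher or FTP address raises ValueError, which a browser could report to the user or use to hand the address to another program.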

More important are the remaining interface problems relating to such things as:

Fortunately, none of these problems appears insurmountable, and we expect to find practical solutions.

The Future of Guidon

Mosaic and other WWW client programs offer impressive functionality in combination with the HTML documents they display. OCLC is currently offering, or planning to offer, services via the WWW. Without the WWW technology we would have had to design and build custom client software to provide these services. There are some obvious advantages to using a generic client such as Mosaic for access to our services:

A client under our control offers us advantages as well:

Currently, the most important of these is control over information display, including printing. We have devoted much effort to this area and have developed functionality more advanced than that of current HTML clients. Session-level support is also important. Mosaic is connectionless: it does not maintain a connection with a server any longer than is needed to retrieve a particular file. Services such as EJO maintain a connection with the server, which allows a richer interaction. To provide EJO over the WWW, we have to synthesize a session.
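One way a session might be synthesized over a connectionless protocol is for the server to issue an identifier, attach it to every URL it hands back, and look up the saved state whenever a request carrying that identifier arrives. The Python sketch below illustrates this general technique. It is an assumption for illustration, not a description of how EJO is offered over the WWW, and SessionStore, tag_url, the "session" parameter, and the example addresses are all hypothetical.

```python
import secrets
from urllib.parse import urlencode, urlsplit, parse_qs


class SessionStore:
    """Server-side state keyed by a session identifier carried in each URL."""

    def __init__(self):
        self._sessions = {}

    def open_session(self):
        # Issue an unguessable identifier and create empty state for it.
        session_id = secrets.token_hex(8)
        self._sessions[session_id] = {"history": []}
        return session_id

    def handle_request(self, url):
        # Each request is independent; the identifier ties it to earlier ones.
        parts = urlsplit(url)
        session_id = parse_qs(parts.query).get("session", [None])[0]
        state = self._sessions.get(session_id)
        if state is None:
            raise KeyError("unknown or expired session")
        state["history"].append(parts.path)
        return state


def tag_url(url, session_id):
    """Add the session identifier to a URL before sending it to the client."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"session": session_id})


store = SessionStore()
sid = store.open_session()
store.handle_request(tag_url("http://example.org/ejo/search", sid))
state = store.handle_request(tag_url("http://example.org/ejo/article?id=7", sid))
```

After the two requests, state["history"] holds both paths even though no connection was kept open between them. A production service would also need to expire idle sessions.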

For the immediate future we expect to have a proprietary client (Guidon) as well as to offer access via Mosaic and the WWW. We expect the WWW to become more capable, especially in information display. It is possible that at some point the use of Guidon will become so low as not to warrant the expense of maintaining it.

Conclusions

It has proven possible and useful to add HTML support to EJO's Guidon client. This support would enable Guidon to use WWW services and allow EJO to use HTML. We expect that this capability will help Guidon to remain an important client for information display. Current plans are to incorporate the ability to launch other viewers (including HTML clients like Netscape and Mosaic) into the next version of Guidon released for Microsoft Windows.