Using SGML in Electronic Catalog Development

Ellen Adams
IBM Corporation, NAS Division
Mail Stop 7J08
Thornwood Conference Center
500 Columbus Ave.
Thornwood, NY 10594


Since every aspect of the business process is under continuous scrutiny, and remaining competitive in today's global marketplace demands that we redefine the ways in which we do business, IBM has pioneered the Electronic Purchasing Service, founded on the cornerstone of SGML.

The Electronic Purchasing Service (EPS) was designed to give IBM's vendor customers a way of reducing costs and improving control while providing a better level of service to their own customer bases. It is an advanced, network-based sales and procurement solution that allows end users to locate, compare and purchase items directly through electronic catalogs.

With its preeminence as an electronic document exchange standard and its focus on reusability, SGML was deemed ideal for this electronic commerce application. Therefore, IBM Thornwood has used the Standard Generalized Markup Language to develop an electronic catalog application.

The presentation will cover the four basic tasks or stages in the implementation of the Electronic Purchasing Service application:

This session will briefly describe the most important functions in each of these stages, and the kinds of tools IBM used in performing them.


Electronic commerce is certainly not a new idea. And with its various transaction processing and computing systems, IBM has long been at the forefront of electronic commerce. IBM is also credited with the birth of SGML, so there is little surprise at the marriage of SGML and electronic commerce. Charles Goldfarb and his IBM Research team in the 1960s created a method, the Generalized Markup Language (GML), to let text editing, formatting, and information subsystems share documents. Over the course of nearly two decades and through the efforts of many people and groups, GML eventually gave rise to the standard SGML that was adopted by the International Organization for Standardization (ISO 8879). Since then, various groups have adopted SGML as the international standard for data and document interchange in open system environments, including the automotive, defense, commercial aerospace, pharmaceutical, electronics, and telecommunications industries.

IBM is now actively marketing a service called the Electronic Purchasing Service, a multi-tiered service option, the core of which is electronic catalogs built with SGML. This presentation will focus on how SGML was used in developing the Electronic Purchasing Service application and on the tools and methods involved.

Meeting the challenge of Electronic Catalog Creation

The Electronic Purchasing Service (EPS) appeals to organizations with large purchasing departments. The area that accounts for the largest percentage of spending in many companies is the purchase of office supplies, equipment, computers and services. American businesses spend a staggering $250 billion a year on maintenance, repair and operating supplies (commonly called MRO), according to a recent study. Because the base is so large, every percent saved here can make a huge impact on the bottom line. According to Fortune Magazine, "purchasing exerts far greater leverage on earnings than anything else. By shrinking the bill 5%, a typical manufacturer adds almost 3% to net profits."

End users often buy non-standard items from unapproved suppliers because they feel the organization's purchasing procedures are cumbersome, time-consuming and unresponsive to their needs. Instead, they rely on their own favorite resources: assorted catalogs, retail locations or other unapproved vendors. This "maverick" buying can drastically diminish purchasing's leverage, which translates into significantly higher corporate costs. IBM offers the service to its vendor-customers, which can in turn offer their customers custom catalogs tailored to the clients' expressed needs and buying habits.

Since the data is stored in electronic form, and SGML is used, the service has applications for the Internet, giving customers an easily accessible electronic marketplace to consult for goods and merchandise.

Developed and offered by IBM, EPS is tailored to cut costs and improve control measures for vendors. With sharing and customization of particular information, customers can use it to boost their level of customer service. Vendors can respond more quickly and easily to customers' special requests, and can anticipate and prepare for large orders.

The Electronic Purchasing Service was designed to support a company's entire purchasing life cycle. Before the customer orders, he searches the electronic catalogs, evaluates product specifications, compares products and reviews their prices and availability. When ready to place an order, the customer can use the service to create and approve requisitions, transmit orders and track order status.

After ordering, users can use the service to generate management reports and to link with the company's payment process or its general ledger system. With the Electronic Purchasing Service in place, vendors discourage their employees from side-stepping a rigorous or complicated purchasing process. The electronic catalog and on-line service were designed to be easy to use.

The Electronic Catalogs

The Electronic Purchasing Service is a subscription service segmented into a variable pricing structure. IBM's staff collaborates with the vendor to develop specially customized electronic catalogs that IBM delivers electronically. Using a software tool called the Product Information Workbench, the vendor and members of IBM's staff create these catalogs to exacting specifications. Special product descriptions, photos and pricing information constitute the catalog content. Vendors select merchandise for catalog inclusion on the basis of their customers' buying habits, their own product availability or any special promotions. Vendors may select all or part of their current product line for inclusion in the catalogs, and pricing and quantity information can be adjusted from catalog to catalog as well.


We had a variety of objectives when creating the catalogs. The catalogs had to be easy to construct. The Product Information Workbench, the primary tool used to construct the catalogs, imports information in SGML, ASCII text or database formats (such as DB2 or Microsoft Access). It utilizes a tree structure to organize and segment product offerings.
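The tree organization mentioned above can be pictured with a short sketch. The internal format of the Product Information Workbench is not described here, so the CategoryNode class, its fields, and the sample categories and part numbers below are all hypothetical, written in Python purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One node in a hypothetical catalog tree: a category with items and subcategories."""
    name: str
    items: list = field(default_factory=list)      # part numbers filed directly here
    children: list = field(default_factory=list)   # nested CategoryNode objects

    def add(self, child):
        """Attach a subcategory and return it so calls can be chained."""
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield (category path, item) pairs for every item in this subtree."""
        here = prefix + (self.name,)
        for item in self.items:
            yield here, item
        for child in self.children:
            yield from child.paths(here)


# Illustrative data only; these categories and part numbers are invented.
root = CategoryNode("Office Supplies")
root.add(CategoryNode("Paper", items=["PAPER-001"]))
pens = root.add(CategoryNode("Writing Instruments"))
pens.add(CategoryNode("Ballpoint", items=["PEN-010", "PEN-011"]))

listing = list(root.paths())
```

Walking the tree with paths() gives each item together with the chain of categories above it, which is the kind of segmentation a vendor could use to carve one product line into several customer-specific catalogs.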

The catalogs had to be easy to use. Customers must be able to browse on-line through the various items. While using the catalog, they will be making crucial buying decisions, filling in electronic forms and creating an electronic purchase order that will be transmitted directly to the vendor.

Approach and Criteria

The Electronic Purchasing Service is a constantly evolving service offering. Its initial concept has been radically changed. Initially a network-based, closed proprietary application, it has become a more open solution with numerous options, incorporating Lotus Notes, purchasing card services and a less restricted architecture. However, SGML's role in the application remains assured.

Planning the Application

In planning the EPS electronic catalog offering, we isolated a number of factors which we had to consider when designing our application.

The Link Between Catalog Structure and Order Processing

Data captured and organized into electronic document files, whether in SGML or other formats, is typically intended for some type of processing by computer software. Indeed, in the case of complex data tagged in SGML, this is the direct object: to create input for some type of computer program.

Typical Catalog Processing in the Paper Catalog Model

This reality was masked by the fact that in traditional paper catalogs, nearly all complex data, SGML or otherwise, was intended only for processing by composition software such as PageMaker or QuarkXPress to create page images. Because the processing strategy inherent in this type of software program (linear composition) and end use of the data (readers read paper pages) was always consistent, processing strategy did not emerge as a variable in data design.

However, with the growth of automation, network popularity and the electronic delivery of information, at least four major processing environments have emerged for which the document data must be designed.

These differing processing and end-use environments are important for two primary reasons. The ways in which customers use a catalog, specifically the way they locate and use the information the catalog supplies, vary sufficiently from environment to environment and shop to shop to warrant different material delivery approaches: CD-ROM and network delivery.

We also had to consider the fact that the delivery software tools that IBM will provide to its customers will operate sufficiently differently in different environments. Since not every vendor nor every customer will be equipped with a uniform cadre of IBM PCs, equipped with the latest IBM software, IBM must take a lowest common denominator, or minimum configuration approach, when designing the system.

These differences are critical to the planning and design of an effective electronic catalog. Instead of creating data structures and tagging schemes for singular fixed software processing and end use environments, which would be impractical, IBM decided to use a universal data exchange standard, the Standard Generalized Markup Language, to address all environments in which data will or may be processed.

Composing a Catalog For Paper

Marking composition, the baseline for most data design, is a linear process in which software starts at a fixed entry point and sequentially creates output data to drive a typesetter. Marking composition, still the most often encountered processing environment, is also the most forgiving of variations in data design. Accordingly, a catalog designer moving from a linear composition-centered environment to any other environment will likely encounter new and more stringent constraints.

Assuming Linearity

We also had to consider how catalogs are composed. Because print page composition is inherently sequential, print catalog designers can make certain assumptions about the processing of data files that go into the catalog.

Using The Catalog

Finally, we considered how catalogs are used. Of the major information consumption environments, only composition for paper or fixed image publishing is aimed exclusively at the output production process with no consideration required of the final use of the information. While often highly complex in its demands, composition need not consider the unpredictability of human end use of information products. Once the source material is recorded in the agreed format, printed, bound, and delivered, it is up to the catalog reader to apply whatever logic he or she wishes in order to use the information it contains.

Visual Resolution of Information

Because paper pages are intended for reading, composition is usually assumed to be correct if the resulting output is visually understandable, i.e. properly numbered, for the catalog reader. This often means that sparking visual interest takes precedence during design, and the way items are ordered within the catalog takes secondary importance so long as sufficient output formatting information is present in data files to generate the desired visual effect for the reader.

Since there's no ideal way to set up the catalog for most effective sales, the arrangement of its document files often becomes highly variable, using collections of structures which, although producing the desired visual effects, are not logically descriptive of the underlying information elements.

Electronic or On-line Catalog User and Session Support Processing

Authoring and maintenance of publishable data has historically been accomplished via simple text editors or word processors. We used SoftQuad's Author/Editor to develop the initial DTD.

Capturing the Data in SGML Format

With the advent of the Standard Generalized Markup Language and SGML-sensitive editorial software like Author/Editor, however, document designers can configure and support authoring transactions by authors who generate the highly complex data forms required for modern information delivery modes like the Internet and CD-ROM. This creates an entirely new data usage environment that, while largely unfamiliar to traditional data designers, is heavily dependent on the data structure for its operation. Indeed, with the Electronic Purchasing Service and its electronic catalog, IBM creates a totally new class of information client whose needs must be considered in the design of systems and data.

Editorial requirements and the software that supports them are often singular and exacting. The design of data structures that will guide the authoring and updating process must take these requirements and their attendant software into account if the overall environment is to achieve full productivity.

Among the major data relevant characteristics of the editorial support environment:

Nonlinear processing

In non-linear processing, catalog readers typically read through the material in highly nonlinear patterns, jumping from one place in the catalog to another and back again to complete the requisition form, sometimes working with only small sections of the catalog at a time. This means that the software IBM will use to support the electronic catalog cannot assume that codes, tags or other variables resident in the data will have been accumulated when any specific processing is needed. Customers may, for instance, want to compute their subtotals when they are still searching the catalog.
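One way to picture this constraint is a requisition whose totals are derived on demand from its current contents, instead of being accumulated as the reader moves through the catalog in order. The sketch below is only an illustration in Python; the Requisition class, the part numbers and the prices are assumptions, not part of the EPS software.

```python
from decimal import Decimal


class Requisition:
    """Hypothetical requisition whose totals are derived on demand,
    not accumulated in reading order."""

    def __init__(self, price_list):
        self.price_list = price_list   # part number -> unit price
        self.lines = {}                # part number -> quantity

    def set_quantity(self, part_number, quantity):
        """Add, change or (with a quantity of zero or less) remove a line."""
        if quantity <= 0:
            self.lines.pop(part_number, None)
        else:
            self.lines[part_number] = quantity

    def subtotal(self):
        # Recomputed from the current lines each time, so the answer is
        # valid at any point in the session, in any order of visits.
        return sum(
            (self.price_list[p] * q for p, q in self.lines.items()),
            Decimal("0"),
        )


# Invented sample data.
prices = {"PEN-010": Decimal("1.25"), "PAPER-001": Decimal("4.00")}
req = Requisition(prices)
req.set_quantity("PAPER-001", 3)
midway = req.subtotal()            # requested while the customer is still browsing
req.set_quantity("PEN-010", 4)
req.set_quantity("PAPER-001", 1)   # the customer jumps back and changes a line
final = req.subtotal()
```

Because subtotal() depends only on the current state of the form, it does not matter which sections of the catalog the customer has or has not visited when the figure is requested.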

Jumping into the middle of an electronic catalog, the catalog reader may need to know information that is not immediately displayed on-screen. When this occurs, IBM must use SGML coding to determine or infer the exact nature of the catalog or service's structure to call the proper routines. If all elements that might require unique processing are tagged uniquely, the SGML software can identify them without unduly complex inference or upstream searching, thereby reducing system design costs significantly.

Software Involvement in the Customer's Use

Unlike paper pages or fixed pages, an electronic catalog requires that software stay involved with both data and user through final completion of every action associated with the data. In any electronic delivery environment, access to both data and data usage resources is available only through software. Put in simple terms, IBM's design task was not complete until the company considered every action related to final consumption of the data. Among the things we had to consider: How would the vendors and customers use the data? IBM marketing will stress that the data is ultimately recyclable, so both vendors and customers can use this data for tracking purchasing history, trend spotting, etc.

Unpredictable Processing Patterns

Unlike the latest Tom Clancy novel, which readers navigate from front to back, use of a retail product catalog does not follow a predictable linear path. The Electronic Purchasing Service catalogs had to be designed to address customers accessing and using data in a hopscotch or random pattern, and also had to be made flexible enough to accommodate different processes for the same data, depending on what the vendor wants. Whatever the nature of the processing, document data is the major ingredient and variable. We discovered that if we did not define our data structures precisely at the outset of the project, a high level of processing unpredictability would make it difficult for the programmers to develop and integrate support into the Electronic Purchasing Service catalogs.

If software must search and collect additional information about an element or structure in order to identify it or process it properly, we call that element "structurally ambiguous." While a certain amount of this condition is unavoidable, IBM can minimize it by rigorous data design that avoids structural ambiguity wherever possible in the EPS system.

For example, if the catalog is directed to a manufacturing part number data element that requires different processing for different uses but contains no differentiating parameters, then the element is structurally ambiguous. Faced with this problem, the software programmer must attempt to solve it by writing a routine to search the data around the tag to ascertain its context and appropriate treatment. The more complex and far-reaching the required search, the more money IBM wastes. If designers implement a few rules for system design, however, IBM can drastically reduce development costs and time.
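The difference can be sketched in a few lines of Python. The element layout, the tag names (partno, orderform, description) and the "use" attribute below are invented for illustration and are not taken from the EPS DTD; the point is only to contrast a context search with a direct lookup.

```python
def resolve_by_context(element, ancestors):
    """Ambiguous case: walk outward through the enclosing elements
    to guess how the part number should be treated."""
    steps = 0
    for ancestor in reversed(ancestors):
        steps += 1
        if ancestor["tag"] == "orderform":
            return "order", steps
        if ancestor["tag"] == "description":
            return "display", steps
    return "unknown", steps


def resolve_by_attribute(element):
    """Unambiguous case: the element carries its own differentiating parameter."""
    return element["attrs"]["use"]


# Hypothetical parsed elements; all names here are assumptions.
ambiguous = {"tag": "partno", "attrs": {}, "text": "PEN-010"}
context = [{"tag": "catalog"}, {"tag": "orderform"}, {"tag": "line"}, {"tag": "cell"}]
explicit = {"tag": "partno", "attrs": {"use": "order"}, "text": "PEN-010"}

guess, steps_searched = resolve_by_context(ambiguous, context)
direct = resolve_by_attribute(explicit)
```

Both routes arrive at the same treatment, but the first must inspect several surrounding elements and encode assumptions about where a part number may appear, while the second reads a single attribute.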

CD-ROM Mastering and Delivery over the Internet

We added new dimensions to the service, CD-ROM options and Internet options, to make EPS even more attractive to our customer base. Interactive delivery of the IBM EPS service freed the user from having to navigate through cumbersome binders and limitless paper catalogs.

However, along with the special capabilities IBM built into the service, comes a new series of considerations. Some of these considerations have an impact on catalog and service design, others on authoring and preparation.

Hardware Involvement on the User Side

Because user hardware plays an important part in determining the optimum delivery mode for electronic media, we knew it would also influence key catalog and service design features. In the initial stages, as designers worked out information access, grouping and display factors, they also considered user device characteristics such as processor speed, screen size and resolution, and storage device speed. We decided on a minimum configuration of a 486 PS/2 running OS/2.

Heavy Penalty for Linear Navigation of Information

Electronic display devices are limited: they cannot display as much data as the typical paper page, do not display material at the same level of resolution, and are rather slow in sequential browsing from screen to screen. Therefore our document designers had to provide the necessary data links to allow users to go directly to what they wish to see, whether from index-to-content or content-to-related-content. While the former of these two paths can be accommodated by linking an external index to entry points in the data content, the latter, content-to-related-content, is trickier.

Having entered the catalog at the most detailed subject level, the table of contents, customers will usually find what they actually want to read doesn't display with the opening screen, forcing them to scroll through the file, to the appropriate point: the group of items they want to purchase. Since most display screens can manage only 2,000 or so characters versus a typical paper page which can hold 5,000 or so characters, this can prove very tedious indeed. Factor in a slow transmission rate or a high CD-ROM seek time and minimum hardware configuration, and the process slows to a crawl.

This was a problem with the initial electronic catalog. Navigation was extremely linear, and provided by the rigid taxonomy. To avoid the usage penalty associated with linear navigation, data design provided the basis for catalog scrolling without extensive browsing. Keeping the material as linear as possible solved some of the problems. Any large horizontal data structures such as tables were redesigned as vertical, list-like structures with a machine-generated internal table of contents-like structure attached to the top-level entry element. An example: Automobile --> 4DR --> Chrysler. This type of data structure, with links between entry-point and each stub element, made the delivery software capable of displaying a list of stub values to a user upon catalog boot up, so, for the customer, jumping to the appropriate place in the catalog was easy.
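The redesign described here can be sketched as follows, assuming a table is available as a list of rows. The function names, the row fields and the sample data are hypothetical, and the sketch is written in Python only to show the shape of the resulting structure, not the delivery software itself.

```python
def verticalize(entry_name, rows, stub_key):
    """Turn table rows into a vertical list plus a generated table of contents.

    The contents list pairs each stub value with the index of its entry,
    so delivery software can jump straight to it instead of scrolling.
    """
    entries = []
    contents = []
    for row in rows:
        stub = row[stub_key]
        contents.append((stub, len(entries)))
        # Each former table row becomes a list of (field, value) pairs.
        entries.append([(k, v) for k, v in row.items() if k != stub_key])
    return {"entry": entry_name, "contents": contents, "entries": entries}


def jump(structure, stub):
    """Follow the link from a stub value to its entry in one lookup."""
    index = dict(structure["contents"])[stub]
    return structure["entries"][index]


# Invented sample rows filed under the Automobile --> 4DR entry point.
rows = [
    {"make": "Chrysler", "price": "call", "stock": "yes"},
    {"make": "Ford", "price": "call", "stock": "no"},
]
sedans = verticalize("Automobile/4DR", rows, stub_key="make")
stub_menu = [stub for stub, _ in sedans["contents"]]
```

Here stub_menu is the list a delivery program could show when the catalog opens, and jump(sedans, "Ford") retrieves the matching entry without scanning the entries that precede it.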

On-line Service Delivery

On-line delivery of the Electronic Purchasing Service raised a number of associated factors as well. Electronic Data Interchange has its own set of standards.

Maximum Addressability with Nesting of Data

When designing the catalog and on-line service, designers made every effort to nest logical structures, ensuring that software used to process and deliver the data could easily identify and access logical data structures in a single action. Sibling representation of nested data elements is workable in a composition environment, but less workable for the Electronic Purchasing System. To use sibling relationships extensively would seriously complicate delivery support software and limit available functionality.
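A small Python sketch may make the contrast concrete. Both structures below hold the same invented catalog data; the group names, part numbers and function names are assumptions made for this illustration.

```python
# Sibling representation: group membership is implied only by position.
flat = [
    ("group", "Ballpoint"),
    ("item", "PEN-010"),
    ("item", "PEN-011"),
    ("group", "Paper"),
    ("item", "PAPER-001"),
]

# Nested representation: each group contains its own items.
nested = {
    "Ballpoint": ["PEN-010", "PEN-011"],
    "Paper": ["PAPER-001"],
}


def items_from_siblings(sequence, group_name):
    """Scan the flat sequence, tracking which group we are currently inside."""
    found, inside = [], False
    for kind, value in sequence:
        if kind == "group":
            inside = value == group_name
        elif inside:
            found.append(value)
    return found


def items_from_nesting(tree, group_name):
    """The whole logical structure is addressable in a single action."""
    return tree[group_name]
```

With siblings, the software has to read from the start of the sequence and infer where each group ends; with nesting, the group is a unit that can be fetched, displayed or linked to directly.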

Avoiding Use of Defaults

To simplify the task of developing and maintaining the manufacturing DTD for the electronic catalogs, designers constructed many different logical structures from basic elements, such as product descriptions or blurbs.

Avoiding Representing Data and Display Structure as Tables

Data intended for a paper catalog is typically represented in a table. Invoices, Purchase Orders, Requisitions and the like are typically in tabular format. In electronic environments, disparate systems can wreak havoc on this type of data. Designers explored every alternative to tabular formatting.

Originally, much of the work on the electronic catalog was done in GML and BookMaster. Because most of the files provided by the catalog team were created in GML, SGML's precursor, a conversion program was needed to convert them to SGML.
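The conversion program itself is not described here. As a rough sketch of the kind of rewriting involved, the Python below turns GML-style tags of the form ":tag." into SGML start tags and ":etag." into end tags. It handles only a small, assumed subset of tags and ignores attributes, so it should be read as an illustration and not as the tool that was actually used.

```python
import re

# A small, assumed subset of tags; a real converter would need the full
# tag set, attribute handling and a target DTD.
CONTAINERS = {"ul", "ol", "dl"}        # closed in GML by ':eul.', ':eol.', ':edl.'
SIMPLE = {"p", "li", "h1", "h2"}       # end tags left to the DTD's omission rules

TAG = re.compile(r":([a-z][a-z0-9]*)\.")


def gml_to_sgml(text):
    """Rewrite ':tag.' as '<tag>' and ':etag.' as '</tag>' for known tags."""
    def replace(match):
        name = match.group(1)
        if name in CONTAINERS or name in SIMPLE:
            return f"<{name}>"
        if name.startswith("e") and name[1:] in CONTAINERS:
            return f"</{name[1:]}>"
        return match.group(0)          # leave anything unrecognized untouched
    return TAG.sub(replace, text)


sample = ":h1.Pens\n:ul.\n:li.Ballpoint\n:li.Felt tip\n:eul."
converted = gml_to_sgml(sample)
```

The output leaves elements such as list items and headings without explicit end tags, which is acceptable only if the target DTD permits end-tag omission for them; a stricter converter would track open elements and close them explicitly.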

After the original DTD was developed, it was decided that this tactic was going to be too complicated for the EPS project. Essentially, we were locking ourselves into a format in which a separate DTD was going to be necessary for each catalog. What we needed was a more universal format, a DTD that could be used for each of our customers.

With the Internet being offered as an option to customers, we decided to change our tactics dramatically and use the Hypertext Markup Language, specifically the HTML 2.0 specification, as our universal DTD.

Using HTML

The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. HTML markup accommodated all our needs, since it could handle: all text used in the product descriptions, manufacturer part numbers and customer account numbers; our menus of options; database query results; simple structured catalog pages with in-line graphics and hypertext views of entire catalogs of information. We settled on HTML 2.0 because its level of sophistication is compatible with our WebExplorer product.

Creating A Simulation on the WWW

As development work on the application progressed in our Toronto office, the Thornwood facility was charged with developing a simulation program. To obtain a signoff from a customer or content provider without access to the EPS application, we created a simulation. The simulation's look and feel was required to match the look and feel of the actual application as closely as possible. To ensure the workability of our simulation, we used a customer-provided database to create catalog content.

The WWW catalog consists of three levels of pages: category navigation pages, category list pages, and detail pages. This simulation does not exist in isolation, as there are other media for delivery to the content provider, including CD-ROM and Lotus Notes.

The Category Navigation pages allow the user to navigate the taxonomy of categories. These pages are mainly static and are not generated by the database. The Category list pages list all of the items that are in a particular category. These pages are dynamically created by a database report. There is one detail page for every item in the catalog.

Template category list and detail pages were first generated to establish the look and feel. The changeable elements were substituted with variables. Then the pages were loaded in the database report generator. A report was generated with all of the detail pages combined together. A separate program was then used to break out all of the separate files.
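The sequence above (template with variables, one combined report, then a separate break-out step) can be imitated in a few lines of Python. The template text, the marker comment used to separate pages, and the sample records are all assumptions made for this sketch; the actual report generator and break-out program are not identified here.

```python
from string import Template

# Hypothetical detail-page template; the $name placeholders stand in for
# the changeable elements that were replaced with variables.
DETAIL = Template(
    "<!-- FILE: $part.html -->\n"
    "<HTML><HEAD><TITLE>$title</TITLE></HEAD>\n"
    "<BODY><H1>$title</H1><P>Part number: $part</P></BODY></HTML>\n"
)


def combined_report(records):
    """Mimic a report generator: every detail page, one after another, in one string."""
    return "".join(DETAIL.substitute(r) for r in records)


def break_out(report):
    """Split the combined report into {filename: page} using the marker comment."""
    pages = {}
    current = None
    for line in report.splitlines(keepends=True):
        if line.startswith("<!-- FILE: "):
            current = line[len("<!-- FILE: "):].split(" -->")[0]
            pages[current] = ""
        elif current is not None:
            pages[current] += line
    return pages


# Invented sample records standing in for rows of the customer database.
records = [
    {"part": "PEN-010", "title": "Ballpoint pen"},
    {"part": "PAPER-001", "title": "Copier paper"},
]
pages = break_out(combined_report(records))
```

The sketch keeps the pages in memory; writing each value of pages to a file named by its key would complete the break-out step.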

Catalog item images were available in the formats needed for the EPS. These images were converted to GIF or JPEG, depending on their size.

The base process for the catalog creation took a team of two people a day to put together. One team member developed the look and feel of the catalog while the other filled in the catalog content from the database.

Managing The Information with SGML Tools

To formulate the original DTD, we used Author/Editor by SoftQuad. IBM's Product Information Workbench was used to build catalog content. No HTML-specific tools, such as HTML editors or SGML parsers, were necessary in the later stages, since catalog content data was provided in database format.

Putting it to Work

Work on the EPS catalog and definition of the electronic purchasing service continues, but in February we announced our first customer for the service: Coopers & Lybrand. One of the world's leading professional firms, Coopers & Lybrand provides services for enterprises in a wide range of industries. The firm offers its clients the expertise of more than 16,000 professionals and staff in offices located in 100 U.S. cities and, through the member firms of Coopers & Lybrand International, more than 66,000 people in 125 countries worldwide.

The Coopers & Lybrand pilot installation, initially operational in the metropolitan New York region, is expected to reduce the cost of processing a single corporate purchasing order by up to 60 percent.


With HTML, we can use the information superhighway to solve real business problems such as re-engineering the purchasing process. Since HTML allows us endless customization opportunities, we've been able to take a truly customer-driven approach to attacking a problem which costs corporations significant amounts of time and money. Our new Electronic Purchasing Service promises to redefine how companies buy and sell products and also puts IBM at the forefront of the emerging electronic commerce arena.

About Ellen Adams

An award-winning technical writer, Ellen Adams has worked with SGML since 1993, when she helped define Alliance Technologies' SGML-based Data Fusion and Extraction System. Currently working in the Electronic Catalog Creation department of IBM's Network Application Services division, she provides SGML and HTML expertise to the department, and maintains departmental files broadcast on the WWW.