An Introduction to SGML

Published by Auto-Graphics, Inc., 3201 Temple Ave., Pomona, CA 91768-3200.

Telephone: 909-595-7204 | 800-776-6939

Fax: 909-595-3506


Computers have been singularly responsible for the explosion of what has been called the "Information Age". Ironically, the one area in which computers remain deficient is reliable, easy, platform-transparent, machine- and user-independent information exchange.

In the publishing world, the problem is compounded by the fact that several contributors are often involved in authoring a document. The same is generally true of corporations that regularly respond to requests for proposals (RFPs), particularly those from the federal government. Not surprisingly, the Internal Revenue Service and the Department of Defense were among the first champions of a new standard in document structuring that would eliminate many of these difficulties. That standard was SGML--Standard Generalized Markup Language.

Many publishers and publishing organizations have embraced SGML as the solution to accurate and consistent information structuring and exchange, and their numbers are growing. Now the corporate world is looking to SGML as a new-found solution for maintaining corporate standards, and SGML structuring is at the heart of many CD-ROM reference works.

This document will give you an overview of SGML, what it means, how it works, and the benefits it offers. After you've read this publication, we urge you to talk with the SGML expert--Auto-Graphics. We know SGML. And we know how to help you take advantage of this exciting innovation in information management, structuring, and exchange.

What Is SGML?

SGML stands for Standard Generalized Markup Language.

It was introduced in 1986 as an international standard for structuring information for electronic exchange. Yet since its introduction, SGML has evolved to encompass precise management and control of the structure and function of document elements.

SGML is not a software program. It is a language standard for defining parts of a document and, as appropriate, the database information on which the document is based.

How Does SGML Work?

SGML uses a tagging scheme defined in what are called DTDs--Document Type Definitions. A DTD specifies the nature of the text or data to be included in the document or database, and it defines the hierarchical structure of the document.

For example, a technical manual may include headings, subheadings, descriptive paragraphs, step-by-step instructions, and footnotes. Each of these elements would have a separate and distinct tag in the DTD.

To specify a subheading, the writer first enters the opening (or "leader") tag, then the actual subheading text, and then the closing tag. When the work is authored, the subhead is automatically validated; that is, it is checked for appropriate order and placement according to the DTD.
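
As a minimal sketch of the validation idea described above, the following Python fragment models a tiny, hypothetical DTD as a table of allowed child elements and checks a tagged fragment against it. The element names (manual, heading, subhead, para, footnote) and the check_structure helper are assumptions made for illustration only. A real SGML parser does considerably more, including tag minimization and attribute handling.

```python
import re

# Hypothetical content model: each element maps to the children it may contain.
# In a real DTD these rules would be written as <!ELEMENT ...> declarations.
CONTENT_MODEL = {
    "manual": {"heading", "subhead", "para"},
    "heading": set(),
    "subhead": set(),
    "para": {"footnote"},
    "footnote": set(),
}

# Matches simple opening and closing tags such as <para> and </para>.
TAG = re.compile(r"<(/?)([a-z]+)>")


def check_structure(text, root="manual"):
    """Return a list of structural errors; an empty list means the markup conforms."""
    errors = []
    stack = []  # elements that are currently open
    for match in TAG.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        if name not in CONTENT_MODEL:
            errors.append(f"unknown element <{name}>")
            continue
        if closing:
            if not stack or stack[-1] != name:
                errors.append(f"unexpected closing tag </{name}>")
            else:
                stack.pop()
        else:
            if not stack:
                if name != root:
                    errors.append(f"document must start with <{root}>")
            elif name not in CONTENT_MODEL[stack[-1]]:
                errors.append(f"<{name}> is not allowed inside <{stack[-1]}>")
            stack.append(name)
    errors.extend(f"missing closing tag for <{name}>" for name in reversed(stack))
    return errors
```

Calling check_structure on a fragment such as "<manual><subhead>Installing the Unit</subhead></manual>" returns an empty list, while placing a paragraph inside a subhead is reported as an error, because the hypothetical content model does not allow it.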

So SGML Is Like a Style Sheet or Template in a Word Processing Program?

No. Again, SGML is a language standard, not a program.

A DTD, on the other hand, is somewhat similar to a word processor's style sheet or template in that it provides tags for different document elements. But that is where the similarity ends--because a DTD defines only the structure and organization of a document's contents, not the stylistic elements such as typefaces, font sizes, line spacing, and so forth.
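
To illustrate the separation of structure from style, here is a small Python sketch under stated assumptions: the document content, the element names, and the two style mappings (PRINT_STYLE and SCREEN_STYLE) are all hypothetical. The point is only that the structured content never changes, while the presentation is supplied separately.

```python
# Structure only: (element, text) pairs, as a DTD-conforming document might yield them.
document = [
    ("heading", "Maintenance"),
    ("subhead", "Replacing the Filter"),
    ("para", "Switch off the unit before opening the cover."),
]

# Two hypothetical style mappings, kept entirely outside the document itself.
PRINT_STYLE = {
    "heading": str.upper,
    "subhead": lambda text: "  " + text,
    "para": lambda text: "    " + text,
}
SCREEN_STYLE = {
    "heading": lambda text: f"== {text} ==",
    "subhead": lambda text: f"-- {text} --",
    "para": lambda text: text,
}


def render(doc, style):
    """Apply a style mapping to structured content without altering the content."""
    return "\n".join(style[element](text) for element, text in doc)
```

Rendering the same document with PRINT_STYLE and then with SCREEN_STYLE produces two different presentations from one unchanged structure.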

Furthermore, even the most sophisticated word processing programs limit the number of elements or style tags you can have in a style sheet or template, generally to fewer than 100. On the other hand, one of the primary applications for SGML is long, complex documents with hundreds of DTD tags. In fact, Auto-Graphics has written SGML DTDs that contain over 1,400 unique tags and 14 levels of hierarchy.

If an SGML Structure Does Not Control Typefaces or Font Sizes, Why Is SGML of Value?

First of all, SGML establishes an organizational structure that remains consistent regardless of the number of people who contribute to the document.

Go back to our example of the technical manual and assume that 12 writers work on the project. With SGML, every writer uses the same DTD, so a subhead is always a subhead, let's say, and a descriptive paragraph is always a descriptive paragraph. None of the writers can change the characteristics of the structure.

Secondly, when an SGML-based document is transmitted or handed off, the document necessarily includes the DTD. Thus, when the document is opened by someone else, the structure of the data is retained. The operating platform doesn't matter, provided the original "source" application is compatible with the "destination" platform. An SGML application run on a Macintosh, for example, will not necessarily run under UNIX. However, because SGML involves no character formatting, the SGML data is effectively platform-transparent and machine-independent.

Thirdly, SGML is particularly well suited to documents that are built from databases, record-oriented, and subject to periodic review and updating. Once you create the SGML-structured database, any changes to the database are immediately implemented in the document. Reference publications and product catalogs are typical applications for SGML.

Finally, SGML provides the most efficient means for information management, retrieval, and selective data access and re-publication.

For instance, Auto-Graphics recently completed conversion of the entire typesetting file for Columbia University Press's Granger's Index to Poetry to an SGML-structured database and then produced this data both in print and on CD-ROM. This allowed Columbia to easily publish additional spin-off books--including Harmon's Top 100 Poems and Harmon's Top 500 Poems--without having to re-enter data.
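
The general idea of selective re-publication from a structured database can be sketched in a few lines of Python. This is an illustration only, not a description of how the project above was carried out: the records are invented placeholders, and the field names (title, author, citations) and helper functions are hypothetical.

```python
# Invented placeholder records; the field names are assumptions for illustration.
records = [
    {"title": "Entry A", "author": "Author 1", "citations": 42},
    {"title": "Entry B", "author": "Author 2", "citations": 17},
    {"title": "Entry C", "author": "Author 3", "citations": 88},
]


def top_entries(db, n):
    """Select the n most-cited records from the database without re-keying any data."""
    return sorted(db, key=lambda record: record["citations"], reverse=True)[:n]


def to_sgml(record):
    """Emit one record as tagged text using hypothetical element names."""
    return "<entry><title>{title}</title><author>{author}</author></entry>".format(**record)


# A "spin-off" publication is just a different selection from the same source data.
spin_off = [to_sgml(record) for record in top_entries(records, 2)]
```

Because the selection works on the stored records, the full publication and any spin-off draw on the same data, and a correction made once in the database carries through to both.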

Is There Basically Only One SGML?

Again, the answer is yes and no. There is only one essential SGML "concept". But the SGML structure (per the DTD) is variable and purely dependent on the type of document being controlled by SGML and/or the overall application.

Consider the Computer-aided Acquisition and Logistics Support (CALS) initiative from the Department of Defense (DOD). Introduced in 1987, CALS required defense contractors to use SGML in the preparation and exchange of technical documentation, information, and other electronic and hard-copy materials.

SGML has also been adopted by the Association of American Publishers (AAP), both to facilitate easier exchange of information and to assure consistent document structure and organization.

Keep in mind that SGML is not an off-the-shelf software product that you can buy like a spreadsheet or word processing program. Pre-written DTDs are available in the public domain, and organizations such as the DOD and AAP will provide their own DTDs to suppliers and contributors. But SGML is different--it's a language, not a program.

The fact is, there is only one basic SGML language standard, yet there are numerous SGML applications, each with its own set of DTDs.

How Difficult Is It to Implement SGML in an Organization?

The SGML protocol is strictly defined with universal codes and its own particular vocabulary. Learning to use and work with SGML is much like learning to use any new computer language or sophisticated program.

Installing and implementing SGML in an organization is another story--especially if you plan to include SGML-structured databases. This is true not only for publishers, but for any business wishing to take advantage of the many benefits SGML has to offer.

Unlike "shrink-wrapped" computer programs, SGML was not originally intended for "do-it-yourself" installation and set-up. However, companies such as Auto-Graphics have been committed to providing the proper training and tools to help SGML users learn the language and eventually develop the skills to effect installation, set-up, and implementation on their own. Auto-Graphics in particular has taken a leadership role in this quest, providing both classroom and on-line SGML educational courses and tutorials.

Nevertheless, converting to SGML is, in many ways, tantamount to writing the code for controlling your entire computer operation. It requires professional experience and knowledge, careful attention to the publishing and database needs of your organization, dedicated time, and considerable hands-on training. SGML is not something you can learn merely by reading through the documentation.

Most reputable SGML experts will recommend that you first utilize the services of a qualified SGML specialist rather than attempt the conversion yourself--if only to obtain professional education in SGML. Such specialists (particularly Auto-Graphics) can help you determine whether or not SGML is appropriate, evaluate your specific SGML needs, and point you in the right direction for effecting conversion, installation, set-up, and training with minimal impact on your day-to-day operations or on your people.

Then If I Convert to SGML, Do I Have to Go Back to My Supplier When I Need to Make Changes?

As far as Auto-Graphics is concerned, absolutely not. We can include our Smart Editor(TM), an SGML database editing tool, with every SGML installation so you can customize your SGML coding structure to suit every application.

Not all SGML suppliers provide an editing tool, which means you must contact them for additional service whenever you need to make a change. Be sure to ask about SGML editing before you make a commitment. If your supplier does not offer an editing utility, Auto-Graphics' Smart Editor(TM) will work just fine. Smart Editor(TM) is a powerful tool for assembling and managing SGML databases for record-oriented applications. If your SGML supplier does offer an editing utility, look carefully at how the editor works. Every SGML editor will give you the ability to customize your SGML databases and structures, but many only allow you to use the actual DTD tag names for database fields. For the uninitiated or infrequent user, the arcane nature of the SGML codes can be intimidating and may impede productivity.

Auto-Graphics' Smart Editor(TM) simplifies the process by using "plain English" titles--not DTD tag names--for fields. This is of little importance if you're working with database records that have only four or five fields. But if you're trying to edit or key data into a record that has 200 or 300 fields, this straightforward, user-simple approach is invaluable.
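
The idea of presenting plain-English field titles in place of DTD tag names can be sketched as a simple lookup table. The Python below is an assumption-laden illustration and does not describe how Smart Editor(TM) is actually implemented: the tag names (ptno, desc, uprice), the labels, and the to_tagged_record helper are all hypothetical.

```python
# Hypothetical mapping from DTD tag names to the labels shown to the person keying data.
FIELD_LABELS = {
    "ptno": "Part Number",
    "desc": "Description",
    "uprice": "Unit Price",
}


def to_tagged_record(form_values):
    """Convert {label: value} input from an editing form into tagged SGML-style text."""
    label_to_tag = {label: tag for tag, label in FIELD_LABELS.items()}
    parts = []
    for label, value in form_values.items():
        tag = label_to_tag[label]  # raises KeyError for a label the mapping does not know
        parts.append(f"<{tag}>{value}</{tag}>")
    return "<record>" + "".join(parts) + "</record>"
```

The person entering data sees only "Part Number" or "Unit Price", while the stored record still carries the tag names that the DTD requires.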

What Kind of Computer System Do I Need for SGML?

Virtually any kind of system you prefer--from stand-alone PCs and Macintosh computers to LAN systems to mainframes... running DOS, Windows, Apple System 7, OS/2, UNIX, and more.

You see, SGML is not platform- or machine-dependent. It is, instead, extremely independent. Remember, an SGML DTD does not determine stylistic formatting. It defines and controls structure and organization. Thus, the exchange of information within your business or between your business and an SGML-compliant outside operation or individual system does not require pre-transmission conversion to plain ASCII, plain text, or other "stripped" file formats.

Isn't SGML Really Only Applicable to Publishing?

That depends on how you define "publishing". If you mean only the publication of books, catalogs, directories and the like for public consumption, the answer is no. If you mean the publication of information in both printed and electronic form for all types of "publics", the answer is a resounding yes.

While it is true that the traditional publishing world is turning to SGML, so are electronic publishers and corporate users in general.

SGML, particularly with a powerful search and retrieval program like Auto-Graphics' IMPACT(TM), serves as the essential foundation for most CD-ROM reference publications. In fact, the versatile, efficient, fast search and retrieval capabilities inherent in the SGML structure are among the real beauties of the SGML concept.

To that end, more and more corporations are realizing the tremendous economies and efficiencies of distributing product catalogs, price lists and the like on SGML-structured CD-ROMs instead of in hard copy. The cost difference alone can total in the hundreds of thousands of dollars for just the initial release.

Even more attractive to these corporate users is the advanced power and performance of today's emerging search and retrieval engines. Auto-Graphics' IMPACT(TM) software is a prime example.

Like other packages, IMPACT(TM) capitalizes on SGML's facility for rapid search and retrieval. However, IMPACT(TM) also allows for easy updating of CD-ROM files without requiring re-mastering. A parts catalog produced on CD-ROM, for example, can be updated periodically by issuing the information on inexpensive diskettes. Once the new information is loaded onto the hard disk drive, IMPACT(TM) will automatically default to the new data during subsequent searches and retrievals.
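
The update behavior described above resembles a layered lookup, in which newer data on the hard disk is consulted before the original data on the CD-ROM. The Python sketch below illustrates that general technique with the standard library's ChainMap. It is not a description of how IMPACT(TM) works internally, and the part numbers and descriptions are invented placeholders.

```python
from collections import ChainMap

# Hypothetical data: the original catalog as pressed on the CD-ROM (read-only) ...
cd_rom = {"A-100": "Filter, standard", "A-200": "Gasket"}
# ... and a later update copied from diskette to the hard disk.
hard_disk_update = {"A-200": "Gasket, revised", "A-300": "Seal kit"}

# ChainMap searches its mappings left to right, so the update takes precedence.
catalog = ChainMap(hard_disk_update, cd_rom)


def lookup(part_number):
    """Return the newest description for a part, or None if it is not listed."""
    return catalog.get(part_number)
```

In this sketch a search for "A-200" finds the revised entry from the update, a search for "A-100" falls through to the original CD-ROM data, and the disc itself never has to be rewritten.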

So How Do I Decide Whether or Not SGML Is for Me?

The rule of thumb is this: If you're producing large, complex documents that require periodic review and updating, or if you're producing publications that need to follow the same structural format from issue to issue, SGML is definitely worth considering.

An even easier answer is to call Auto-Graphics.

As SGML specialists, we can help you evaluate your situation and determine the best approach. It might be that SGML is not right for your application. If that's the case, we'll tell you.

But if SGML should be in your future, we can assist you in deciding which SGML product is right for you--even if it happens to be an SGML product from someone other than Auto-Graphics. Naturally, if our SGML product is the best fit for your needs and your organization, we can then help you in making the conversion and getting up and running quickly and productively.

SGML can be challenging. But with Auto-Graphics' in-depth experience and knowledge, you can overcome the challenges... and take full advantage of the benefits SGML has to offer.

We look forward to hearing from you.
