WordPerfect(R) 6.1 for Windows(TM) SGML Edition

White Paper

Introduction

One of the biggest challenges facing organizations today is information management. Advancements in technology have produced an abundance of information, but unless this information can be properly managed, it is of little use. An organization's success depends on its ability to manage this information. To partially solve this problem, most companies depend on information management systems (databases) for their financial, customer, and personnel information. However, much of today's information is document based rather than data based. Information management in a document-based environment is a painstaking procedure; because it has not yet been automated, it can be viewed as the next frontier of information management. Traditional problems with document-based information management include:

Benefits of SGML

SGML solves the three problems listed above. SGML is based on ASCII, which is the lowest common denominator for computer text exchange. Virtually all applications can read ASCII files, meaning that file format is no longer an obstacle to document creation. Pieces of information from different sources can easily be assembled because they are in the same format. Furthermore, because virtually all systems can read ASCII, SGML is system independent. Moving an SGML file from machine to machine does not require specific hardware, software, or operating systems. Files can be easily moved, for example, from DOS to Unix, or from Windows to Mac.

SGML's designers understood that document format would always present a problem, so they designed SGML to separate the format from the content and structure of a document. Because SGML preserves document structure, the layout and format can be automated. This means that pieces of information from different sources can be assembled first, after which format and layout are added automatically.

As stated in the previous paragraph, SGML maintains the structure of the document. Specifically, the elements of a document such as titles, headings, and paragraphs, are tagged within the document. Any element of a document is then electronically visible and can be treated as a single datapoint. Information can be stored in a database, an infobase, an online help system, or a publishing tool without having to reformat each time or reconfigure the data for different systems. The document is no longer a dead end; information can now be extracted from it and stored in a database. Furthermore, if an organization has a coordinated SGML system, it can make real-time edits to all electronic information on its system. Because of the structure of SGML, it is possible to link a document to the data that created it.

SGML requires that every piece of data be properly identified during the authoring process. As a result, the chances of information being lost or deleted during the layout process are greatly reduced, and documents can quickly be checked to make sure that all the required information is included.

What is SGML?

The Standard Generalized Markup Language (SGML) is the International Organization for Standardization (ISO) standard for document description or, more importantly, document "structure". SGML is specifically designed to enable text interchange and is intended primarily for use in the publishing field, but it has other applications.

SGML is a symbolic language that provides a coherent and unambiguous syntax for describing whatever a user chooses to identify within a document.

The basis of SGML is the hierarchical structure of document content. The structure is controlled by a document blueprint that contains the names and accepted order of each element or section of a document. The blueprint is called the Document Type Definition (DTD) and is stored in an ASCII file.

Like a database structure, SGML groups related information into distinct fields. Each unique field or element within an SGML file is identified by SGML tags. An opening tag (example: <chapter>) is inserted in front of the element, and a closing tag, when needed (example: </chapter>), is inserted at the end of the element, electronically storing the name of that element. SGML tags are in some ways similar to bold codes in WordPerfect: the turn-bold-on code is placed in front of the text to be bolded, and the turn-bold-off code is placed at the end. The difference is that a WordPerfect code dictates how text is formatted, while an SGML tag only identifies what the text is. The following is an example of an SGML memo. (Carriage returns were added at the end of each line for readability. SGML can be one long stream of characters.)

               <memo>
               <address>To: Frank Brown</address>
               <sender>From: Area managers </sender>
               <date>Date: June 24, 1993</date>
               <subject>Re: Salaries</subject>

The use of 'memo', 'date', and 'address' in this example is defined by the creator of this document type definition (DTD). The syntax used above (<tag_name>,</tag_name>) is also application specific. Basically, the creator can vary the characters used to delimit tags and optionally omit start or end tags. This allows rapid and efficient keyboard entry of markup where specialized SGML editors are not available.
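
To illustrate how tagged elements become individually addressable pieces of data, the following is a minimal Python sketch that pulls the fields out of the memo above. It assumes the default <tag_name>,</tag_name> delimiters and that each field carries an explicit end tag; the function name extract_fields is hypothetical, and a real SGML parser would consult the DTD rather than rely on a pattern match.

```python
import re

# The memo from the example above, with carriage returns kept for readability.
MEMO = """<memo>
<address>To: Frank Brown</address>
<sender>From: Area managers </sender>
<date>Date: June 24, 1993</date>
<subject>Re: Salaries</subject>"""

# Matches <name>content</name>. The backreference \1 requires the closing tag
# to repeat the opening tag's name, so unclosed elements are skipped.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_fields(sgml_text):
    """Return a dict mapping each closed element's name to its trimmed content."""
    return {name: content.strip() for name, content in ELEMENT.findall(sgml_text)}


fields = extract_fields(MEMO)
print(fields["date"])
```

Once extracted this way, each field could be stored in a database record or reused in another document without retyping.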

How does HTML relate to SGML?

Hypertext Markup Language (HTML) is essentially an application of SGML defined by a DTD. Like any DTD, the HTML DTD specifies the open and close codes for individual data elements, and the order in which each element may appear. HTML was established to take advantage of the benefits of SGML, and to incorporate hypertext links within and across documents. HTML is the document standard for the World-Wide Web, which is quickly becoming the standard information storage and exchange mechanism for large organizations such as businesses and governments.

The SGML Market

According to Interconsult, the leading SGML market research firm, the SGML market, including all software, hardware, conversion and integration services, is growing at a rate of 34% per year. The market, which reached $668 million in 1994, is forecast to reach $1.46 billion in 1998. This explosive growth is due to several key industry initiatives to standardize on SGML and to the emergence of the World-Wide Web, which is standardizing on HTML.

Although the dollar amount of the market seems impressive, over half was spent in integration and conversion services (54%). These are actually non-automated services performed by SGML experts. The size of this entire service industry illustrates the immaturity of the SGML market.

The automated part of the SGML market, made up of tagging, conversion, and authoring software, was fairly small as a percentage of the total (9.2%, or $56.2 million). Authoring software alone was a mere 9.2% or $48 million. As the market matures, this split will move in the other direction -- more money will be spent on software and less on services.

With the emergence of the World-Wide Web as a primary means of electronic document delivery, progress in several key industry initiatives to standardize on SGML in 1993, and the entrance of key players in the SGML market such as Microsoft and Novell, the adoption of SGML is on the verge of explosion. A list of current users and key industry initiatives appears in Appendix A.

Problems with SGML Adoption

In spite of the advantages that SGML offers, its acceptance has been slow. There are basically two reasons for SGML's slow adoption. The first is the lack of tools that bring SGML to the masses, specifically tools that don't require the user to be an SGML expert. The second, and equally important, is the significant cost of producing and maintaining SGML documents. As the above expenditure figures showed, significant resources are involved in the SGML service industry. Logic would suggest that automation could save much of this cost. Unfortunately, the cost of SGML systems is still prohibitive. The costs associated with automated SGML solutions include:

Only this year has the market seen some promise of a solution to the SGML problem. Both Novell and Microsoft have announced SGML authoring systems incorporated into their standard word processors. To be successful, such systems should accomplish the following:

WordPerfect's Involvement in the SGML Environment

For over a decade, WordPerfect has been the worldwide standard for word processing. With over 17 million users, at least one copy of WordPerfect is present in every organization. As technology has changed, WordPerfect has supported such changes to give users the most advanced technology in document processing. The recent release of WordPerfect 6.1 for Windows is a testament to this philosophy. The following are excerpts from the press:

"In our view, WordPerfect 6.1 is the strongest of the three [Windows word processors]. WordPerfect has always been laden with features, but this latest version makes significant progress in making these features simple to use. ...WordPerfect for Windows lacks some of the sophisticated document filing capabilities of WordPerfect, which could make finding documents more of an ordeal on a complex system."

Business Consumer Guide, December 1994

"WordPerfect 6.1 is a coup d'etat over Microsoft's Word and Lotus's Ami Pro. ...You'll find many improvements in WordPerfect 6.1 that you didn't think were possible. ...Usability is just about as good as it gets. ...An unparalleled combination of power and ease of use."

Five-star rating in the November issue of PC/Computing

"PerfectSense is the first breakthrough in editing that I've seen in a long time," said Jeffrey Tarter, editor of Soft*letter in Watertown, Mass. "WordPerfect's a good two years ahead of the competition."

Quoted in PC Week, August 15, 1994

Over three years ago, WordPerfect Corporation saw that electronic document delivery, which requires non-proprietary file formats, was the wave of the future and formed an electronic publishing team. This team is composed of electronic publishing tools experts who have studied market research, reviewed customer feedback, and developed the strategy and tools for bringing electronic publishing and document interchange to the mainstream. The electronic publishing tools group has been involved in various consortia that explore electronic document storage and delivery, such as SGML Open and the ODA Consortium. WordPerfect Corporation has actively supported non-proprietary file formats (e.g., SGML and ODA) and has released electronic publishing tools (e.g., Intellitag, Envoy, ConvertPerfect/ODA) that make electronic document delivery a reality.

SGML Edition of WordPerfect

The SGML Edition of WordPerfect 6.1 for Windows incorporates a complete SGML solution into WordPerfect 6.1 for Windows. This solution provides a complete SGML authoring tool that requires little or no training and is delivered as a competitively priced, shrink-wrapped application. With the SGML Edition of WordPerfect, users do not need to learn an esoteric SGML authoring system; they can use the word processing environment they already know. Users also do not need to employ SGML experts to install and set up their SGML systems, which reduces the cost of SGML document creation and maintenance.

The SGML Edition also provides a complete SGML solution. Users no longer need to purchase separate pieces such as auto-tagging, validation, authoring and conversion software. The SGML Edition provides all the necessary pieces in one product, from DTD creation to validation.

Specific features of WordPerfect 6.1 SGML Edition are explained below.

WordPerfect 6.1

The most important thing to remember about the SGML Edition is that it is WordPerfect. All of the functionality of WordPerfect 6.1 is contained in the SGML Edition, including its newest and most exciting features such as PerfectSense, QuickTasks and Coaches.

Using many WordPerfect functions, the user can make the SGML authoring/editing process much easier. For example, the user can build his/her own QuickTasks and Coaches to assist with creating and modifying documents. Using the macro facilities of WordPerfect 6.1, the user can create "pre-tagging" rules to automate the tagging process and have WordPerfect do a lot of the work. The macro feature also enables the user to run these tagging processes in batch mode.

SGML capability has also been incorporated into WordPerfect's macro language. This gives the user the ability to write macros that can understand and use the SGML structure. For example, tag attributes can be added, changed or deleted through the macro language.

Layout Designer

Because an SGML document is only content and structure, any number of formats can be applied to an SGML document. The Layout Designer is an intuitive part of the SGML Edition that enables the user to create custom layouts that can be applied to any Document Type Definition (DTD). For example, formatting commands such as center or bold can be assigned to a <title> tag, or indent to a <paragraph> tag. Such formats can be applied to every element of a DTD.
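
The idea of assigning formats to tags can be sketched in a few lines of Python. This is only an illustration of the concept, not the Layout Designer itself: the layout table, the center and indent helpers, and the apply_layout function are all hypothetical names, and the sample document is invented for the example.

```python
import re

ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def center(text, width=40):
    """Center text in a fixed-width line (trailing spaces removed)."""
    return text.center(width).rstrip()


def indent(text, spaces=4):
    """Indent text by a fixed number of spaces."""
    return " " * spaces + text


# Hypothetical layout: element name -> formatting function. A second layout
# could map the same tags to entirely different formats.
PRINT_LAYOUT = {"title": center, "paragraph": indent}


def apply_layout(sgml_text, layout):
    """Format each tagged element using the function its tag is mapped to."""
    lines = []
    for name, content in ELEMENT.findall(sgml_text):
        formatter = layout.get(name, lambda text: text)  # unmapped tags pass through
        lines.append(formatter(content.strip()))
    return "\n".join(lines)


DOC = "<title>Salaries</title><paragraph>Reviews begin in July.</paragraph>"
print(apply_layout(DOC, PRINT_LAYOUT))
```

Because the tagged document never changes, swapping PRINT_LAYOUT for another table is enough to produce a different presentation of the same content.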

As stated above, SGML functionality has been incorporated into the WordPerfect macro language. In the context of the Layout Designer, this means that specific macros can be run upon insertion of a tag. For example, each time a heading tag is inserted, a macro can be run that would assign it a certain level heading for table of contents generation.

Because SGML is often more complex than these examples, other features of the Layout Designer include:

Document Type Definition (DTD) Support

No two organizations or documents are exactly the same. That's why the SGML Edition provides the option either to create user-defined DTDs specific to the needs of each organization or to use one of the many existing pre-defined DTDs. The following is a list of pre-defined DTDs that exist today and can be used with or without modification. These DTDs also ship with the SGML Edition.

Industry             Application            Initiative
Defense Maintenance  contract IETM          CALS
Aerospace            Maintenance            ATA-100
Automotive           Service                J2008
Computer Hardware    Online help            COSE
  & Software         User documentation     Davenport
European Aerospace   AECMA documentation    ISO, DIN
Financial            SEC filings            Edgar
Pharmaceutical       New drug applications  CANDA
Telecommunications   Repair                 TCIF
Electronics          User documentation     COSE
                                              Davenport
Publishing           Documents              AAP
Manufacturing        Quality organizations  ISO 9000
Semiconductor        Component catalogs     Pinnacles
Internet             World-Wide Web         HTML

Interactive Validation and Error Reporting

To help the user in authoring an SGML document, the SGML Edition coaches the user through the DTD. As the user completes a tag, the Edition will suggest the next tag based on the DTD. In cases where any number of tags could come next, the Edition will make a guess based on what has already been done. This process is called logic chaining, and makes the process easier for a user who does not have a detailed understanding of the DTD.
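
A rough Python sketch of the suggestion step is shown below. Everything in it is an assumption made for illustration: the FOLLOWERS table is a greatly simplified stand-in for a DTD (it reuses the memo elements and adds invented 'paragraph' and 'list' elements), and the guessing rule, which prefers the candidate the author has used most often so far, is one plausible heuristic rather than the product's actual logic.

```python
from collections import Counter

# Hypothetical, greatly simplified stand-in for a DTD: for each element,
# the elements that may come next.
FOLLOWERS = {
    "memo": ["address"],
    "address": ["sender"],
    "sender": ["date"],
    "date": ["subject"],
    "subject": ["paragraph"],
    "paragraph": ["paragraph", "list"],
    "list": ["paragraph", "list"],
}


def suggest_next(tags_so_far):
    """Suggest the next tag, given the tags already entered in order."""
    if not tags_so_far:
        return "memo"
    allowed = FOLLOWERS.get(tags_so_far[-1], [])
    if not allowed:
        return None
    if len(allowed) == 1:
        return allowed[0]
    # Several tags could come next: guess the one used most often so far.
    usage = Counter(tag for tag in tags_so_far if tag in allowed)
    # max() keeps the first of equally used candidates, so table order breaks ties.
    return max(allowed, key=lambda tag: usage[tag])


print(suggest_next(["memo", "address"]))
```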

If the user is tagging an existing document for SGML output, an interactive validation feature walks the user through the tagging process. As determined by the DTD, the validator will indicate what tag should be applied at any point in the document.

At any point of document creation or tagging, the user can ask for an error report. The Edition will check the document against the DTD, and will list the errors in a pop-up window. An error occurs when a tag is missing or is improperly placed in the document.
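
The two kinds of error described above, a missing tag and an improperly placed tag, can be sketched in Python as follows. The REQUIRED_ORDER list is a hypothetical stand-in for a DTD that reuses the memo elements, the error_report function is an invented name, and the sketch assumes each tag appears at most once; real validation against a DTD is considerably more involved.

```python
# Hypothetical stand-in for a DTD: the elements a memo requires, in order.
REQUIRED_ORDER = ["address", "sender", "date", "subject"]


def error_report(tags_in_document):
    """List missing and misplaced tags, assuming each tag appears at most once."""
    errors = []

    # A tag is missing when the DTD requires it but the document lacks it.
    for tag in REQUIRED_ORDER:
        if tag not in tags_in_document:
            errors.append(f"missing tag: <{tag}>")

    # A tag is misplaced when the tags that are present appear out of order.
    present = [tag for tag in tags_in_document if tag in REQUIRED_ORDER]
    expected = [tag for tag in REQUIRED_ORDER if tag in present]
    for found, wanted in zip(present, expected):
        if found != wanted:
            errors.append(f"misplaced tag: <{found}> where <{wanted}> was expected")

    return errors


for line in error_report(["sender", "address", "subject"]):
    print(line)
```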

Alias Support

A DTD often contains tags that are not identifiable by name to the user. This is particularly true of HTML which assigns its tags very cryptic names. The SGML Edition enables the user to assign user-friendly names to the tags that are included in the DTD. With the Alias Support feature, the user doesn't need to remember that <H1> is the tag for first level headings because it can be assigned a descriptive name such as Heading1. This feature further helps the user who did not write the DTD or who does not have a detailed understanding of it.
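
Conceptually, alias support is a two-way lookup between DTD tag names and friendly names. The Python sketch below shows that idea only; the alias table and the function names display_name and tag_for are hypothetical, and how the SGML Edition stores its aliases is not described here.

```python
# Hypothetical alias table: DTD tag name -> user-friendly name shown to the author.
ALIASES = {"H1": "Heading1", "H2": "Heading2", "P": "Paragraph", "UL": "BulletList"}

# Reverse lookup, used when the document is written back out with real tag names.
TAGS_BY_ALIAS = {alias: tag for tag, alias in ALIASES.items()}


def display_name(tag):
    """Return the friendly name for a DTD tag, or the tag itself if it has no alias."""
    return ALIASES.get(tag.upper(), tag)


def tag_for(alias):
    """Return the DTD tag behind a friendly name, or the name itself if unknown."""
    return TAGS_BY_ALIAS.get(alias, alias)


print(display_name("H1"), tag_for("Heading1"))
```

The saved SGML file still contains the original tag names, so the aliases affect only what the author sees while editing.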

Enhanced Extended Character Handling

If a DTD calls for extended characters to be displayed in SGML format, SGML Edition will translate the string in the DTD to the single character while working on the document in WordPerfect 6.1. When the document is saved in SGML format, the extended character string is preserved in the SGML file.
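
The round trip between an extended character string and the single character it stands for can be sketched in Python. As an assumption for this example, the entity names come from Python's html.entities tables (which cover the Latin-1 names such as eacute) rather than from a DTD's own entity declarations, and the function names to_display and to_sgml are hypothetical.

```python
import re
from html.entities import codepoint2name, name2codepoint

ENTITY = re.compile(r"&(\w+);")


def to_display(sgml_text):
    """Replace known extended-character strings with the single character."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1), 0)
        # Only non-ASCII characters are translated; anything else is left alone.
        return chr(codepoint) if codepoint > 127 else match.group(0)
    return ENTITY.sub(replace, sgml_text)


def to_sgml(display_text):
    """Replace extended characters with their strings when saving as SGML."""
    pieces = []
    for char in display_text:
        if ord(char) > 127 and ord(char) in codepoint2name:
            pieces.append(f"&{codepoint2name[ord(char)]};")
        else:
            pieces.append(char)
    return "".join(pieces)


print(to_display("R&eacute;sum&eacute;"))
```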

Table Tagging

The SGML Edition supports automatic table tagging. The standard table formats supported include CALS, AAP, WP 5.x, and WP 6.x.

Conclusion

The SGML Edition is just the first step in Novell's commitment to deliver high-level tools for electronic document storage and exchange to the masses. Because of the complexity of SGML and the tools available, it is one of the areas that most computer users have been locked out of. The SGML Edition finally makes SGML easy and affordable enough for even the average computer user.

Novell's commitment to electronic publishing is part of its overall goal of Pervasive Computing: connecting people with other people and the information they need, giving them the power to act on that information -- anytime, anyplace.

Appendix A

Who Uses SGML?

SGML is popular among document-intensive organizations. Governments, publishing houses, pharmaceutical companies, and large manufacturers all use SGML regularly. Other groups including law professionals, libraries, and universities are now looking for a document standard to facilitate electronic publishing. Industry analysts predict that over the next several years, most major organizations will switch to a non-proprietary standard for information management.

Specifically, some of the organizations that are implementing SGML today include:

The United States Department of Defense (DoD)

The United States Department of Energy (DoE)

The Association of American Publishers (AAP)

The Telecommunications Industry Forum (TCIF)

The Air Transport Association (ATA)

The United States Patent Office

The Environmental Protection Agency (EPA)

The Utah Court System

Key Industry Initiatives

Several industry initiatives are also in the works for evaluating SGML as a solution for technical documentation problems. Some of these are listed below.

The CALS Initiative

CALS (Continuous Acquisition and Life-cycle Support) is a U.S. Department of Defense initiative to acquire and manage technical information. CALS defines standards for storage and transfer of documents in digital format. One of the key CALS objectives is that documents must be independent of special word processing systems. To accomplish this, CALS specifies the international SGML standard for text documents. Other requirements of the CALS initiative include specific, non-proprietary formats for graphics, video, and audio files. By requiring electronic information in pre-defined digital formats, the Department of Defense now saves millions of dollars and hundreds of hours normally reserved for document conversion and formatting.

The Text Encoding Initiative (TEI)

TEI is using SGML electronic delivery to improve the ability of humanities and linguistics researchers to perform searching and annotation of relevant texts.

The World Wide Web (WWW)

The WWW is standardizing on HTML to assist in accessing and navigating through information on the Internet. This initiative will likely never adopt full SGML.

SuperJournal/SuperJANET

The SuperJournal project of the British Library resulted in a collaboration among nine academic journal publishers to evaluate a network for delivering SGML-encoded electronic journal articles.

Red Sage Project

The University of California at San Francisco Red Sage project is a collaboration with Springer-Verlag and AT&T to deliver medical and scientific journals online to the university library. The project began in January 1994 and runs through the end of 1997.

The Pinnacles Group

The Pinnacles Group is a consortium of electronic component manufacturers -- Hitachi, Intel, National Semiconductor, Philips, and Texas Instruments -- aimed at delivering product data sheets online. The Pinnacles Group contends that current printed and bound volumes of data sheets are difficult and time-consuming for engineers to search and that the engineer must re-type component parameters into the ECAD system.