WordPerfect 6.1 for Windows SGML Edition
[Mirrored from: http://wp.novell.com/busapps/win/mksgml.htm, or http://netwire.novell.com/wp/busapps/epub/sgmled.htm]
White Paper
Introduction
One of the biggest challenges facing organizations today is information management.
Advancements in technology have produced an abundance of information. But unless this
information can be properly managed, it is of little use. An organization's success depends on its
ability to manage this information. To partially solve this problem, most companies depend on
information management systems (databases) for their financial, customer, and personnel
information. However, much of today's information is document based, rather than data based.
Information management in a document-based environment is a painstaking procedure; because it
has not yet been automated, it can be viewed as the next frontier of information management.
Traditional problems with document-based information management include:
- Disparate File Formats -- seldom, if ever, are all the pieces of information available in
compatible file formats, which makes information difficult to assemble.
- Disparate Formatting -- even if separate pieces of information are in the same file format,
the format of the respective pieces and the final document must be changed for
reproduction and distribution. Formatting and reformatting documents requires significant
time and effort. The U.S. government alone estimates that it spends $5 billion on
document conversion per year.
- No Link from Document to Information -- a document represents a dead-end in the flow
of information because there is no link to the information that created it. This means that
the information must be reassembled and reformatted numerous times, as new and updated
documents are created.
Because of the problems listed above, industries are developing and implementing standards for
storing and exchanging technical information. Many of these efforts look to SGML to solve these
problems.
Benefits of SGML
SGML solves the three problems listed above. SGML is based on ASCII, which is the lowest
common denominator for computer text exchange. Virtually all applications can read ASCII files,
so file format is no longer an obstacle to document creation. Pieces of information from
different sources can easily be assembled because they are in the same format. Furthermore,
because virtually all systems can read ASCII, SGML is system independent. Moving an SGML
file from machine to machine does not require specific hardware, software, or operating systems.
Files can be easily moved, for example, from DOS to Unix, or Windows to Mac.
SGML's designers understood that document format would always present a problem and designed
SGML to separate the format from the content and structure of a document. Because SGML
preserves document structure, the layout and format can be automated. This means that pieces of
information from different sources can be assembled, after which format and layout will be added
automatically.
As stated in the previous paragraph, SGML maintains the structure of the document. Specifically,
the elements of a document such as titles, headings, and paragraphs, are tagged within the
document. Any element of a document is then electronically visible and can be treated as a single
data point. Information can be stored in a database, an infobase, an online help system, or a
publishing tool without having to reformat each time or reconfigure the data for different systems.
The document is no longer a dead end; information can now be extracted from it and stored in a
database. Furthermore, if an organization has a coordinated SGML system, it can make real-time
edits to all electronic information on its system. Because of the structure of SGML, it is possible
to link a document to the data that created it.
SGML requires that every piece of data be properly identified during the authoring process.
As a result, the chances of information being lost or deleted during the layout process are greatly reduced.
Documents can quickly be checked to make sure that all the required information is included.
What is SGML?
The Standard Generalized Markup Language (SGML) is the International Organization for
Standardization (ISO) standard for document description or, more importantly, "structure."
SGML is specifically designed to enable text interchange and is intended primarily for use in the
publishing field, but has other applications.
SGML is a symbolic language that provides a coherent and unambiguous syntax for describing
whatever a user chooses to identify within a document.
The basis of SGML is the hierarchical structure of document content. The structure is controlled
by a document blueprint that contains the names and accepted order of each element or section of
a document. The blueprint is called the Document Type Definition (DTD) and is stored in an
ASCII file.
Like a database structure, SGML groups related information into discrete fields. Each unique
field or element within an SGML file is identified by SGML tags. An opening tag (example:
<chapter>) is inserted in front of the element, and a closing tag, when needed, (example:
</chapter>) is inserted at the end of the element, electronically storing the name of that element.
SGML tags are in some ways similar to bold codes in WordPerfect. The turn-bold-on code is
placed in front of the text to be bolded, and the turn-bold-off code is placed at the end. The
difference is that a WordPerfect code dictates how text is formatted, while an SGML tag only
identifies what the text is. The following is an example of an SGML memo. (Carriage
returns were added at the end of each line for readability. SGML can be one long stream of
characters.)
<memo>
<address>To: Frank Brown</address>
<sender>From: Area managers </sender>
<date>Date: June 24, 1993</date>
<subject>Re: Salaries</subject>
</memo>
The use of memo, date, and address in this example is defined by the creator of this
document type definition (DTD). The syntax used above (<tag_name>,</tag_name>) is also
application specific. Basically, the creator can vary the characters used to delimit tags and
optionally omit start or end tags. This allows rapid and efficient keyboard entry of markup, where
specialized SGML editors are not available.
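Because every element is tagged by name, a program can pull the memo apart into named fields. The following Python sketch illustrates the idea. It is not part of the SGML Edition, and it makes two assumptions: Python's standard html.parser tokenizer stands in for a real SGML parser (it knows nothing about DTDs, custom delimiters, or omitted tags), and the memo carries an explicit closing </memo> tag.

```python
from html.parser import HTMLParser

# The memo from the text, written as one stream with an explicit </memo>.
MEMO = (
    "<memo>"
    "<address>To: Frank Brown</address>"
    "<sender>From: Area managers </sender>"
    "<date>Date: June 24, 1993</date>"
    "<subject>Re: Salaries</subject>"
    "</memo>"
)


class ElementCollector(HTMLParser):
    """Collects the text of each element, keyed by the element's tag name."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.fields = {}     # element name -> text content

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            # Store the text under the innermost open element.
            self.fields[self.open_tags[-1]] = text


def memo_fields(sgml_text):
    """Return a dict mapping each tagged element to its text."""
    parser = ElementCollector()
    parser.feed(sgml_text)
    parser.close()
    return parser.fields
```

Calling memo_fields(MEMO) yields one entry per tagged element, so the date or subject can be stored in a database field without anyone reading the memo.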
How does HTML relate to SGML?
Hypertext Markup Language (HTML) is essentially a DTD, that is, an application of SGML. Like any
DTD, the HTML DTD specifies the open and close tags for individual data elements, and the
order in which each element may appear. HTML was established to take advantage of the
benefits of SGML, and to incorporate hypertext links within and across documents. HTML is the
document standard for the World-Wide Web, which is quickly becoming the standard information
storage and exchange mechanism for large organizations such as businesses and government agencies.
The SGML Market
According to Interconsult, the leading SGML market research firm, the SGML market, including
all software, hardware, conversion and integration services, is growing at an annual rate
of 34%. The market, which reached $668 million in 1994, is forecast to reach $1.46 billion
in 1998. This explosive growth is due to several key industry initiatives to standardize on SGML
and the emergence of the World-Wide Web, which is standardizing on HTML.
Although the dollar amount of the market seems impressive, over half (54%) was spent on integration
and conversion services. These are actually non-automated services performed by SGML
experts. The size of this entire service industry illustrates the immaturity of the SGML market.
As a percentage of total tagging, conversion, and authoring software, the automated part of the
SGML market was fairly small (9.2%, or $56.2 million). Authoring software alone was a mere
9.2% or $48 million. As the market matures, this split will move in the other direction -- more
money will be spent on software and less on services.
With the emergence of the World-Wide Web as a primary means of electronic document delivery,
progress in several key industry initiatives to standardize on SGML in 1993, and the entrance of
key players in the SGML market such as Microsoft and Novell, the adoption of SGML is on the
verge of explosion. A list of current users and key industry initiatives appears in Appendix A.
Problems with SGML Adoption
In spite of the advantages that SGML offers, its acceptance has been slow. There are basically
two reasons for SGML's slow adoption. The first is the lack of tools that bring SGML to the
masses, specifically tools that don't require the user to be an SGML expert. The second, and
equally important, is the significant cost of producing and maintaining SGML documents. As the
above expenditure figures showed, significant resources are involved in the SGML service
industry. Logic would suggest that automation could save much of this cost. Unfortunately, the
cost of SGML systems is still prohibitive. The costs associated with automated SGML solutions
include:
- Software Expense -- The current offering of SGML software is still significantly more
expensive than standard office software.
- Training/Learning Expense -- Training and learning are the hidden costs of adopting new
software. Gartner Group consistently estimates that training and learning costs for average
office software are up to 10 times greater than the cost of the actual hardware and software.
With the added complexity of SGML, those costs will tend toward the higher end of that
estimate.
- Need for Multiple Software Applications -- There are a number of steps to SGML
document creation and editing. The document must be created, tagged, validated and
sometimes converted. Unfortunately for the user, all these steps require a separate
software package, increasing both the software and the training cost.
- Lengthy Installation or Setup -- With the current software offerings, the software must often
be installed or set up for the specific needs of each organization. This
can require hiring an SGML consultant, further increasing the cost of SGML system
implementation.
Only this year has the market seen some promise of a solution to the SGML problem. Both
Novell and Microsoft have announced SGML authoring systems incorporated into their standard
word processors. To be successful, such systems should accomplish the following:
- Make SGML authoring easier so that the user need not be an SGML expert.
- Bring the cost of an SGML authoring tool into the range of mainstream word processing
software.
- Eliminate the need to learn new esoteric SGML authoring systems to create SGML
documents, reducing training costs significantly.
- Deliver a complete solution, alleviating the need to purchase multiple pieces.
- Deliver a shrink-wrap solution that requires no costly installation or setup.
WordPerfect's Involvement in the SGML Environment
For over a decade, WordPerfect has been the world-wide standard for word processing. With
over 17 million users, at least one copy of WordPerfect is present in every organization. As
technology changes, WordPerfect has supported such changes to give users the most advanced
technology in document processing. The recent release of WordPerfect 6.1 for Windows is a
testament to this philosophy. The following are excerpts from the press:
"In our view, WordPerfect 6.1 is the strongest of the three [Windows word
processors]. WordPerfect has always been laden with features, but this
latest version makes significant progress in making these features simple to
use . . . Word for Windows lacks some of the sophisticated document filing
capabilities of WordPerfect, which could make finding documents more of
an ordeal on a complex system."
Business Consumer Guide, December 1994
"WordPerfect 6.1 is a coup d'etat over Microsoft's Word and Lotus's Ami
Pro. . . . You'll find many improvements in WordPerfect 6.1 that you
didn't think were possible . . . Usability is just about as good as it gets . . .
An unparalleled combination of power and ease of use."
Five-star rating in the November issue of PC/Computing
"PerfectSense is the first breakthrough in editing that I've seen in a long
time," said Jeffrey Tarter, editor of Soft*letter in Watertown, Mass.
"WordPerfect's a good two years ahead of the competition."
Quoted in PC Week, August 15, 1994
Over three years ago, WordPerfect Corporation saw that electronic document delivery, which
requires non-proprietary file formats, was the wave of the future and formed an electronic
publishing team. This team comprises electronic publishing tools experts who have studied
market research, reviewed customer feedback, and developed the strategy and tools for bringing
electronic publishing and document interchange to the mainstream. The electronic publishing
tools group has been involved in various consortiums that explore electronic document storage
and delivery such as SGML Open and the ODA Consortium. WordPerfect Corporation has
actively supported non-proprietary file formats (e.g., SGML and ODA) and has released electronic
publishing tools (e.g., Intellitag, Envoy, ConvertPerfect/ODA) that make electronic document
delivery a reality.
SGML Edition of WordPerfect
The SGML Edition of WordPerfect 6.1 for Windows incorporates a complete SGML solution
into WordPerfect 6.1 for Windows. This solution provides a complete SGML authoring tool that
requires little or no training and is delivered as a competitively priced, shrink-wrapped application. With
the SGML Edition of WordPerfect, users need not learn an esoteric SGML authoring system, but
can use the word processing environment they know. Users will also not need to employ SGML
experts to install and set up their SGML systems, reducing the cost of SGML document creation
and maintenance.
The SGML Edition also provides a complete SGML solution. Users no longer need to purchase
separate pieces such as auto-tagging, validation, authoring and conversion software. The SGML
Edition provides all the necessary pieces in one product, from DTD creation to validation.
Specific features of WordPerfect 6.1 SGML Edition are explained below.
WordPerfect 6.1
The most important thing to remember about the SGML Edition is that it is WordPerfect. All of
the functionality contained in WordPerfect 6.1 is contained in the SGML Edition, including its
newest and most exciting features such as PerfectSense, QuickTasks and Coaches.
Using many WordPerfect functions, the user can make the SGML authoring/editing process much
easier. For example, the user can build his/her own QuickTasks and Coaches to assist with
creating and modifying documents. Using the macro facilities of WordPerfect 6.1, the user can
create pre-tagging rules to automate the tagging process and have WordPerfect do a lot of the
work. The macro feature also enables the user to run these tagging processes in batch mode.
SGML capability has also been incorporated into WordPerfect's macro language. This gives the
user the ability to write macros that can understand and use the SGML structure. For example,
tag attributes can be added, changed or deleted through the macro language.
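The idea of pre-tagging rules run in batch mode can be sketched as follows. This is an illustration in Python, not WordPerfect macro code, and the rule table, element names, and function names are all hypothetical. It assumes each rule simply matches a line prefix and wraps the line in the corresponding element from the earlier memo example.

```python
# Hypothetical pre-tagging rules: a line that starts with the given prefix
# is wrapped in the given element. Element names follow the memo example.
PRETAG_RULES = [
    ("To:", "address"),
    ("From:", "sender"),
    ("Date:", "date"),
    ("Re:", "subject"),
]


def pretag_line(line):
    """Wrap one line in the first matching element; otherwise return it stripped."""
    stripped = line.strip()
    for prefix, tag in PRETAG_RULES:
        if stripped.startswith(prefix):
            return f"<{tag}>{stripped}</{tag}>"
    return stripped


def pretag_memo(plain_text):
    """Tag every non-empty line and wrap the result in <memo>...</memo>."""
    lines = [pretag_line(line) for line in plain_text.splitlines() if line.strip()]
    return "\n".join(["<memo>", *lines, "</memo>"])


def pretag_batch(documents):
    """Batch mode: apply the same rules to a list of plain-text documents."""
    return [pretag_memo(doc) for doc in documents]
```

Lines that match no rule pass through untagged, which leaves them for the author to tag by hand.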
Layout Designer
Because an SGML document is only content and structure, any number of formats can be applied
to an SGML document. The Layout Designer is an intuitive part of SGML Edition that enables
the user to create custom layouts that can be applied to any Document Type Definition (DTD).
For example, formatting commands such as center or bold can be assigned to a <title> tag, or
indent to a <paragraph> tag. Such formats can be applied to every element of a DTD.
As stated above, SGML functionality has been incorporated into the WordPerfect macro
language. In the context of the Layout Designer, this means that specific macros can be run upon
insertion of a tag. For example, each time a heading tag is inserted, a macro can be run that
would assign it a certain level heading for table of contents generation.
Because SGML is often more complex than these examples, other features of the Layout
Designer include:
- Context-Sensitive Layout enables users to assign format to a tag within a certain
context. For example, a title tag followed by a subtitle tag could be assigned different
formatting than a title tag followed by an author tag.
- Context-Sensitive Macros enable the user to run a macro upon insertion of a tag within a
certain context. For example, a <title> tag that is enclosed in a <title page> tag can run
a different macro than a <title> tag that is not enclosed in a <title page> tag.
- Attribute-Sensitive Layout enables the user to assign different formats to different
attributes of the same tag. It also means that different macros can be run on insertion of
different attributes of the same tag. For example, one <part number> tag could have an
attribute value of 6C729, and another <part number> tag could have an attribute value of
7U891. The user can assign a different format to each of these tags.
- Generated Text from Attributes means that an attribute value can be used as text in the
final document. For example, a <part number> tag with the attribute value 6C729 could insert
the text 6C729 into the document.
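Context-sensitive and attribute-sensitive layout both amount to looking up a format by more than the tag name alone. The Python sketch below shows one way such a lookup could work. It does not describe how the Layout Designer is implemented. The rule table, the format properties, and the function name are hypothetical, and the tag names are written without spaces.

```python
# Hypothetical layout table. Each key is (tag, parent tag, attribute value);
# None acts as a wildcard. More specific rules are listed before general ones.
LAYOUT_RULES = [
    (("title", "titlepage", None), {"align": "center", "bold": True, "size": 24}),
    (("title", None, None), {"align": "left", "bold": True, "size": 14}),
    (("partnumber", None, "6C729"), {"italic": True}),
    (("partnumber", None, None), {}),
    (("paragraph", None, None), {"indent": True}),
]


def find_format(tag, parent=None, attribute=None):
    """Return the format of the first rule that matches the tag in its context."""
    for (rule_tag, rule_parent, rule_attr), fmt in LAYOUT_RULES:
        if rule_tag != tag:
            continue
        if rule_parent is not None and rule_parent != parent:
            continue
        if rule_attr is not None and rule_attr != attribute:
            continue
        return fmt
    return {}  # no rule: leave the element unformatted
```

With this table, a title inside a title page gets the large centered format, any other title gets the smaller one, and only the part number with attribute value 6C729 is italicized.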
Document Type Definition (DTD) Support
No two organizations or documents are exactly the same. That's why SGML Edition provides
the option to create user-defined DTDs specific to the needs of each organization or to use one of
the many existing pre-defined DTDs. SGML Edition supports numerous pre-defined DTDs. The
following is a list of pre-defined DTDs that exist today and can be used with or without
modification. These DTDs also ship with the SGML Edition.
- CALS - Used by the defense industry for maintenance documents.
- ATA-100 - Used by the aerospace industry for maintenance documents.
- J2008 - Used by the automotive industry for service documents.
- COSE - Used by the computer hardware, software, and electronics industries for
online help.
- Davenport - Used by the computer hardware, software, and electronics industries for
user documentation.
- ISO, DIN - Used by European aerospace industries for AECMA documentation.
- Edgar - Used by the financial community for SEC filings.
- CANDA - Used by the pharmaceutical industry for new drug applications.
- TCIF - Used by the telecommunications industry for repair documents.
- AAP - Used by the publishing industry for standard documents.
- ISO 9000 - Used by manufacturing organizations for quality documents.
- Pinnacles - Used by the semiconductor industry for component catalogs.
- HTML - Used on the World Wide Web for Web documents.
Interactive Validation and Error Reporting
To help the user in authoring an SGML document, the SGML Edition coaches the user through
the DTD. As the user completes a tag, the Edition will suggest the next tag based on the DTD.
In cases where any number of tags could come next, the Edition will make a guess based on what
has already been done. This process is called logic chaining, and makes the process easier for a
user who does not have a detailed understanding of the DTD.
If the user is tagging an existing document for SGML output, an interactive validation feature
walks the user through the tagging process. As determined by the DTD, the validator will
indicate what tag should be applied at any point in the document.
At any point of document creation or tagging, the user can ask for an error report. The Edition
will check the document against the DTD, and will list the errors in a pop-up window. An error
occurs when a tag is missing or is improperly placed in the document.
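The next-tag suggestion and the error report can both be derived from the order of elements that the DTD allows. The Python sketch below assumes a deliberately simple content model: a fixed list of elements that must each appear once, in order, taken from the earlier memo example. Real DTD content models also allow optional, repeated, and alternative elements. The function names are hypothetical, and this is not the SGML Edition's actual algorithm.

```python
# Hypothetical content model for the memo example: the child elements of
# <memo>, in the order a DTD might require them.
MEMO_MODEL = ["address", "sender", "date", "subject"]


def suggest_next(tags_so_far, model=MEMO_MODEL):
    """Return the next element the model expects, or None when it is complete."""
    position = 0
    for tag in tags_so_far:
        if position < len(model) and tag == model[position]:
            position += 1
    return model[position] if position < len(model) else None


def error_report(tags, model=MEMO_MODEL):
    """List unexpected, duplicate, missing, and misplaced elements."""
    errors = []
    seen = []
    for tag in tags:
        if tag not in model:
            errors.append(f"unexpected element <{tag}>")
        elif tag in seen:
            errors.append(f"duplicate element <{tag}>")
        else:
            seen.append(tag)
    for tag in model:
        if tag not in seen:
            errors.append(f"missing element <{tag}>")
    expected = [tag for tag in model if tag in seen]
    if seen != expected:
        errors.append("elements out of order; expected: " + ", ".join(expected))
    return errors
```

A document that follows the model exactly produces an empty report, which is the condition a validator would treat as valid.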
Alias Support
A DTD often contains tags that are not identifiable by name to the user. This is particularly true
of HTML, which assigns its tags very cryptic names. The SGML Edition enables the user to
assign user-friendly names to the tags that are included in the DTD. With the Alias Support
feature, the user doesn't need to remember that <H1> is the tag for first-level headings because it
can be assigned a descriptive name such as Heading1. This feature further helps the user who did
not write the DTD or who does not have a detailed understanding of it.
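An alias table is essentially a two-way mapping between friendly names and DTD tag names. The Python sketch below is a minimal illustration. Only the Heading1 alias for <H1> comes from the text; the other aliases and the function names are hypothetical.

```python
# Alias -> DTD tag name. "Heading1" for H1 is the example from the text;
# the remaining entries are illustrative.
ALIASES = {
    "Heading1": "H1",
    "Heading2": "H2",
    "Paragraph": "P",
    "BulletList": "UL",
}

# Reverse table so existing tags can be displayed under their friendly names.
TAG_TO_ALIAS = {tag: alias for alias, tag in ALIASES.items()}


def tag_for(alias):
    """Translate a friendly name to the DTD tag; unknown names pass through."""
    return ALIASES.get(alias, alias)


def alias_for(tag):
    """Translate a DTD tag to its friendly name; tags without an alias pass through."""
    return TAG_TO_ALIAS.get(tag, tag)


def wrap(alias, text):
    """Write text into the document using the real tag behind the alias."""
    tag = tag_for(alias)
    return f"<{tag}>{text}</{tag}>"
```

The author picks Heading1 from a list, while the saved file still contains the <H1> tag that the DTD requires.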
Enhanced Extended Character Handling
If a DTD calls for extended characters to be represented as strings in SGML format, SGML Edition will
translate each string defined in the DTD into the single character it represents while the user works on
the document in WordPerfect 6.1. When the document is saved in SGML format, the extended character string is
preserved in the SGML file.
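The round trip between entity strings and single characters can be sketched as follows. This Python example assumes the extended characters are written as named entity references such as &eacute;, and it uses a small hand-written table rather than the declarations of any particular DTD. The function names are hypothetical.

```python
import re

# A few named character entities of the kind a DTD might declare.
# The table is illustrative, not a complete entity set.
ENTITY_TO_CHAR = {
    "eacute": "\u00e9",  # e with acute accent
    "uuml": "\u00fc",    # u with diaeresis
    "ntilde": "\u00f1",  # n with tilde
    "copy": "\u00a9",    # copyright sign
}
CHAR_TO_ENTITY = {char: name for name, char in ENTITY_TO_CHAR.items()}


def entities_to_chars(text):
    """For editing: replace known entity references with single characters."""
    def replace(match):
        # Unknown references are left exactly as they were written.
        return ENTITY_TO_CHAR.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


def chars_to_entities(text):
    """For saving: replace extended characters with their entity references."""
    return "".join(
        f"&{CHAR_TO_ENTITY[char]};" if char in CHAR_TO_ENTITY else char
        for char in text
    )
```

Applying entities_to_chars on open and chars_to_entities on save returns the original entity strings, so the SGML file is unchanged by the editing session.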
Table Tagging
The SGML Edition supports automatic table tagging. The standard table formats supported
include CALS, AAP, WP 5.x and WP 6.x.
Conclusion
The SGML Edition is just the first step in Novell's commitment to deliver high-level tools for
electronic document storage and exchange to the masses. Because of the complexity of SGML
and of the tools available, SGML is an area that most computer users have been locked out of.
The SGML Edition finally makes SGML easy and affordable enough for even the average
computer user.
Novell's commitment to electronic publishing is part of its overall goal of Pervasive Computing:
connecting people with other people and the information they need, giving them the power to
act on that information -- anytime, anyplace.
Appendix A
Who Uses SGML?
SGML is popular among document-intensive organizations. Governments, publishing houses,
pharmaceutical companies, and large manufacturers all use SGML regularly. Other groups,
including law professionals, libraries, and universities, are now looking for a document standard to
facilitate electronic publishing. Industry analysts predict that over the next several years, most
major organizations will switch to a non-proprietary standard for information management.
Specifically, some of the organizations that are implementing SGML today include:
The United States Department of Defense (DoD)
The United States Department of Energy (DoE)
The Association of American Publishers (AAP)
The Telecommunications Industry Forum (TCIF)
The Air Transport Association (ATA)
The United States Patent Office
The Environmental Protection Agency (EPA)
The Utah Court System
Key Industry Initiatives
Several industry initiatives are also in the works for evaluating SGML as a solution for technical
documentation problems. Some of these are listed below.
The CALS Initiative
CALS (Continuous Acquisition and Life-Cycle Support) is a U.S. Department of Defense initiative
to acquire and manage technical information. CALS defines standards for storage and transfer of
documents in digital format. One of the key CALS objectives is that documents must be
independent of special word processing systems. To accomplish this, CALS specifies the
international SGML standard for text documents. Other requirements of the CALS initiative include
specific, non-proprietary formats for graphics, video, and audio files. By requiring electronic
information in pre-defined digital formats, the Department of Defense now saves millions of
dollars and hundreds of hours normally spent on document conversion and formatting.
The Text Encoding Initiative (TEI)
TEI is using SGML electronic delivery to improve the ability of humanities and linguistics
researchers to perform searching and annotation of relevant texts.
The World-Wide Web (WWW)
The WWW is standardizing on HTML to assist in accessing and navigating through information
on the Internet.
SuperJournal/SuperJANET
The SuperJournal project of the British Library resulted in a collaboration among nine academic
journal publishers to evaluate a network for delivering SGML-encoded electronic journal articles.
Red Sage Project
The University of California at San Francisco Red Sage project is a collaboration with
Springer-Verlag and AT&T to deliver medical and scientific journals online to the university library. The
project began in January 1994 and runs through the end of 1997.
The Pinnacles Group
The Pinnacles Group is a consortium of electronic component manufacturers -- Hitachi, Intel,
National Semiconductor, Philips, and Texas Instruments -- aimed at delivering product data
sheets online. The Pinnacles Group contends that current printed and bound volumes of data
sheets are difficult and time-consuming for engineers to search and that the engineer must retype
component parameters into the ECAD system.