WordPerfect(R) 6.1 for Windows(TM) SGML Edition
White Paper
Introduction
One of the biggest challenges facing organizations today is information management.
Advancements in technology have produced an abundance of information. But unless this
information can be properly managed, it is of little use. An organization's success depends on its
ability to manage this information. To partially solve this problem, most companies depend on
information management systems (databases) for their financial, customer, and personnel
information. However, much of today's information is document based, rather than data based.
Information management in a document-based environment is a painstaking procedure; because
it has not yet been automated, it can be viewed as the next frontier of information management.
Traditional problems with document-based information management include:
- Disparate File Formats -- seldom, if ever, are all the pieces of information available in
compatible file formats, which makes information difficult to assemble.
- Disparate Formatting -- even if separate pieces of information are in the same file format,
the format of the respective pieces and the final document must be changed for
reproduction and distribution. Formatting and re-formatting documents requires
significant time and effort. The U.S. government alone estimates that it spends $5 billion
on document conversion per year.
- No Link from Document to Information -- a document represents a dead-end in the flow
of information because there is no link to the information that created it. This means that
the information must be re-assembled and re-formatted numerous times, as new and
updated documents are created.
Because of the problems listed above, industries are developing and implementing
standards for storing and exchanging technical information. Many of these efforts look to
SGML to solve these problems.
Benefits of SGML
SGML solves the three problems listed above. SGML is based on ASCII, which is the lowest
common denominator for computer text exchange. Virtually all applications can read ASCII
files, so file format is no longer an obstacle to document creation. Pieces of
information from different sources can easily be assembled because they are in the same format.
Furthermore, because virtually all systems can read ASCII, SGML is system independent. To
move an SGML file from machine to machine does not require specific hardware, software, or
operating systems. Files can be easily moved, for example, from DOS to Unix, or Windows to
Mac.
SGML's designers understood that document format would always present a problem and designed
SGML to separate the format from the content and structure of a document. Because SGML
preserves document structure, the layout and format can be automated. This means that pieces of
information from different sources can be assembled, after which format and layout will be
added automatically.
As stated in the previous paragraph, SGML maintains the structure of the document.
Specifically, the elements of a document, such as titles, headings, and paragraphs, are tagged
within the document. Any element of a document is then electronically visible and can be
treated as a single data point. Information can be stored in a database, an infobase, an online help
system, or a publishing tool without having to reformat each time or reconfigure the data for
different systems. The document is no longer a dead end; information can now be extracted from
it and stored in a database. Furthermore, if an organization has a coordinated SGML system, it
can make real-time edits to all electronic information on its system. Because of the structure of
SGML, it is possible to link a document to the data that created it.
SGML requires that every piece of data be properly identified during the authoring process.
As a result, the chances of information being lost or deleted during the layout process are greatly reduced.
Documents can quickly be checked to make sure that all the required information is included.
What is SGML?
The Standard Generalized Markup Language (SGML) is the International Organization for
Standardization (ISO) standard (ISO 8879) for describing documents or, more importantly, their
structure. SGML is specifically designed to enable text interchange. It is intended primarily for
use in the publishing field, but has other applications.
SGML is a symbolic language that provides a coherent and unambiguous syntax for describing
whatever a user chooses to identify within a document.
The basis of SGML is the hierarchical structure of document content. The structure is controlled
by a document blueprint that contains the names and accepted order of each element or section of
a document. The blueprint is called the Document Type Definition (DTD) and is stored in an
ASCII file.
Like a database structure, SGML groups related information into discrete fields. Each field, or
element, within an SGML file is identified by SGML tags. An opening tag (for example,
<chapter>) is inserted in front of the element, and a closing tag (for example, </chapter>) is
inserted at the end of the element when needed, electronically storing the name of that
element.
SGML tags are in some ways similar to bold codes in WordPerfect. The turn-bold-on code is
placed in front of the text to be bolded, and the turn-bold-off code is placed at the end. The
difference is that a WordPerfect code dictates how text is formatted, while an SGML tag only
identifies what the text is. The following is an example of an SGML memo. (Carriage
returns were added at the end of each line for readability. SGML can be one long stream of
characters.)
<memo>
<address>To: Frank Brown</address>
<sender>From: Area managers </sender>
<date>Date: June 24, 1993</date>
<subject>Re: Salaries</subject>
</memo>
The use of 'memo,' 'date,' and 'address' in this example is defined by the creator of this document
type definition (DTD). The syntax used above (<tag_name>, </tag_name>) is also application
specific: the creator can vary the characters used to delimit tags and can optionally omit
start or end tags. This allows rapid and efficient keyboard entry of markup where specialized
SGML editors are not available.
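To make the idea of a tagged element as a single data point concrete, the following is a minimal Python sketch that pulls the fields out of the memo shown above. It assumes every leaf element carries explicit start and end tags and uses the default angle-bracket delimiters. The function name extract_elements is hypothetical, and the regular expression is a shortcut for illustration, not a real SGML parser.

```python
import re

# A copy of the memo example, with explicit start and end tags.
MEMO = """<memo>
<address>To: Frank Brown</address>
<sender>From: Area managers </sender>
<date>Date: June 24, 1993</date>
<subject>Re: Salaries</subject>
</memo>"""

# Matches a leaf element: <name>text</name>, where text holds no further tags.
LEAF = re.compile(r"<([A-Za-z][\w.-]*)>([^<]*)</\1>")

def extract_elements(sgml_text):
    """Return a dict mapping each leaf element name to its stripped text."""
    return {name: text.strip() for name, text in LEAF.findall(sgml_text)}

fields = extract_elements(MEMO)
print(fields["date"])  # Date: June 24, 1993
```

Once the fields are in a dictionary like this, each one can be stored, searched, or reformatted independently of the others.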
How does HTML relate to SGML?
Hypertext Markup Language (HTML) is essentially an application of SGML defined by its own DTD. Like any
DTD, the HTML DTD specifies the open and close codes for individual data elements, and the
order in which each element may appear. HTML was established to take advantage of the
benefits of SGML, and to incorporate hypertext links within and across documents. HTML is
the document standard for the World-Wide Web, which is quickly becoming the standard
information storage and exchange mechanism for large organizations such as businesses and
government agencies.
The SGML Market
According to Interconsult, the leading SGML market research firm, the SGML market, including
all software, hardware, conversion and integration services, is growing at an annual rate
of 34%. The market, which reached $668 million in 1994, is forecast to reach $1.46 billion
in 1998. This explosive growth is due to several key industry initiatives to standardize on SGML
and the emergence of the World-Wide Web, which is standardizing on HTML.
Although the dollar amount of the market seems impressive, over half of it (54%) was spent on
integration and conversion services. These are non-automated services performed by SGML
experts, and the size of this service industry illustrates the immaturity of the SGML market.
The automated part of the SGML market (tagging, conversion, and authoring software) was
fairly small by comparison (9.2%, or $56.2 million). Authoring software alone was a mere
9.2% or $48 million. As the market matures, this split will move in the other direction -- more
money will be spent on software and less on services.
With the emergence of the World-Wide Web as a primary means of electronic document
delivery, progress in several key industry initiatives to standardize on SGML in 1993, and the
entrance of key players in the SGML market such as Microsoft and Novell, the adoption of
SGML is on the verge of explosive growth. A list of current users and key industry initiatives appears in
Appendix A.
Problems with SGML Adoption
In spite of the advantages that SGML offers, its acceptance has been slow. There are two
reasons for this. The first is the lack of tools that bring SGML to the masses: specifically,
tools that don't require the user to be an SGML expert. The second, and equally important,
is the significant cost of producing and maintaining SGML documents. As the expenditure figures
above showed, significant resources go to the SGML service industry. Logic would suggest that
automation could save much of this cost. Unfortunately, the cost of SGML systems is still
prohibitive. The costs associated with automated SGML solutions include:
- Software Expense -- The current offering of SGML software is still significantly more
expensive than standard office software.
- Training/Learning Expense -- Training and learning are the hidden costs of adopting new
software. Gartner Group consistently estimates the training and learning costs of average
office software at up to 10 times the cost of the actual hardware and software.
With the added complexity of SGML, those costs will tend toward the higher end of that
estimate.
- Need for Multiple Software Applications -- There are a number of steps to SGML
document creation and editing. The document must be created, tagged, validated and
sometimes converted. Unfortunately for the user, each of these steps requires a separate
software package, increasing both the software and the training cost.
- Lengthy Installation or Setup -- With the current software offering, the software must often
be installed or set up for the specific needs of each organization. This
can require hiring an SGML consultant, further increasing the cost of SGML system
implementation.
Only this year has the market seen some promise of a solution to the SGML problem. Both
Novell and Microsoft have announced SGML authoring systems incorporated into their standard
word processors. To be successful, such systems should accomplish the following:
- Make SGML authoring easier so that the user need not be an SGML expert.
- Bring the cost of an SGML authoring tool into the range of mainstream word processing
software.
- Eliminate the need to learn new esoteric SGML authoring systems to create SGML
documents, reducing training costs significantly.
- Deliver a complete solution, alleviating the need to purchase multiple pieces.
- Deliver a shrink-wrap solution that requires no costly installation or setup.
WordPerfect's Involvement in the SGML Environment
For over a decade, WordPerfect has been the world-wide standard for word processing. With
over 17 million users, a copy of WordPerfect can be found in nearly every organization. As
technology changes, WordPerfect has supported such changes to give users the most advanced
technology in document processing. The recent release of WordPerfect 6.1 for Windows is a
testament to this philosophy. The following are excerpts from the press:
"In our view, WordPerfect 6.1 is the strongest of the three [Windows word processors].
WordPerfect has always been laden with features, but this latest version makes significant
progress in making these features simple to use. ...WordPerfect for Windows lacks some of the
sophisticated document filing capabilities of WordPerfect, which could make finding documents
more of an ordeal on a complex system."
Business Consumer Guide, December 1994
"WordPerfect 6.1 is a coup d'etat over Microsoft's Word and Lotus's Ami Pro. ...You'll find many
improvements in WordPerfect 6.1 that you didn't think were possible. ...Usability is just about as
good as it gets. ...An unparalleled combination of power and ease of use."
Five-star rating in the November issue of PC/Computing
"PerfectSense is the first breakthrough in editing that I've seen in a long time," said Jeffrey
Tarter, editor of Soft*letter in Watertown, Mass. "WordPerfect's a good two years ahead of the
competition."
Quoted in PC Week, August 15, 1994
Over three years ago, WordPerfect Corporation saw that electronic document delivery, which
requires non-proprietary file formats, was the wave of the future and formed an electronic
publishing team. This team is composed of electronic publishing tools experts who have studied
market research, reviewed customer feedback, and developed the strategy and tools for bringing
electronic publishing and document interchange to the mainstream. The electronic publishing
tools group has been involved in various consortiums that explore electronic document storage
and delivery such as SGML Open and the ODA Consortium. WordPerfect Corporation has
actively supported non-proprietary file formats (e.g., SGML and ODA) and has released
electronic publishing tools (e.g., Intellitag, Envoy, ConvertPerfect/ODA) that make electronic
document delivery a reality.
SGML Edition of WordPerfect
The SGML Edition of WordPerfect 6.1 for Windows incorporates a complete SGML solution
into WordPerfect 6.1 for Windows. It provides a complete SGML authoring tool that
requires little or no training and is delivered as a competitively priced, shrink-wrapped application.
With the SGML Edition of WordPerfect, users do not need to learn an esoteric SGML authoring
system; they can use the word processing environment they already know. Users also do not need to
employ SGML experts to install and set up their SGML systems, which reduces the cost of SGML
document creation and maintenance.
Because the solution is complete, users no longer need to purchase
separate pieces such as auto-tagging, validation, authoring, and conversion software. The SGML
Edition provides all the necessary pieces in one product, from DTD creation to validation.
Specific features of WordPerfect 6.1 SGML Edition are explained below.
WordPerfect 6.1
The most important thing to remember about the SGML Edition is that it is WordPerfect. All of
the functionality contained in WordPerfect 6.1 is contained in the SGML Edition including its
newest and most exciting features such as PerfectSense, QuickTasks and Coaches.
Using many WordPerfect functions, the user can make the SGML authoring/editing process
much easier. For example, the user can build his/her own QuickTasks and Coaches to assist with
creating and modifying documents. Using the macro facilities of WordPerfect 6.1, the user can
create "pre-tagging" rules to automate the tagging process and have WordPerfect do a lot of the
work. The macro feature also enables the user to run these tagging processes in batch mode.
SGML capability has also been incorporated into WordPerfect's macro language. This gives the
user the ability to write macros that can understand and use the SGML structure. For example,
tag attributes can be added, changed or deleted through the macro language.
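The idea of "pre-tagging" rules run in batch mode can be sketched in a few lines. The Python below is an illustration only and is not WordPerfect macro code: the rule table, the function names pretag and pretag_batch, and the fallback tag name para are all hypothetical, and the rules simply key off the line prefixes used in the memo example earlier in this paper.

```python
import re

# Hypothetical pre-tagging rules: (pattern at the start of a line, tag to apply).
RULES = [
    (re.compile(r"^To:"), "address"),
    (re.compile(r"^From:"), "sender"),
    (re.compile(r"^Date:"), "date"),
    (re.compile(r"^Re:"), "subject"),
]

def pretag(lines, default_tag="para"):
    """Wrap each non-blank line in the tag of the first rule that matches it."""
    tagged = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        tag = next((t for pattern, t in RULES if pattern.search(text)), default_tag)
        tagged.append(f"<{tag}>{text}</{tag}>")
    return tagged

def pretag_batch(documents):
    """Apply the same rules to many documents, as a batch run would."""
    return {name: pretag(body.splitlines()) for name, body in documents.items()}

result = pretag_batch({"memo1": "To: Frank Brown\nDate: June 24, 1993\nPlease review."})
```

Rules of this kind handle the predictable parts of a document, leaving only the ambiguous passages for the author to tag by hand.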
Layout Designer
Because an SGML document is only content and structure, any number of formats can be applied
to an SGML document. The Layout Designer is an intuitive part of the SGML Edition that
enables the user to create custom layouts that can be applied to any Document Type Definition
(DTD). For example, formatting commands such as center or bold can be assigned to a <title>
tag, or indent to a <paragraph> tag. Such formats can be applied to every element of a DTD.
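Because the tagged document holds only content and structure, a layout can be thought of as a lookup table from element names to formatting commands. The Python sketch below illustrates that separation under simple assumptions; the two layout tables, the sample document, and the function apply_layout are hypothetical and do not represent the Layout Designer's actual data format.

```python
# Hypothetical layouts: element name -> formatting commands.
SCREEN_LAYOUT = {"title": ["center", "bold"], "paragraph": ["indent"]}
PRINT_LAYOUT = {"title": ["bold"], "paragraph": []}

def apply_layout(elements, layout):
    """Pair each (tag, text) element with the formats assigned to its tag."""
    return [(layout.get(tag, []), text) for tag, text in elements]

# The same tagged content, rendered under two different layouts.
doc = [("title", "Annual Report"), ("paragraph", "Sales rose this year.")]
on_screen = apply_layout(doc, SCREEN_LAYOUT)
in_print = apply_layout(doc, PRINT_LAYOUT)
```

The tagged content in doc is never modified; only the table passed to apply_layout changes.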
As stated above, SGML functionality has been incorporated into the WordPerfect macro
language. In the context of the Layout Designer, this means that specific macros can be run upon
insertion of a tag. For example, each time a heading tag is inserted, a macro can be run that
would assign it a certain level heading for table of contents generation.
Because SGML is often more complex than these examples, other features of the Layout
Designer include:
- Context-Sensitive Layout enables users to assign format to a tag within a certain context.
For example, a title tag followed by a subtitle tag could be assigned different formatting
than a title tag followed by an author tag.
- Context-Sensitive Macros enable the user to run a macro upon insertion of a tag within a
certain context. For example, a <title> tag that is enclosed in a <title page> tag can
run a different macro than a <title> tag that is not enclosed in a <title page> tag.
- Attribute-Sensitive Layout enables the user to assign different formats to different
attribute values of the same tag. It also means that different macros can be run on insertion of
different attribute values of the same tag. For example, a <part number> tag could have an
attribute value of '6C729,' and another <part number> tag could have an attribute value of
'7U891.' The user can assign a different format to each of these tags.
- Generated Text from Attributes means that an attribute value can be used as text in the
final document. For example, a <part number> tag with attribute '6C729,' could insert the
text 6C729 into the document.
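The context-sensitive and attribute-sensitive features listed above amount to choosing the most specific formatting rule that matches an element. The following Python sketch shows one way such a lookup could work. It is an assumption-laden illustration, not the product's algorithm: the Element class, the rule table, the attribute name id, and the format names are hypothetical, and the tag names are written without spaces because SGML names cannot contain them.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Element"] = None

# Hypothetical rules: (tag, required parent, required (attribute, value), formats).
RULES = [
    ("title", None, None, ["bold"]),
    ("title", "titlepage", None, ["center", "bold", "large"]),
    ("partnumber", None, ("id", "6C729"), ["italic"]),
    ("partnumber", None, ("id", "7U891"), ["underline"]),
]

def resolve_format(el):
    """Pick the most specific matching rule: more conditions beat fewer."""
    best, best_score = [], -1
    for tag, parent, attr, formats in RULES:
        if tag != el.name:
            continue
        if parent is not None and (el.parent is None or el.parent.name != parent):
            continue
        if attr is not None and el.attrs.get(attr[0]) != attr[1]:
            continue
        score = (parent is not None) + (attr is not None)
        if score > best_score:
            best, best_score = formats, score
    return best

def generated_text(el, attr="id"):
    """Generated text from attributes: use the attribute value as content."""
    return el.attrs.get(attr, "")

page = Element("titlepage")
print(resolve_format(Element("title", parent=page)))  # ['center', 'bold', 'large']
```

A title outside a title page falls back to the plain rule, and generated_text returns the part number so it can be inserted into the final document.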
Document Type Definition (DTD) Support
No two organizations or documents are exactly the same. That's why SGML Edition provides
the option to create user-defined DTDs specific to the needs of each organization or to use one of
the many existing pre-defined DTDs. SGML Edition supports numerous pre-defined DTDs.
The following is a list of pre-defined DTDs that exist today and can be used with or without
modification. These DTDs also ship with the SGML Edition.
Industry                      Application                 Initiative
Defense                       Maintenance contract IETM   CALS
Aerospace                     Maintenance                 ATA-100
Automotive                    Service                     J2008
Computer Hardware & Software  Online help                 COSE
                              User documentation          Davenport
European Aerospace            AECMA documentation         ISO, DIN
Financial                     SEC filings                 Edgar
Pharmaceutical                New drug applications       CANDA
Telecommunications            Repair                      TCIF
Electronics                   User documentation          COSE, Davenport
Publishing                    Documents                   AAP
Manufacturing                 Quality organizations       ISO 9000
Semiconductor                 Component catalogs          Pinnacles
Internet                      World-Wide Web              HTML
Interactive Validation and Error Reporting
To help the user in authoring an SGML document, the SGML Edition coaches the user through
the DTD. As the user completes a tag, the Edition will suggest the next tag based on the DTD.
In cases where any number of tags could come next, the Edition will make a guess based on what
has already been done. This process is called logic chaining, and makes the process easier for a
user who does not have a detailed understanding of the DTD.
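Suggesting the next tag from the DTD can be pictured as walking an ordered content model. The Python sketch below assumes a very simple model in which a parent element lists its children in order and marks each as required or optional; real DTD content models are richer. The model for memo, including the body element, and the function suggest_next are hypothetical, and the sketch does not attempt the guessing behavior described above for cases where many tags could follow.

```python
# Hypothetical content model: element -> ordered list of (child, required).
CONTENT_MODEL = {
    "memo": [("address", True), ("sender", True), ("date", True),
             ("subject", False), ("body", True)],
}

def suggest_next(parent, children_so_far, model=CONTENT_MODEL):
    """Return the tags that may legally come next, nearest in the model first."""
    sequence = model[parent]
    names = [name for name, _ in sequence]
    position = 0
    for child in children_so_far:
        # Advance past each child already entered, in model order.
        position = names.index(child, position) + 1
    suggestions = []
    for name, required in sequence[position:]:
        suggestions.append(name)
        if required:
            break  # nothing after a required element can come before it
    return suggestions

print(suggest_next("memo", ["address", "sender", "date"]))  # ['subject', 'body']
```

After the date, either the optional subject or the required body may follow, so both are offered.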
If the user is tagging an existing document for SGML output, an interactive validation feature
walks the user through the tagging process. As determined by the DTD, the validator will
indicate what tag should be applied at any point in the document.
At any point of document creation or tagging, the user can ask for an error report. The Edition
will check the document against the DTD, and will list the errors in a pop-up window. An error
occurs when a tag is missing or is improperly placed in the document.
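An error report of this kind can be approximated by comparing the tags found in a document with the order the DTD requires. The Python sketch below assumes a flat list of required child elements in a fixed order, which is far simpler than a real DTD. The model and the function error_report are hypothetical, and the message wording is invented for illustration.

```python
# Hypothetical required order for the children of a memo element.
MEMO_MODEL = ["address", "sender", "date", "subject"]

def error_report(children, model=MEMO_MODEL):
    """List missing, misplaced, and unknown tags relative to the model."""
    errors = []
    expected = 0  # index of the next model element we expect to see
    for tag in children:
        if tag not in model:
            errors.append(f"<{tag}> is not allowed here")
            continue
        index = model.index(tag)
        if index < expected:
            errors.append(f"<{tag}> is out of order")
            continue
        for missing in model[expected:index]:
            errors.append(f"<{missing}> is missing before <{tag}>")
        expected = index + 1
    for missing in model[expected:]:
        errors.append(f"<{missing}> is missing")
    return errors

print(error_report(["address", "date", "subject"]))
# ['<sender> is missing before <date>']
```

An empty list means the document satisfies this simplified model.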
Alias Support
A DTD often contains tags that are not identifiable by name to the user. This is particularly true
of HTML, which assigns its tags very cryptic names. The SGML Edition enables the user to
assign user-friendly names to the tags that are included in the DTD. With the Alias Support
feature, the user doesn't need to remember that <H1> is the tag for first level headings because it
can be assigned a descriptive name such as Heading1. This feature further helps the user who
did not write the DTD or who does not have a detailed understanding of it.
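An alias table is essentially a two-way mapping between friendly names and the tag names in the DTD. The Python sketch below shows the idea. Heading1 and <H1> come from the text above, while the other aliases and the function names are hypothetical.

```python
# Alias -> DTD tag name. Heading1/H1 is from the text; the rest are illustrative.
ALIASES = {"Heading1": "H1", "Heading2": "H2", "Paragraph": "P"}
TAG_TO_ALIAS = {tag: alias for alias, tag in ALIASES.items()}

def to_tag(alias):
    """Translate a friendly name to the DTD tag name (unknown names pass through)."""
    return ALIASES.get(alias, alias)

def to_alias(tag):
    """Translate a DTD tag name to its friendly name for display."""
    return TAG_TO_ALIAS.get(tag, tag)

def open_tag(alias):
    """Build the start tag that would be written to the SGML file."""
    return f"<{to_tag(alias)}>"

print(open_tag("Heading1"))  # <H1>
```

The author works with the alias, and only the DTD's own tag name is written to the file.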
Enhanced Extended Character Handling
If a DTD calls for extended characters to be represented as character strings in SGML format, the
SGML Edition translates each string defined in the DTD into the single character it represents
while the document is being edited in WordPerfect 6.1. When the document is saved in SGML format,
the extended character string is preserved in the SGML file.
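The round trip between an extended character string and its single character can be sketched as follows. This Python example assumes the strings are SGML entity references of the form &name; and uses three names from the standard ISO entity sets. The table is a small hypothetical subset, and the functions to_display and to_sgml are illustrative names, not part of the product.

```python
import re

# Small illustrative subset of an entity set that a DTD might declare.
ENTITIES = {
    "eacute": "\u00e9",  # e with acute accent
    "uuml": "\u00fc",    # u with diaeresis
    "copy": "\u00a9",    # copyright sign
}
CHAR_TO_ENTITY = {char: name for name, char in ENTITIES.items()}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_display(sgml_text):
    """Replace known entity references with single characters for editing."""
    return ENTITY_REF.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), sgml_text)

def to_sgml(display_text):
    """Restore entity references when saving back to SGML."""
    return "".join(
        f"&{CHAR_TO_ENTITY[c]};" if c in CHAR_TO_ENTITY else c
        for c in display_text
    )

saved = to_sgml(to_display("Caf&eacute; &copy; 1994"))
```

Unknown references are left untouched on the way in, so nothing in the SGML file is lost if the table does not cover it.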
Table Tagging
The SGML Edition supports automatic table tagging. The standard table formats supported include
CALS, AAP, WP 5.x and WP 6.x.
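As a rough picture of what automatic table tagging produces, the Python sketch below turns rows of cells into markup using element names from the CALS table model (table, tgroup, thead, tbody, row, entry). It is heavily simplified: it omits column specifications and other attributes a real CALS table would normally carry, it does not escape markup characters inside cells, and the function tag_table is hypothetical.

```python
def tag_table(rows, header=True):
    """Wrap rows of cell text in simplified CALS-style table markup."""
    cols = max(len(r) for r in rows)

    def tag_row(cells):
        return "<row>" + "".join(f"<entry>{c}</entry>" for c in cells) + "</row>"

    parts = [f'<table><tgroup cols="{cols}">']
    body = rows
    if header:
        # Treat the first row as the table heading.
        parts.append("<thead>" + tag_row(rows[0]) + "</thead>")
        body = rows[1:]
    parts.append("<tbody>" + "".join(tag_row(r) for r in body) + "</tbody>")
    parts.append("</tgroup></table>")
    return "".join(parts)

markup = tag_table([["Part", "Qty"], ["6C729", "4"]])
```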
Conclusion
The SGML Edition is just the first step in Novell's commitment to deliver high-level tools for
electronic document storage and exchange to the masses. Because of the complexity of SGML
and of the tools available, SGML is one of the areas that most computer users have been locked out of.
The SGML Edition finally makes SGML easy and affordable enough for even the average
computer user.
Novell's commitment to electronic publishing is part of its overall goal of Pervasive Computing:
connecting people with other people and the information they need, and giving them the power to
act on that information -- anytime, anyplace.
Appendix A
Who Uses SGML?
SGML is popular among document-intensive organizations. Governments, publishing houses,
pharmaceutical companies, and large manufacturers all use SGML regularly. Other groups
including law professionals, libraries, and universities are now looking for a document standard
to facilitate electronic publishing. Industry analysts predict that over the next several years, most
major organizations will switch to a non-proprietary standard for information management.
Specifically, some of the organizations that are implementing SGML today include:
The United States Department of Defense (DoD)
The United States Department of Energy (DoE)
The Association of American Publishers (AAP)
The Telecommunications Industry Forum (TCIF)
The Air Transport Association (ATA)
The United States Patent Office
The Environmental Protection Agency (EPA)
The Utah Court System
Key Industry Initiatives
Several industry initiatives are also in the works for evaluating SGML as a solution for technical
documentation problems. Some of these are listed below.
The CALS Initiative
CALS (Continuous Acquisition and Life-cycle Support) is a U.S. Department of Defense
initiative to acquire and manage technical information. CALS defines standards for storage and
transfer of documents in digital format. One of the key CALS objectives is that documents must
be independent of any particular word processing system. To accomplish this, CALS specifies the
international SGML standard for text documents. Other requirements of the CALS initiative include
specific, non-proprietary formats for graphics, video, and audio files. By requiring electronic
information in pre-defined digital formats, the Department of Defense now saves millions of
dollars and hundreds of hours that would otherwise be spent on document conversion and formatting.
The Text Encoding Initiative (TEI)
TEI is using SGML electronic delivery to improve the ability of humanities and linguistics
researchers to perform searching and annotation of relevant texts.
The World Wide Web (WWW)
The WWW is standardizing on HTML to assist in accessing and navigating through information
on the Internet. This initiative will likely never adopt full SGML.
SuperJournal/SuperJANET
The SuperJournal project of the British Library brought together nine academic
journal publishers to evaluate a network for delivering SGML-encoded electronic journal
articles.
Red Sage Project
The University of California at San Francisco Red Sage project is a collaboration with
Springer-Verlag and AT&T to deliver medical and scientific journals online to the university library.
The project began in January 1994 and runs through the end of 1997.
The Pinnacles Group
The Pinnacles Group is a consortium of electronic component manufacturers -- Hitachi, Intel,
National Semiconductor, Philips, and Texas Instruments -- aimed at delivering product data
sheets online. The Pinnacles Group contends that current printed and bound volumes of
datasheets are difficult and time-consuming for engineers to search and that the engineer must
re-type component parameters into the ECAD system.