Microsoft® SGML Author for Word

SGML Author Backgrounder

Introducing Microsoft SGML Author for Word

Word version 1.0 for Windows was released in 1989, bringing WYSIWYG, dynamic data exchange, and a robust macro language to users. In the fall of 1991, Word version 2.0 for Windows introduced innovations such as drag and drop, one-click bullets and envelopes, and a mail merge that stepped the user through the process. Word 6.0 for Windows continues this tradition of innovation by bringing intelligent automation to word processing: intelligently predicting your next step, and then taking it for you. While Word 6.0 makes everyday word processing tasks easier, many companies also want to use Word with exciting high-end technologies like SGML.

SGML is an international standard for representing information in documents. SGML provides an open-systems approach for delivering a wide variety of powerful solutions ranging from electronic publishing to advanced document repositories. Microsoft SGML Author for Word provides companies with an easy-to-use, cost-effective way to get their documents on-line quickly.

SGML Author is an add-on to Word 6.0 for Windows that leverages Word's easy-to-use authoring tools to hide the complexities of SGML from end users, enabling them to create SGML quickly without extensive training or knowledge. Administrators can use SGML Author's innovative point-and-click configuration tools to work with any DTD.

The SGML Author for Word Backgrounder describes SGML and the SGML marketplace as well as the features that make SGML Author a powerful extension to Word 6.0 for creating SGML.

What is SGML?

SGML (Standard Generalized Markup Language) is an international standard that describes the relationship between a document's content and its structure. SGML allows document-based information to be shared and re-used across applications and computer platforms in an open, vendor-neutral format. SGML is sometimes compared to SQL, in that it enables companies to structure information in documents in an open fashion, so that it can be accessed or re-used by any SGML-aware application across multiple platforms.

Unlike other common document file formats that represent both content and presentation, SGML represents a document's content (data) and structure (interrelationships among the data). Removing the presentation from content establishes a neutral format. Now documents as well as the information in documents can easily be re-used by publishing and non-publishing applications.

SGML identifies document elements such as titles, paragraphs, tables, or chapters as distinct objects, allowing users to define the relationships between the objects for structuring data in documents. The relationships between document elements are defined in a DTD (Document Type Definition), which is roughly analogous to a collection of field definitions in a database. Once a document is converted into SGML and the information has been 'tagged', it becomes a document-database that can be searched, printed, or even programmatically manipulated by SGML-aware applications.
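The database analogy can be made concrete with a minimal Python sketch. Everything in it is a hypothetical illustration: the element names (chapter, title, para), the CONTENT_MODEL dictionary, and the helper functions are not real DTD syntax and are not part of any SGML product. A real DTD expresses far richer rules (ordering, repetition, attributes) than this simplified stand-in.

```python
# Hypothetical, simplified stand-in for a DTD: each element name maps to
# the child elements it may contain ("#text" means character data).
CONTENT_MODEL = {
    "chapter": ["title", "para"],
    "title": ["#text"],
    "para": ["#text"],
}

# A tagged document as nested (element, children) tuples; strings are text.
DOCUMENT = (
    "chapter",
    [
        ("title", ["Adjusting Flight Surfaces"]),
        ("para", ["Inspect the aileron hinges before adjustment."]),
        ("para", ["Record all changes in the maintenance log."]),
    ],
)


def is_valid(node):
    """Return True if every child is allowed by CONTENT_MODEL."""
    name, children = node
    allowed = CONTENT_MODEL.get(name, [])
    for child in children:
        if isinstance(child, str):
            if "#text" not in allowed:
                return False
        elif child[0] not in allowed or not is_valid(child):
            return False
    return True


def find_all(node, wanted):
    """Collect the text of every element named `wanted`, in document order."""
    name, children = node
    found = []
    if name == wanted:
        found.append("".join(c for c in children if isinstance(c, str)))
    for child in children:
        if not isinstance(child, str):
            found.extend(find_all(child, wanted))
    return found


if __name__ == "__main__":
    print(is_valid(DOCUMENT))
    print(find_all(DOCUMENT, "title"))
```

Because the structure is explicit, a query such as "every title" or "every paragraph" needs no knowledge of how the document is formatted on the page, which is the sense in which a tagged document behaves like a database.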

Why are companies using SGML?

According to the Gartner Group, a Connecticut-based research firm, nearly 95 percent of corporate information is stored in documents. Further studies indicate that companies spend between 6 and 10 percent of their gross revenues printing and distributing documents. Many companies are seeking new ways to leverage their investment in documents so that they can be easily re-used or distributed electronically. However, they are still hindered by hardware and software incompatibilities. SGML provides an open-systems approach for representing information in documents, and it is becoming the leading standard for document interchange.

There are three primary reasons why companies are moving their documents into SGML. These reasons are:

How is SGML being used?

One of the most important advantages of SGML is that it provides companies with a flexible, vendor-neutral data format that can be easily used as input for multiple delivery formats. In addition, since a centralized document store can be used as input for these value-added processes, managing and updating the information is greatly simplified.

Airline Industry

When an airframe manufacturer delivers a plane to a customer, it comes with thousands of pages of documentation. Distributing this information on paper is expensive, so companies are investigating publishing on CD-ROM. The CD-ROM contains all the documentation, which is viewed with an intelligent tool that queries the information stored on the CD-ROM.

For example, if a maintenance person needs a guide for adjusting a plane's flight surfaces, the viewing tool automatically assembles the information from the repository as a complete document. But this is only the start, because SGML can do much more. SGML can be used to define attributes, such as security levels, for information stored in documents. In the previous example, the maintenance person may see only information about the control surfaces and how to adjust them, while their supervisor, presumably with a higher security level, might have access to additional information about part numbers, vendor names, and cost.
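The security-level scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the FRAGMENTS data, the "security" attribute name, its numeric levels, and the assemble function are all hypothetical and do not describe any actual viewing tool.

```python
# Hypothetical repository fragments. Each carries a "security" attribute,
# in the spirit of an SGML attribute attached to an element.
FRAGMENTS = [
    {"topic": "control-surfaces", "security": 1,
     "text": "Adjustment procedure for the ailerons."},
    {"topic": "control-surfaces", "security": 2,
     "text": "Part numbers and vendor names."},
    {"topic": "control-surfaces", "security": 2,
     "text": "Replacement cost."},
    {"topic": "landing-gear", "security": 1,
     "text": "Gear inspection checklist."},
]


def assemble(fragments, topic, clearance):
    """Build a document from the fragments on `topic` the reader may see."""
    return [
        f["text"]
        for f in fragments
        if f["topic"] == topic and f["security"] <= clearance
    ]


if __name__ == "__main__":
    # Maintenance person (assumed clearance 1) versus supervisor (assumed 2).
    print(assemble(FRAGMENTS, "control-surfaces", 1))
    print(assemble(FRAGMENTS, "control-surfaces", 2))
```

The same stored fragments yield two different assembled documents; only the reader's clearance changes.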

What industries are investing in SGML?

In general, any regulated industry is likely to be investigating SGML as governments look for ways to reduce their reliance on paper documentation and increase information reusability. Interest in SGML is not limited to companies of a particular size, as many companies are looking at SGML as a strategic tool for lowering operational costs and increasing quality and customer satisfaction. However, the most commonly referenced industries are made up of 'Fortune 1000' level companies in North America as well as Europe and Asia.

Examples of industries that are currently investing in or exploring the benefits of SGML are:

Industry                         Industry Standard DTD

Military/Defense                 CALS (Continuous Acquisition and Lifecycle Support)

Government Agencies (PTO & SEC)  The PTO has its own DTD. The SEC is working with Mead Data Central on a project called EDGAR.

Airlines/Aerospace               ATA 100 (Air Transport Association)

Publishing                       AAP (Association of American Publishers)

Pharmaceuticals                  The FDA is currently developing a standard DTD (CANDA)

Automotive                       SAE J2008

Semiconductors                   Pinnacle

Telecommunications               No standard yet

Financial/Insurance              No standard yet

What is the current state of the SGML market?

The SGML industry is comprised of many vendors with very few clear industry leaders. Despite this fact, the SGML industry is very substantial in terms of dollars. In 1993, the SGML industry was estimated to be US$520 million and is projected to grow to over US$1.46 billion by 1998.

SGML Tools

There is a wide variety of tools that can be used to create SGML solutions. The SGML industry can be separated into the following six categories:

1. Mainstream Authoring: consists of the key word processing vendors like Lotus, WordPerfect and Microsoft. These products are not usually considered SGML tools and therefore are not included in the overall SGML market sales figures.

2. SGML Editing & Publishing: includes traditional SGML authoring tools like ArborText, Interleaf, FrameBuilder and SoftQuad Author/Editor.

3. SGML Conversion: one of the largest sectors in the market today, because many companies are converting legacy data from mainframes, or documents created with mainstream word processors, into SGML. The major vendors here are Avalanche and Exoterica. They provide tools that can be configured to parse documents and automatically add SGML tags. A growing percentage of conversion work is being done overseas by companies that use low-cost labor to tag the documents by hand.

4. Electronic Delivery: widely regarded as the most compelling reason companies are moving to SGML. Electronic delivery enables users to retrieve information online using an intelligent document viewer. Major players in this sector are Electronic Book Technologies (EBT), Interleaf, and SoftQuad.

5. Document Management: this technology is probably still a few years away from driving a major part of the overall SGML industry. Documentum and Interleaf are leading document management vendors.

6. SGML Document Repositories: one of the cornerstone technologies that affects how SGML moves forward as a data standard. Document repositories are object-oriented databases that are designed specifically for storing and managing information that is usually stored in documents. Currently, this technology is available only on UNIX; however, companies like EBT and Interleaf are porting their solutions to the Windows NT operating system.

According to InterConsult, a Massachusetts-based consulting firm, most companies cite electronic delivery as the most compelling reason for moving to SGML now that they can clearly see the advantages of paperless information delivery media. The recent explosion of interest in the World Wide Web and in distributing documents based on HTML (a very simple application of SGML) has also contributed greatly to the demand for SGML. In addition to electronic delivery, document archival and version control are also important reasons for companies moving to SGML.

The SGML 'authoring' market

There is a continuum of tools in the SGML authoring market for creating SGML (Figure 1). These tools range from very basic SGML editing and conversion tools to advanced SGML publishing systems, as described below:

1. Limited Editing & Conversion applications like WordPerfect IntelliTag provide a separate authoring environment with limited editing tools for modifying native SGML and require the end user to understand the SGML standard before the document can be converted.

2. Mainstream Authoring tools like Microsoft SGML Author for Word enable end users to work in their current mainstream word processor and create SGML without extensive training or knowledge.

3. SGML Editing tools like SoftQuad Author/Editor provide authoring environments for users who are familiar with SGML and want to work with native SGML.

4. SGML Publishing tools such as ArborText and Interleaf are used by SGML experts for collecting large amounts of data and assembling it into very large documents.

Figure 1: SGML Authoring Tool Categories [Graphic Not Available]

In 1992, the worldwide market for SGML authoring software was estimated to be US$38 million. It is projected to increase to US$119 million by 1998 (Figure 2).

Figure 2: SGML Authoring Sales Forecasts [Graphic Not Available]

SGML Platform Support

Traditionally, the SGML marketplace has been dominated by UNIX and very few PC-based applications. However, there is an increasing trend away from these platforms to Windows for document authoring and electronic delivery, while UNIX is being used more for document database and document management servers. In addition, InterConsult found a high level of interest in Windows NT as a server platform for database and document management applications.

Component-Based vs. Single-Vendor Solution Model

The broad nature of the SGML market makes it entirely possible for companies to choose 'best of breed' applications as components of their SGML solution. This has spawned a vigorous industry of value-added resellers, integrators, consultants and Independent Software Vendors (ISVs) that sell integration services to build SGML solutions. Integration services were estimated at over US$150 million in 1993 (or nearly 30% of overall industry sales) and are projected to continue growing until 1996, when sales are expected to plateau at US$211 million.

Microsoft SGML Author for Word

SGML Author is an add-on to Word 6.0 for Windows that builds on Word's easy-to-use word processing tools to hide the complexities of SGML from end users, enabling them to become productive quickly without extensive training or knowledge. With SGML Author, users can forget about spending hours manually tagging documents because SGML Author does it for them automatically. All they have to do is focus on creating documents.

Easy to use SGML authoring solution

According to industry research on companies that are currently using SGML, the most common enhancement request is to make SGML easier. SGML Author provides easy-to-use tools for end users and administrators.

End Users

  1. SGML Author does not use SGML tags in the authoring environment, thereby shielding end users from the complexities of SGML and allowing them to focus on creating content.
  2. Because SGML Author is an add-on to Microsoft Word 6.0, users have all of Word's ease-of-use tools, enabling them to quickly and easily generate SGML without extensive training or knowledge of SGML.
  3. SGML Author's unique validation engine provides users with a friendly feedback loop, using Word's built-in annotation feature to identify and correct DTD compliance errors.


Administrators

  1. Administrators can configure SGML Author without programming. SGML Author's point & click tools enable administrators to easily associate their standard Word templates with any DTD, reducing initial SGML implementation costs.
  2. SGML Author is designed to work with any valid DTD, making it easy to create solutions based on a wide variety of DTDs.
  3. SGML Author contains sample templates and association files for the CALS (Continuous Acquisition and Lifecycle Support, MIL-M-28001B) and Air Transport Association (ATA 100) DTDs, as well as easy-to-use starter examples consisting of a simple sample DTD, association file and template for administrators who are new to SGML.
  4. SGML Author also includes Internet Assistant, an add-on tool for Word 6.0 for Windows that enables users to create and browse HTML documents on the World Wide Web entirely within Word.

Leveraging your investment in Word

As SGML becomes increasingly accepted as one of the tenets of Open Systems for reusing and distributing document-based information electronically, more customers are looking for mainstream tools to provide a cost-effective method for creating valid SGML. Mainstream tools, like Word, are particularly compelling because they enable companies to leverage their investments in software, training, and legacy documents. Many Microsoft Word customers echo these needs, which is one of the most important reasons why SGML Author for Word was developed.

  1. Unlike existing SGML editing tools, SGML Author allows users to create SGML in their mainstream word processor rather than requiring a completely separate editing environment.
  2. Since SGML Author is based on Word and end users do not need extensive training on SGML to be productive, incremental training costs and end-user downtime are minimized.
  3. SGML Author users can take advantage of all Word's ease-of-use and intelligent end-user automation features, while enjoying the benefits of working in a familiar environment as they create SGML.

Support for key industry standards

SGML Author's ease-of-use and author productivity features are matched with its built-in support for key industry standards and its ability to read and write syntactically correct, fully parseable SGML. Examples of converters included with SGML Author are:

  1. SGML Author converts to and from the CALS table model, the leading standard for representing tables as SGML.
  2. Word Equation Editor objects can be converted to and from the ISO/IEC TR 9573-11:1992E standard for representing equations with SGML.
  3. In addition to the wide variety of converters included with Word 6.0, SGML Author includes new converters for enhanced CGM graphics and Group IV fax standard raster images, as well as import/export converters for OLE objects.
  4. SGML Author provides native support for the ISO Character Entity Sets.
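
As a small illustration of what character entity support involves, the Python sketch below replaces a few special characters with entity references drawn from the ISO character entity sets. The entity names shown (eacute, uuml, copy, sect) are standard ISO names; the ISO_ENTITIES table and the to_entities function are hypothetical and say nothing about how SGML Author itself performs the mapping.

```python
# A few entity names from the ISO character entity sets. The table is a
# tiny illustrative subset, not a complete mapping.
ISO_ENTITIES = {
    "\u00e9": "eacute",  # ISOlat1: small e with acute accent
    "\u00fc": "uuml",    # ISOlat1: small u with diaeresis
    "\u00a9": "copy",    # ISOnum: copyright sign
    "\u00a7": "sect",    # ISOnum: section sign
}


def to_entities(text):
    """Replace each mapped character with its &name; entity reference."""
    return "".join(
        "&%s;" % ISO_ENTITIES[ch] if ch in ISO_ENTITIES else ch
        for ch in text
    )


if __name__ == "__main__":
    print(to_entities("caf\u00e9"))
```

Writing entity references instead of raw character codes keeps the SGML output independent of any one platform's character encoding.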

How does SGML Author Work?

The SGML Author user model is divided between administrators and end users (see Figure 3).


Administrators

Administrators use SGML Author's point and click tools to configure the SGML converter to work with any DTD. This involves mapping styles in a Word template to elements in a DTD and saving this mapping as an Association File.

End Users

End users create documents in Word 6.0 for Windows using styles for all formatting based on the Style Guide created by an administrator. When they save the document as SGML the converter creates a fully parseable, syntactically correct SGML instance. If the end user has created a document that is incomplete or incorrectly structured, SGML Author automatically modifies the document to conform to the target DTD.
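
The style-to-element mapping described above can be pictured with a minimal Python sketch. It is an assumption-laden illustration: the real Association File format is not described here, so the ASSOCIATION dictionary, the element names (doc, title, para), and the to_sgml and escape functions are all hypothetical.

```python
# Hypothetical association: Word style name -> SGML element name.
ASSOCIATION = {
    "Heading 1": "title",
    "Normal": "para",
}


def escape(text):
    """Replace characters that would otherwise be read as SGML markup."""
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def to_sgml(paragraphs, association, root="doc"):
    """Wrap each (style, text) paragraph in the element its style maps to."""
    lines = ["<%s>" % root]
    for style, text in paragraphs:
        element = association[style]  # raises KeyError for an unmapped style
        lines.append("<%s>%s</%s>" % (element, escape(text), element))
    lines.append("</%s>" % root)
    return "\n".join(lines)


if __name__ == "__main__":
    styled = [("Heading 1", "Q&A"), ("Normal", "Hello")]
    print(to_sgml(styled, ASSOCIATION))
```

The author only ever chooses styles; the tags are derived from the mapping at save time, which is why consistent use of the Style Guide matters.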

Figure 3: SGML Author user model [Graphic Not Available]

Feedback Loop

For authors who are responsible for quality assurance, SGML Author can create an annotated Word document that provides them with a friendly feedback loop using Word's built-in Annotation capability. The annotations mark areas where SGML Author automatically modified the document for the end user and provide an easy-to-understand description of why the document did not comply with the target DTD.
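
The feedback loop can be sketched as a function that repairs a document and records each repair for later review. This Python sketch is hypothetical throughout: the single rule it enforces (a document must begin with a title), the conform function, and the wording of the annotation are assumptions, since the actual rules SGML Author applies depend on the target DTD.

```python
def conform(elements, required_first="title"):
    """Return (fixed_elements, annotations).

    `elements` is a list of (element_name, text) pairs. If the first
    element is not `required_first`, an empty one is inserted and the
    change is recorded so an author can review it afterwards.
    """
    fixed = list(elements)  # copy, so the caller's list is not modified
    annotations = []
    if not fixed or fixed[0][0] != required_first:
        fixed.insert(0, (required_first, ""))
        annotations.append(
            "Inserted empty <%s>: the document type requires it first."
            % required_first
        )
    return fixed, annotations


if __name__ == "__main__":
    fixed, notes = conform([("para", "Hello")])
    print(fixed)
    print(notes)
```

Keeping the repaired document and the list of annotations separate mirrors the two outputs described above: a conforming SGML instance and a reviewable record of what was changed.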

SGML Author and Third-Party Vendors

Figure 4: SGML Tools [Graphic Not Available]

As a mainstream authoring tool for creating SGML, SGML Author is a single component of a larger SGML solution that may ultimately include applications from many vendors. SGML Author is an easy-to-use tool for creating SGML, to be used with other downstream value-added tools for electronic delivery, document management and document databases (Figure 4). However, even within the authoring category there is a continuum of customer needs that are addressed by third-party vendors.

Third-Party Tools for SGML Author

Microsoft's strategy is to actively work with third-party vendors to offer users complete, end-to-end SGML solutions. A number of third-party vendors have created products and services that draw on their own areas of expertise to augment the capabilities of SGML Author.

Avalanche Development

Avalanche Development is a leading SGML vendor for document conversion, including tagging legacy documents as SGML. Avalanche also provides SGML consultation and training services.

To make it easier for companies to convert their Word documents to SGML, Avalanche developed SureSTYLE. SureSTYLE is a companion product for SGML Author that evaluates the visual characteristics of a Word document and consistently applies Word styles. When SureSTYLE completes its operation, the document can be saved as SGML from SGML Author.

In addition to SureSTYLE, Avalanche is offering SGML Author Assistant. SGML Author Assistant includes SGML Author as well as a complete package of integration, training and support services for companies implementing SGML solutions using SGML Author.

SoftQuad Inc.

SoftQuad is a leading SGML vendor offering high-end SGML editing and publishing tools as well as training and integration services. For organizations that choose to do quality assurance based on native SGML, Microsoft and SoftQuad have worked together to develop SoftQuad Enactor for SGML Author.

When an author creates a document with Word, it is possible for them to create incomplete or incorrectly formed documents. In those cases, SGML Author will automatically modify the document to conform to the target DTD. There are two feedback mechanisms for reviewing these modifications. Authors can review their document via SGML Author's report file (see 'How does SGML Author Work?'), or they can review the native SGML with SoftQuad Enactor.

SoftQuad Enactor is targeted at those users who need to inspect native SGML, and it reflects Microsoft Word's look in a structured, tag-oriented environment. SGML Author outputs specially marked comments in the SGML text stream to ensure seamless interoperability with SoftQuad Enactor. Enactor can navigate directly to these comments so the user can inspect the areas in their document modified by SGML Author.

MicroStar Software Ltd.

MicroStar is a leading supplier of easy-to-use tools for creating DTDs. The product Near&Far provides administrators with a powerful graphical tool for designing the structure and content of documents. Near&Far then exports the hierarchical structure as a DTD that can be used with SGML Author. Administrators can then use the point & click tools in SGML Author to associate their standard Word templates with the DTD.

Microsoft is also working with companies like Interleaf and Electronic Book Technologies, among others, to deliver complete SGML solutions to customers using SGML Author.

SGML Author Availability

SGML is a highly strategic technology for many organizations. The long-term ramifications of an SGML solution make the purchasing process for SGML Author more like that of an operating system than that of a mainstream application. This is why Microsoft is working with key industry ISVs and integrators to ensure that companies implementing SGML solutions that use SGML Author have the highest level of third-party support. In addition to integration services, third-party partners will provide training and support to SGML Author customers.

SGML Author will be available through a number of well-established channels, including third-party SGML integrators, Solution Providers, and direct from Microsoft. SGML Author will have an estimated retail price (ERP) of US$595.00 and will also be available through the same volume licensing packages as other Microsoft software. SGML Author is expected to be available in early 1995.

System Requirements:

©1993-1994 Microsoft Corporation. All rights reserved. Printed in the United States of America.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This technical overview is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Microsoft, MS, and MS-DOS are registered trademarks and Windows is a trademark of Microsoft Corporation.