[Mirrored from: http://www.passage.com/news/pubs/bob/success.htm]

Passage Systems Inc.

Successful Migration to SGML

Robert J. Glushko, Passage Systems

Just about everyone who has heard about SGML accepts the notion that at some future date, having brought SGML into their organization will have been a good thing to do. Storing documents using structured, non-proprietary markup makes it possible to publish the same information in numerous output formats with little manual rework and enables the efficient creation of new publications by reusing and reassembling content. For information with a long useful life, and for applications in which validation, consistency, and reuse are important goals, SGML is the best way to preserve and capitalize on the investment required to create the information in the first place.

The key question, then, is not whether you should adopt SGML for such information, but whether you can successfully adopt it given your current capabilities, methods and technology. My experience with many companies that have adopted SGML suggests that a small number of factors predict the success or failure of an SGML project. In this essay I will help you diagnose your organization so that you can either proceed with confidence in adopting SGML or identify the problems you need to fix to increase your chances of a successful migration.

The Six Key Factors for Success

I have worked for nearly twenty years as a consultant, a manager of consultants, and a co-founder of a company that helps organizations make the transition to SGML and on-line publishing. I have seen many organizations make it to SGML (and a few not make it), and have identified from this experience six factors that are good predictors of success at adopting SGML.

No single factor is sufficient, nor are all six strictly necessary, but taken together they define a kind of composite case study or template for a successful migration to SGML.

The factors are as follows:

Customer-Centered Justification

First, you should ask the question "Why is my organization interested in SGML?" Most SGML initiatives have numerous justifications, but some are better predictors of success than others. If "we need to cut printing costs this quarter", "our competition is doing it", or "the elegance of SGML appeals to our technical people" are the most prominent reasons, your SGML project may be getting off on the wrong foot.

On the other hand, your justification may start with a phrase like "we've surveyed our customers and they want..." or "we can increase sales if we create new products and services by reusing and reassembling our content". These are customer-centered reasons, and they suggest that your SGML initiative is more firmly related to the heart of your business, which gives it a much greater chance of success.

When a customer-centered justification drives an SGML initiative, it is easier to sustain internal support and resources during the transition period. The substantial benefits promised by SGML take some time to achieve, and SGML projects always "get away from the gate" more slowly than efforts to publish information using Acrobat or HTML, which are format-oriented and seductively easy to use. Neither Acrobat nor HTML offers the power and the complete vendor and technology independence of SGML, but neither demands very much to get started.

SGML projects driven solely by cost reduction sometimes get canceled before they start to pay off because the people funding or managing them panic when they don't see immediate benefits. If your organization suffers too much from a quarter-to-quarter mentality, you may be tempted to oversell the short-term benefits or downplay the transition costs of SGML to justify its adoption. This is almost certainly a blueprint for failure.

There is "no free lunch" with SGML. Markup in general, and SGML markup in particular, is an investment in structuring and adding value to information that pays off in more functional applications and lower recurring costs as information is reused or repurposed. Non-SGML applications are less flexible and extensible, but they require less up front work. In the short term, non-SGML publishing can seem more attractive than SGML because the extra work required to get started with SGML skews the apparent cost/benefit ratio in favor of low-investment approaches.

I've heard people say "SGML is a good idea, but not on my shift. Why should I be the one who pays for the long term benefits to the company with a poor bottom line in my organization this quarter? I can begin on-line publishing immediately with Acrobat". They can, but there is no free lunch with Acrobat either. You get only the benefits you pay for.

Customer-centered projects are also more successful because the documents for which you develop your SGML application are likely to be those your customers actually use, rather than the ones that happened to be readily available. Your customers might even be involved in your document analysis and their requirements will make your application more robust and extensible.

Customer-centered projects are also less likely to be sidetracked by the "hand-crafted demo" syndrome. If customer requirements are salient early in a project, scalability and efficiency in development methods and tools are critical concerns. Otherwise, developers can get captivated by the fun of creating a compelling and entertaining demonstration for themselves, ending up with a trivially small sample of information built using "hand-crafting" methods and tools that won't scale up for the complete application.

Standard Authoring Tools and Styles

The extent of standardization in your organization's use of word processing or authoring software is a key predictor of success in adopting SGML. The best situation is if your organization has standardized on a single word processing program, supported by standard templates and styles for the types of publications your organization creates. From there it should be possible to write a highly effective conversion program to SGML, either as part of a one-time migration to native SGML authoring or to support an ongoing "author to convert" strategy. Then you have a content-oriented information repository whose data model accurately and completely reflects the information-encoding requirements of your organization.
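As a rough illustration of why standard styles make conversion tractable, here is a minimal Python sketch. It is not taken from any Passage Systems tool. The style names, the element names, and the idea that a document arrives as (style, text) pairs are all assumptions made for the example, and the escaping assumes a DTD that declares the amp, lt, and gt entities.

```python
from html import escape

# Hypothetical mapping from word-processor style names to SGML element names.
# A real mapping would come from the organization's templates and its DTD.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "List Bullet": "item",
}


def convert(paragraphs):
    """Convert (style, text) pairs into tagged lines.

    Returns the tagged lines plus the style names that had no mapping,
    so that unconvertible paragraphs are reported rather than guessed at.
    """
    output, unmapped = [], []
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT.get(style)
        if element is None:
            unmapped.append(style)
            continue
        # Escape markup characters; assumes the DTD declares these entities.
        output.append(f"<{element}>{escape(text, quote=False)}</{element}>")
    return output, unmapped


if __name__ == "__main__":
    lines, unmapped = convert([
        ("Heading 1", "Introduction"),
        ("Body Text", "Structure & style"),
        ("My Private Style", "Hand-formatted text"),
    ])
    print("\n".join(lines))
    print("Unmapped styles:", unmapped)
```

The point of the sketch is that every paragraph carrying a standard style converts mechanically, while every non-standard style becomes a manual exception.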

However, if your organization does not currently follow structural and stylistic standards for authoring, you may be daunted by the prospect of putting them in place. You might opt instead for the appealing simplicity of HTML, whose limited repertoire of elements for headings, lists, and links sometimes allows unstructured and unstyled word processing files to be converted to it with only a little cleanup "by hand". But what are the cumulative costs of not being able to automate document conversion if each author does "just a little" cleanup and linking for every document each time it is revised? What are the long-term costs of having a format-oriented collection of information instead of a validated, reusable document database? Investing in authoring standards in your on-line publishing initiative is a sure bet.

There are other less obvious benefits of standards for authoring. The simple fact that your organization recognizes the value of standards means that SGML will be accepted more easily. Common tool sets also get people into the habit of learning from one another. They create an environment where people cooperate in support of organizational goals, which minimizes the impact of bringing in SGML methods and technology.

Explicit Publications Process

A standard authoring environment is the most important part of an explicit publications process, which is in its own right a good predictor of successful migration to SGML.

Bringing SGML into an organization requires an end-to-end perspective on the publications process. The desired result is an integrated system of processes and technology that begins with the SGML or non-SGML authoring activities, followed (possibly) by format conversion, validation, information management, (possible) down-translation to non-SGML formats, and that ends with the distribution and presentation of information to the end users. The SGML document type definition is the linchpin of this system, manifesting itself not just in the SGML information repository but indirectly in "structure-aware" word processing templates, presentation specifications or style sheets in browsers, and at various other points in the system.

Few organizations start from a completely explicit publications process, but those that have achieved ISO 9000 certification either have one or are predisposed to define one. Other indications of explicit process are a "Getting Started" handbook for new authors, published style guides, and standards of various types. If your organization has these things, you should then ask yourself if they are actually used in day-to-day work.

A challenging part of migrating to SGML is designing the end-to-end publishing system in a way that builds on the robust characteristics of the existing publishing process and the skills and capabilities of the people who carry it out. It is obviously easier to exploit strengths and remedy weak points in the current process when you know exactly where to look for them.

The most effective solutions assign clear roles and responsibilities to authors, editors, production personnel, and others in the organization, allocating the costs and benefits of SGML among them fairly. When the publications process is already explicit, everyone can see their impact on everyone else, and they more readily accept the discipline that SGML imposes.

For example, when authors see that their content must be converted to SGML by a production group for multiple delivery options, they more quickly become "structure-aware". This can substantially reduce the workload of the production group by making it possible to create high-quality SGML from word processing files in an automated way. One technical publications organization that my company helped adopt SGML has reduced its production time from six weeks to two days using structure-aware authoring and integrating software that allows authors to preview the conversion and viewing processes. By the time authors turn over their word processing source files to the production group, they are 100% confident that their content will convert with 100% accuracy.
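The kind of preview that lets authors check their own work before handing it off might look something like the following Python sketch. It is an illustration only, not the software described above. The approved style names and the assumption that a document can be reduced to a list of per-paragraph style names are hypothetical.

```python
from collections import Counter

# Hypothetical set of style names defined by the organization's template.
APPROVED_STYLES = {"Heading 1", "Heading 2", "Body Text", "List Bullet"}


def preflight(paragraph_styles):
    """Check a list of per-paragraph style names against the template.

    Returns (ready, report): ready is True when every paragraph uses an
    approved style, and report lists each offending style with a count.
    """
    offenders = Counter(s for s in paragraph_styles if s not in APPROVED_STYLES)
    report = [
        f"{count} paragraph(s) use unapproved style '{style}'"
        for style, count in sorted(offenders.items())
    ]
    return (not offenders), report


if __name__ == "__main__":
    ready, report = preflight(["Heading 1", "Normal", "Body Text", "Normal"])
    print("Ready for conversion:", ready)
    for line in report:
        print(" -", line)
```

Running a check like this on the author's side makes the effect of structure on conversion visible before the files reach the production group.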

In contrast, if authors just "throw it over the wall" to a production organization, with validation and subsequent processing invisible to them, they will have little motivation to become more attentive to the structure of the documents they are creating. This imposes an unnecessary manual clean-up task on the production group and potentially jeopardizes the overall success of the SGML initiative.

Content-Centered Process

Organizations whose existing publishing processes are oriented toward the effective creation, management, and reuse of content are usually successful at adopting SGML. Good indications of this orientation are a centralized repository of source files or "boilerplate" under administrative control and a policy that encourages the reuse of information.

Organizations with effective source management generally have less risk if they decide to convert legacy documents to SGML. When a wide range of representative source files is readily available, efficient conversion programs can usually be developed.

In contrast, organizations that typically produce "one of a kind" publications, especially those with high production value, often find it more challenging to adopt SGML. If your organization follows this "artifact-centered" or "camera-ready copy" process, your authors might be overly concerned with getting documents to look right. Ad hoc formatting and misapplication of style tags to achieve a particular appearance goal prevents efficient conversion to SGML and makes reuse of source information unlikely.
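To make the problem of ad hoc formatting concrete, the Python sketch below flags paragraphs that carry direct formatting on top of their named style. The Paragraph record and its overrides field are hypothetical stand-ins for whatever a real word processing format exposes. The sketch assumes that such overrides can be read from the source file at all.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """Hypothetical record for one paragraph of a word processing file."""
    style: str
    # Direct formatting applied on top of the style, e.g. {"bold", "font-size"}.
    overrides: set = field(default_factory=set)


def ad_hoc_paragraphs(paragraphs):
    """Return the indexes of paragraphs that carry direct formatting overrides."""
    return [i for i, p in enumerate(paragraphs) if p.overrides]


if __name__ == "__main__":
    doc = [
        Paragraph("Heading 1"),
        # Body text dressed up to look like a heading instead of using one.
        Paragraph("Body Text", {"bold", "font-size"}),
        Paragraph("Body Text"),
    ]
    print("Paragraphs needing manual review:", ad_hoc_paragraphs(doc))
```

A paragraph that looks like a heading but is tagged as body text cannot be converted by style name alone, which is why appearance-driven formatting defeats automated conversion.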

An organization that is intensely oriented toward the appearance of print publications may have standardized on a desktop publishing program like Quark or PageMaker. While these programs are well-suited for page layout, if they are also used for final editing the DTP files become the authoritative source information. This poses a problem if new requirements suggest an SGML initiative, since the file formats of these programs are notoriously hard to convert. Quark has been referred to as the "Roach Motel" of file formats: "you can check in, but you can never check out".

Low Reliance on Contract/Outside Authors

Some companies use contractors, consultants, or other temporary employees in technical publications, methods and procedures, or marketing organizations, all of which are common loci of SGML efforts. If your company relies extensively on contract or outside authors, it can be a warning sign for your SGML initiative, because it makes it less likely that your company will make sufficient investments in new tools and training when adopting SGML. After all, your organization is probably using contract employees as authors to reduce costs.

In contrast, when all authors are employees, a longer-term investment case is easier to justify. It is also easier to provide incentives (both positive and negative) that encourage authors to focus on structure awareness and compliance with new standards when they work for the company taking on SGML.

In the worst case with respect to SGML migration, your organization may have contracted with an outside author to produce a complete publication in final form. In this situation the source files provided by the author (if they are delivered at all) might be useless for conversion to SGML, since they are unlikely to use styles in a disciplined way.

Traditional publishers almost always rely on outside authors, but this is less of a barrier to successful adoption of SGML for publishers than it is for non-publishers. Long after the use of word processors by authors was commonplace, the editorial and production processes of publishers assumed that the source files provided by authors were of little use; a manuscript would be edited on paper and then rekeyed by a typesetter or service bureau.

However, now that SGML editorial and production systems for publishers are emerging, some publishers see the cost and productivity savings if authors provide usable source files. One approach is to provide "structure-aware" word processing templates to authors to enable efficient conversion. Another is to provide authors with an SGML editor set up to use the publisher's DTD.

Mechanisms for Systematic Employee Development and Technology Adoption

A final predictor of your organization's likely success at migrating to SGML is the degree to which it is systematic at developing employees and adopting technology. This issue is related to some of the other predictors in this checklist, especially the explicitness of process and the extent of technology standardization, but is worth looking at separately. It is a bit "fuzzier" than the other predictors, falling more into the realm of company "culture".

The largest costs in adopting SGML aren't usually the costs of new computers or software. Instead, the cost is more likely to be felt as "people costs" as people learn to think about authoring, information management, and delivery of information in different ways.

One of the basic benefits of an SGML initiative is to free your organization from proprietary software technology, letting you selectively update parts of your end-to-end information system as the "best of breed" technology changes or as new applications are developed. But taking advantage of this perspective requires that the people who work within the new SGML environment understand the big picture. Some SGML initiatives fail because new technology and methods are introduced without adequate training: Does your organization provide formal training with new software, or are people expected to learn on their own?

It is also possible to fail because the organization has a bias toward staying on the "bleeding edge" and brings in new tools and upgrades to existing ones before the current generation has been routinized. Organizations that have introduced quality programs, or programs for continuous process improvement, are usually successful at adopting SGML because they strive for a balance between bringing in new methods and tools and working well with the existing ones.


If your publishing or information management requirements suggest an SGML approach, the cost of making the transition is unavoidable, because adopting SGML involves a migration from format-oriented markup to the more demanding criteria of explicit structural and content markup. The "camera-ready copy" mentality that focuses on the appearance of a finished publication must be supplanted by an emphasis on structural encoding, either by using an SGML editor or by using a traditional word processor in a "structure-aware" manner to enable automated translation into SGML.

So how do you ensure that your migration to SGML is successful? The simplest answer is not to start until you're ready. You need to make an honest assessment of your current capabilities, methods and technology. If your current publications process isn't explicitly documented, try to document it. If you have a corporate style guide but no one follows it, find out why they don't. Do you know where the source files for your publications are kept? Do you know whether they are up to date?

You can't put off asking these questions. If you put off fixing the root problems, all that does is delay the pain of transition. You spend more time learning bad habits, creating pretty documents with non-standard layouts and styles, and you put off the inevitable reckoning with the demands for consistency and structure that SGML depends upon to provide its benefits. On the other hand, if you determine that your organization has a mature and stable publications process in place, you can proceed to adopt SGML with confidence.

The second answer to the question of how you ensure a successful migration to SGML is to only do it if it makes sense for strategic, customer-grounded reasons. If your information has a short useful life and you have little control over its authoring process, the cost of getting to SGML may not be worth the effort. But if a customer and business focus helps establish that SGML is the appropriate foundation for your organization's information products and services, go for it.

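In summary, to make a successful migration to SGML, assess your organization honestly against the six factors above (customer-centered justification, standard authoring tools and styles, an explicit publications process, a content-centered process, low reliance on contract or outside authors, and systematic employee development and technology adoption), fix the root problems before you start, and proceed only if SGML makes sense for strategic, customer-grounded reasons.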

This page last updated 6/19/96.
Copyright © 1996 Passage Systems Inc.