[This local archive copy (text only) mirrored from: http://www.agave.com/html/newsworthy/news_nortel.htm; see the canonical version of the document.]
"How Electronic Publishing at Northern Telecom Radically Improved Document Quality and Reduced Information Time-To-Market" |
The Electronic Publishing Solutions department at Northern Telecom (Nortel) and Agave Software Design Inc. transformed product and price publications from paper to electronic media within a short period of time. Electronic publishing radically improved Nortel's ability to control document quality and reduce information time-to-market. This department incorporated many significant production changes, such as:
Nortel's previous publication production methods required the use of word processors to replicate and edit large product documents. Document publication depended on manual entry via word processors across several departments. Data entry errors and constantly shifting page layout due to changes, updates, and deletions created a vicious cycle of self-generated rework and ever-expanding schedules. In general, problems with information accuracy and update timeliness prevented consistent publication and use of the resulting publications.
Publications are now produced directly from an SQL database source using SGML with embedded SQL statements. Both the source and the resulting documents are true SGML documents compliant with ISO 8879. These SGML documents were created without modification of the legacy database. Replacing the existing database structure was not an option because it would have required re-engineering all of the existing processes that use the database. However, by using an internally developed toolset that extends SGML with embedded SQL statements, Nortel is able to produce SGML documents from legacy databases. These embedded SQL queries produce variable-length documents on the fly for printing or for display by a common Internet or CD-ROM browser.
Today, using an Internet or CD-ROM browser, Nortel's marketing and production engineers, sales support staff, distribution managers, and external distributors and customers can immediately access accurate product and price information. In addition, on-line access enables users to query the legacy information and generate live reports dynamically, so that they can narrow in on the information they need.
Information is kept up to date in an Automated Price Action application that is accessible on the Internet. Product adjustments are submitted for approval via this Internet service, and once approved, changes to the product and price databases become available for use immediately.
Although paper publishing is still required, Nortel anticipates substantial savings in time, labor, and cost by using SGML in a unique way.
Background
This document describes the migration of Nortel's Product Catalog from a manual publishing process to an electronic publishing process. When the migration process started, over two and a half years ago, Nortel's Electronic Publishing Solutions (EPS) department was a small department of two publishing specialists responsible for publication of a single product line twice a year. The EPS department worked closely with the organization that was responsible for maintaining pricing information in the product database on a mainframe. At the time, product descriptions and page layout were maintained by manually editing desktop publishing documents that consisted of multiple files and pages. To publish the catalog, the EPS department would manually update the document with price information extracted from the product database. Refer to Figure 1, Old Publishing Process.
Overall catalog publication was inefficient. The actual publication process was conducted by the department over a two-week period of excessive overtime due to the cyclical nature of the work. Since publication was the last step in a long chain of events, it did not begin until information was available from other departments and was dependent on, among other things, proofing the product database to ensure price accuracy. Once price information was available, the EPS department took over, literally working night and day until the catalog was published.
The migration from a manual to an electronic publication process was executed in the interval between publications by the EPS department in conjunction with Agave Software Design (Agave). Agave augmented the department's publishing specialists by providing knowledge of database technology and programming skills.
The publishing specialists, in turn, provided corporate business knowledge, product knowledge, and publishing skills. The team initially explored many different ways to publish the price catalog, some of which did not include SGML. Ultimately, SGML won out as the best method for publishing the catalogs electronically. However, integration with a non-SGML data source, specifically the legacy parts and price database, posed a significant challenge that required Agave Software Design to develop a unique tool named SQml.
Beginning in 1996, Nortel's product catalog was distributed electronically for the first time, on CD-ROM and on Nortel's Intranet. Overall, electronic publishing has provided reductions in labor, materials, and time over the manual publication method. The EPS department's responsibilities grew over the migration period to include the publication of several additional product lines, comprising approximately 5,000 products, and responsibility for the development and implementation of a cohesive product database called the Voyager Database. The EPS department now does very little direct authoring of information. The vast majority of the information published is generated directly from the Voyager database. A typical product line catalog is composed of multiple books and is published 2 to 4 times each year.
Prototype Database Publishing
In 1992 Nortel was using a product database (Parts Price) as the pricing database that also fed other information systems within Nortel. One such system was the product catalog publishing process. At the time, all catalogs were published manually by updating existing catalogs with a desktop publishing editor. The process was error prone and could result in contradictions in the published product catalog. For example, the product catalog was published in several different formats: a Price List, a Reference, and a Summary of Changes.
Each component of the publication might contain the same information presented in a different manner. Small errors could result in serious contradictions in the Product Catalog. By late 1992 the publishing department began to experiment with methods to automate the publication process. The first attempt explored the use of a desktop publishing/Lisp database initially loaded with a small portion of the product database. This demonstration proved successful in clearing up the data contradiction problem since each data element in the catalog was actually a reference to information in the Lisp database. As long as the references were correctly maintained, a document would not contradict another document using the same references. Although the desktop publishing/Lisp solution looked promising on a small demonstration scale, when the entire price database was loaded onto the desktop publishing/Lisp database, performance became a serious problem. Updating a single reference took minutes, attempts to make wholesale changes in the document format required re-authoring the entire document, and attempts to periodically feed the Lisp database from the product database were problematic. During the same time, other departments within Nortel were experimenting with database publishing using other publishing products and an SQL database. Several small applications had been built to publish camera-ready documents from databases. Members of Agave were involved in developing those early applications and were confident that the concept could be used with the larger legacy product database on the mainframe to improve the situation. The EPS department and Agave began work immediately on a prototype system, moving the product database to a local SQL database and publishing a document. Within two long days a working demonstration was completed. 
Although the prototype system was not used for that particular catalog publication, the concept demonstrated success over the Lisp solution and was selected as the solution of choice. The Lisp migration effort was canceled and Agave immediately began to refine the database publishing solution for the next publication.
As an added advantage, Agave's prototype allowed editing of product catalog information to begin much earlier in the process. Because the data came directly from the database, proofing of the database contents could begin immediately after the previous publication was complete. This allowed the EPS department to level the workload across all available time and eliminate the two-week crunch period of all-nighters leading up to the eve of publication.
First Generation Database Publishing, Non-SGML
Migration from the Mainframe Database
The first implementation of the database publishing system was built on the concepts proven by the prototype system. A data feed taken from the mainframe product database placed parts price information in a local database on a desktop machine. Programs written in C generated documents in a proprietary desktop publishing format. These documents were subsequently imported into a template document which contained format definitions, such as font, table, and layout definitions.
This solution was not optimal and required significant effort to maintain. Data extraction from the mainframe product database was cumbersome, and information often required manual editing before being loaded into the local database. In addition, Nortel had made a corporate decision to begin migrating all mainframe databases to client-server systems, but had not identified the replacement systems. Plans to extract the data directly from the mainframe database were put on hold. The EPS department and Agave saw this as an opportunity to correct several problems at once.
They chose a UNIX SQL database as the primary product information database and took steps to transition information off of the mainframe product database. Over a period of several months between publications, an image of the mainframe Parts Price database was recreated in the new database product. To minimize the pain of the transition, the structure of the mainframe product database was duplicated. The mainframe Parts Price system has been retired.
The Problems Encountered by Early Success
When news spread about the new publication process and the speed and flexibility with which new documents could be published, an ever-increasing set of new document requirements began to pour in. Suddenly, a multitude of derivative documents were created to satisfy demand. Nortel began migrating other product lines onto the system. Within a matter of months, approximately 100 different documents were published, each requiring a unique program. Many of the source files differed by only a couple of characters. The team found itself with a serious configuration management problem on its hands.
These first generation C applications used to generate files were not elegant. In the beginning, only four or five documents were needed and publication of each used a simple C program to generate the document. SQL statements that performed data filtering were embedded into the source files along with document structure and format information. As a result, format or content changes to any document required programming skills to edit the C source files. Refer to Figure 2, First Generation Database Publishing.
Two main issues surfaced that required taking the system back to the drawing board. First, in its existing form, creating new documents or modifying existing ones required that department members have programming skills. Second, many documents were derivatives of a parent document, which meant that changes to a parent might also require changes to its children.
Although many of the department's specialists knew SQL (they worked with the SQL database several hours a day), they were not C programmers. Far more frustrating than the need for programming skills was ensuring that all documents in a family were kept consistent with changes to the parent. As the requirements for new or one-off documents increased, a document's family tree appeared to grow exponentially, making changes more difficult to manage.
No simple solution presented itself immediately. Two short-term work-arounds were adopted: Agave enhanced document source program flexibility so that a single program could generate a number of documents, and in parallel, the EPS department began converging the various product lines onto a common document format, thereby reducing the number of document variations. It was clear that this was only a short-term solution. Document owners would soon require a flexible system to alter the structure and content of a document independently of a programmer.
Other Media Enter the Fray (Just in Time)
Like many other companies, Nortel was beginning to explore the cost-saving advantages of CD-ROM over paper publications. The EPS department was directed to offer the Product Catalog in CD-ROM format. After the department completed a serious evaluation of available solutions, they selected a browser technology using SGML. The browser could run on all of the target platforms required and could also import the generated documents.
Both the electronic publishing specialists and Agave's programmers took classes to learn SGML. In particular, they needed to learn how to structure new document formats to be able to generate documents directly in native SGML for the CD-ROM browser. Also looming on the horizon was the near-term requirement to make the Product Catalog accessible via World Wide Web (WWW) technology.
Second Generation: SGML and the Development of SQml
Once the team realized the power of SGML through a preliminary evaluation, SGML appeared to be a long-term solution to many of the problems in electronic publishing encountered thus far. Documents in SGML could be published on CD-ROM, the WWW, and paper. SQL query markup could be included in the SGML source, allowing an SGML editor to edit the documents directly without requiring a programmer. In addition, SGML was an open standard.
To upgrade to SGML and produce the second-generation system, the EPS department and Agave decided on new processes. Key migration requirements were as follows:
Of these requirements, the extraction of database information using SQL queries appeared to pose the most challenge because SGML alone does not support this operation. If special programs had to be written, it might jeopardize the ability of publishing specialists to maintain source documents without programmers. This was a significant factor that had to be resolved. The second task in the upgrade process was to select the tools to implement and maintain electronic publications in SGML. Four types of tools were identified initially:
SGML Editor
A non-SGML desktop publishing tool was selected by Nortel as the corporate standard. The EPS department had been using it for the last two years, and found it a more than adequate non-SGML publishing solution. However, could SGML documents be maintained by a proprietary non-SGML editor? By coincidence, Nortel was also a beta site selected by the desktop publishing tool manufacturer for its new SGML product. The team requested a copy and began testing different document prototypes and approaches.
One of the requirements tested was the ability to load some of our existing DTDs designed for the CD-ROM browser and a prototype WWW application. Neither set of DTDs loaded at first. After several attempts, the CD-ROM DTD loaded successfully. However, the HTML-3 DTD used for the WWW prototype continued to pose problems. The main cause of difficulty in loading DTDs was the implementation of a filtered editor versus a native editor. The filtered editor layered SGML functionality onto its proprietary structured document type, with filters to import and export documents and DTDs. The team selected a native editor to support electronic publishing in SGML and eliminated the need for a filtered editor.
CD-ROM Browser
The CD-ROM browser was actually selected prior to the decision to migrate to SGML. Nortel selected the browser for its ability to support all targeted platforms (PC, Mac, UNIX) and document types, and for its prior history of good performance throughout Nortel. The browser performed well and proved to be flexible and easy to work with. If there were a complaint with the browser, it would be that it was too flexible and as a result was a bit large and took a while to load.
Standard Document Style Sheets
Finding a document style standard common to the CD-ROM browser, Internet browser, and paper publication posed a serious problem.
The team could not identify a single style standard to support all three types of publications and ultimately had to use three different style types. The issue is best exemplified by counting the number of styles used today to support the various HTML browsers on the Internet, a very common problem. We expect this issue to remain a challenge for the foreseeable future. As a result, publication of the various formats desired may require continued management of several different style specifications.
Data Extraction
Database extraction turned out to be one of the biggest challenges to using SGML: how could source information within the SQL database be accessed using SGML? The team realized that we required a tool that we did not have and that was not readily available at the time. To proceed, the team developed the following requirements for the missing tool:
We did not have the time to build an editor. Devising a solution where the source document must be edited in a text editor and validated separately was a step backwards and would be more difficult to use than the non-SGML implementation. We wanted a true document mark-up solution and resisted the temptation to fall back on programmer support to maintain documents.
SGML databases are abundant and appear to be sound products; however, Nortel had already made a sizable investment in migration to the existing product databases and tools and was not yet ready for a radical departure. Additionally, SGML databases appear to be authoring tools for the most part, and the EPS department was not in the business of authoring documents.
The use of SQL seemed unavoidable. Even with all of its detractors, it is still a standard and the EPS department was already familiar with it. The team wanted to be sure that the tool did not require additional programming expertise to be operated successfully.
The tool should not require a dozen command line parameters to run. It should not be a multi-stage filter. It should simply allow the user to expand the document in a single operation, preferably from within the editor.
Keep in mind that our expertise was in database publishing, not in SGML. It took a while before we understood SGML well enough to even begin testing concepts. After some SGML training and laboring through several SGML books, we realized: 1) the tool required to expand SQL queries within an SGML document did not exist; and 2) SGML was flexible enough to incorporate the functionality required without violating the standard.
The realization that SGML could be extended to incorporate embedded SQL queries seemed to be of general interest, and of value as a method for publishing legacy data using SGML, beyond the immediate challenge within Nortel. The team then took on the serious development effort to create the SQL expansion within SGML source documents. Agave's combination of SGML and SQL lent itself well to the product name SQml.
SQml and Iterative Elements in SGML
Mapping a Relational Database (SQL) into SGML
The first thing that became clear was that the task required mapping of relational data into a hierarchical database. SGML is hierarchical, so it is often implemented in database form. SQL is a standard for expressing relationships and is therefore relational. Hierarchical and relational databases are cousins, but there are many distinct differences between them. Two different approaches would allow the mapping of the two database formats:
This section uses the following tables as examples to illustrate these approaches to mapping relational data into a hierarchical database:
    Product_line { product_line char(10), product_line_desc char(80), sales_phone_num char(12) }
    Product { product_name char(20), product_line char(10), product_desc char(80) }
and the following (simple) document DTD fragment:
    <!ELEMENT catalog - - (title, category)>
    <!ELEMENT prodline - - (#PCDATA)>
    <!ELEMENT product - - (prodname, proddesc)>
    <!ELEMENT prodname - - (#PCDATA)>
    <!ELEMENT proddesc - - (#PCDATA)>
Repeating Fields Approach
The repeating fields approach takes advantage of the natural join, sorts on the key (product line), and watches for the product line value to change:
    select product_line.product_line, product_line_desc, product_name, product_desc
    from product_line, product
    where product_line.product_line = product.product_line and ...;
The result of the query might look like:
    Product_line  Product_line_desc   Product_name  Product_desc
    ------------  ------------------  ------------  ------------
    Switches      Telephone Switches  Key system    Small office
    Switches      Telephone Switches  PBX           Office sys
    Switches      Telephone Switches  Large PBX     Telco switch
    Switches      Telephone Switches  Carrier       CO Switch
    Networking    Network solutions   CSU/DSU       Digital Mod
    Networking    Network solutions   RTX-1         Router
    Networking    Network solutions   PC-Ether      Ethernet brd
    Networking    Network solutions   M-Hub         Small Hub
    Networking    Network solutions   Hub Sr.       Large Hub
    Software      Software Products   Switch Boss   Net Manager
    Software      Software Products   Super Browse  Browser
Notice how the Product_line column repeats, even though each of the entries Switches, Networking, and Software has only one associated value in the Product_line_desc column. The repetition is a by-product of the join: every product row carries a copy of its parent product line's columns. It would be possible to look for the repetition and assign everything with Switches in the first column under the Switches <Prodline> hierarchy.
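The key-change logic of the repeating fields approach can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not part of SQml or of Nortel's system: it uses an in-memory SQLite database as a stand-in for the product database, loads a handful of rows from the sample result above, and the function names `build_sample_db` and `group_by_key_change` are hypothetical.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory stand-in for the product database (assumption: SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product_line (
            product_line TEXT, product_line_desc TEXT, sales_phone_num TEXT);
        CREATE TABLE product (
            product_name TEXT, product_line TEXT, product_desc TEXT);
        """
    )
    # A few rows taken from the sample result; phone numbers are left NULL.
    conn.executemany(
        "INSERT INTO product_line VALUES (?, ?, NULL)",
        [("Switches", "Telephone Switches"), ("Networking", "Network solutions")],
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [
            ("Key system", "Switches", "Small office"),
            ("PBX", "Switches", "Office sys"),
            ("CSU/DSU", "Networking", "Digital Mod"),
            ("RTX-1", "Networking", "Router"),
        ],
    )
    return conn


def group_by_key_change(conn):
    """Run one sorted join and start a new group whenever the key changes."""
    rows = conn.execute(
        """
        SELECT pl.product_line, pl.product_line_desc,
               p.product_name, p.product_desc
        FROM product_line AS pl
        JOIN product AS p ON pl.product_line = p.product_line
        ORDER BY pl.product_line, p.product_name
        """
    )
    groups = []
    current_key = None
    for line, line_desc, name, desc in rows:
        if line != current_key:  # key changed: open a new product line group
            groups.append({"line": line, "desc": line_desc, "products": []})
            current_key = line
        groups[-1]["products"].append((name, desc))
    return groups


if __name__ == "__main__":
    for group in group_by_key_change(build_sample_db()):
        print(group["line"], "-", group["desc"])
        for name, desc in group["products"]:
            print("   ", name, "-", desc)
```

The sketch depends on the ORDER BY clause: without a sort on the key, rows for one product line could be interleaved with another and the key-change test would open duplicate groups.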
The repeating fields approach is typical of many SQL report writer applications. It is very fast. The drawback is that it requires the query to be sorted by keys and assumes the keys are unique.
Nested Query Approach
The nested query approach requires two queries:
    select product_line, product_line_desc into :prodline, :prodlinedesc
    from product_line where ...;
    select product_name, product_desc into :prodname, :proddesc
    from product where product.product_line = :prodline and ...;
The result of the query might look like:
The first query is embedded in the <Prodline> hierarchy while the second is embedded in the <Product> hierarchy. This approach is slightly slower. For every Product Line returned by the first query (Table 1), a separate second query (Table 2) is run. The end result is a relational/hierarchical map that is very clear and easy for a novice user to use and understand.
Implementation of the Nested Query Approach
We chose the nested query approach for the implementation of the SQml tool. Later versions of SQml may be expanded to make use of the repeating fields approach if performance becomes a serious issue.
Having decided on a hierarchical/relational mapping method, the next decision was how to describe this mapping in SGML. The obvious approach was to use processing instructions as SQL holders. We realized that we would need to re-create the concept of the start-end element pair using the processing instructions. Even then, the start-end pair would not be enforced by the editor. The same problem occurred when we considered using comments. We finally settled on using elements. The drawback to using elements was that adding elements to a document required modifications to the DTD. However, these modifications were so minor that this was not a significant concern. The reasons elements work better are discussed in the Iterative Elements section below.
This first implementation of embedded SQL used three primary elements: SQLCONNECT, SQLRECORD, and SQLCURSOR. The SQLCONNECT element specifies the userid, password, and database to connect to and retrieve data from:
    <SQLCONNECT USER="username" PASS="password" DATABASE="dbdescription">
The SQLRECORD element is simply replaced by the result of its query. For example:
    <BODY><SQLRECORD TEXT="select product_name from product"></BODY>
might become:
    <BODY>Key system PBX Large PBX Carrier CSU/DSU RTX-1 PC-Ether M-Hub Hub Sr. Switch Boss Super Browser </BODY>
Although the first two element types were useful, and required, they lacked the utility of the relational/hierarchical data mapping scheme described earlier. That would require the third element, SQLCURSOR, and some special functionality. SQLCURSOR is syntactically similar to SQLRECORD:
    <SQLCURSOR TEXT="select product_name into :prodname from product"> </SQLCURSOR>
The difference is how SQLCURSOR is treated by the parser. It is an Iterative Element.
Iterative Elements
An Iterative Element (start-end pair), as used in SQml, specifies a block of SGML that is repeated a number of times. In this case the block is repeated once for each record returned by the query. In addition, Iterative Elements can be nested to provide hierarchical/relational mapping. The following (simplistic) example may be used to generate a book several hundred pages in length. There are no predetermined document size limitations of the SQml™ tool or technique.
    <TITLE>Price Book</TITLE>
    <SQLCONNECT USER="user" PASSWORD="passwd">
    <BODY>
    <SQLCURSOR TEXT="select product_line, product_line_desc into :prodline, :prodlinedesc from product_line">
      <PRODLINE>&prodline;</PRODLINE>
      <PRODLINEDESC>&prodlinedesc;</PRODLINEDESC>
      <SQLCURSOR TEXT="select product_name, product_desc into :prodname, :proddesc from product where product.product_line = :prodline">
        <PRODNAME>&prodname;</PRODNAME>
        <PRODDESC>&proddesc;</PRODDESC>
      </SQLCURSOR>
    </SQLCURSOR>
    </BODY>
Other noteworthy issues: SQml elements are SGML elements in appearance and use. SQml elements have meaning only to the SQml parser and are normally discarded in the processing step. This should not pose a problem for most common SGML editors. Note the use of general entities as placeholders for the column values returned from the queries.
SQml Implementation
SQml is implemented as an SGML processor that produces a destination SGML document from the SGML source and DTD.
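The expansion that two nested Iterative Elements perform can be approximated with a short sketch. The Python below is not SQml's implementation and does not parse SGML; it is a hand-written equivalent of the Price Book example, assuming an in-memory SQLite database in place of an ODBC data source, a few rows from the sample data, and hypothetical function names (`build_sample_db`, `expand_catalog`). The ORDER BY clauses are an addition made only so that the output order is predictable.

```python
import sqlite3
from xml.sax.saxutils import escape


def build_sample_db():
    """Create an in-memory stand-in for the product database (assumption: SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product_line (
            product_line TEXT, product_line_desc TEXT, sales_phone_num TEXT);
        CREATE TABLE product (
            product_name TEXT, product_line TEXT, product_desc TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO product_line VALUES (?, ?, NULL)",
        [("Switches", "Telephone Switches"), ("Software", "Software Products")],
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [
            ("Key system", "Switches", "Small office"),
            ("PBX", "Switches", "Office sys"),
            ("Switch Boss", "Software", "Net Manager"),
        ],
    )
    return conn


def expand_catalog(conn):
    """Emulate two nested cursors: the inner query runs once per outer row."""
    lines = []
    outer = conn.execute(
        "SELECT product_line, product_line_desc FROM product_line "
        "ORDER BY product_line"
    ).fetchall()
    for prodline, prodlinedesc in outer:
        # Outer block: repeated once per product line record.
        lines.append(f"<PRODLINE>{escape(prodline)}</PRODLINE>")
        lines.append(f"<PRODLINEDESC>{escape(prodlinedesc)}</PRODLINEDESC>")
        # Inner block: :prodline is bound to the current outer record.
        inner = conn.execute(
            "SELECT product_name, product_desc FROM product "
            "WHERE product_line = :prodline ORDER BY product_name",
            {"prodline": prodline},
        )
        for prodname, proddesc in inner:
            lines.append(f"<PRODNAME>{escape(prodname)}</PRODNAME>")
            lines.append(f"<PRODDESC>{escape(proddesc)}</PRODDESC>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(expand_catalog(build_sample_db()))
```

The named parameter `:prodline` plays the role of the host variable in the inner SQLCURSOR query, and `escape` guards against data values that contain markup characters such as `&` or `<`.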
With SQml implemented as a processor, users embed SQL queries within SGML documents in SQml form using an SGML (or standard text) editor. Programming skills are not required! Of course, knowledge of SQL is required to achieve effective document integration with SQL databases. After authoring, the SGML document is processed by an SQml parser to produce the expanded SGML document.
The first released version of SQml used James Clark's SP parser library. Previous versions used UNIX regular expressions, which proved to be slow and virtually unable to detect syntax errors.
SQml supports access to multiple databases and tables through Open Database Connectivity (ODBC) calls. ODBC provides generic access to many popular SQL databases, is the PC standard for database communication, and is fast becoming the standard for UNIX systems. ODBC was selected to provide platform and database neutrality. As a direct result, SQml runs on HP-UX, Solaris, Windows 3.1 / 95, and Windows NT, and connects to most popular SQL databases.
Iterative Elements Use
SQml and Iterative Elements may not be appropriate for every database access situation. A few guidelines help identify when an access problem is best solved by SQml or Iterative Element methods. In general, SQml can be used to retrieve information for any SGML or HTML document that relies on structured data contained in an SQL database.
That is, the information retrieved from external sources (in this case the SQL database) cannot be authored at the source document level. Because SQml expands the SQL queries embedded in the source document, information resulting from SQml expansion can be modified only in the destination document.
Interesting Points about Iterative Elements
Authoring and Re-Authoring
Now, with all the publishing tools in place and the means to extract product information from the product database, the EPS department could get back to the business of publishing product catalogs. For the most part, all the hard tasks had been completed except one: product information authoring.
In the previous publication process, product information was generated by product managers who forwarded information to the EPS department's publishing specialists for entry into the product catalog. In most instances, the publishing specialist would re-key information for inclusion in the product catalog. As previously noted, however, the EPS department was given the responsibility of additional products, product lines, and catalogs when it was recognized that the new publishing method was more efficient than the previous process. This created an enormous workload for the department. In addition, the publication specialists of the EPS department could not be experts on every product.
To solve this problem, individual product organizations were given control and authority over the portions of the product database for which they were responsible. Product and price changes could be generated directly by product managers and their staffs. Once product changes were agreed upon, they were staged into the database for approval and processing by the EPS department. In addition, by using SQml™, the product manager could get immediate feedback by generating a copy of the product catalog as it would appear with his/her changes. As a direct result, the quality of the product catalogs increased because the product information was maintained by those closest to the products themselves. Refer to Figure 3, Second Generation: Publication with SGML and SQml.
Making product information the responsibility of the product managers helped to eliminate re-authoring of vital information.
Now, because all source information necessary to publish the product catalog was resident within the product database, publishing a product catalog was reduced to a three-step process:
Summary
Today, Nortel has timely access to mission-critical product information, in part, through the use of SGML, SQml, and the dedication of the EPS and Agave teams. Product information systems and catalog publishing processes have undergone substantial changes that have resulted in higher document quality and usability. These improvements also provide benefits in lower maintenance and product costs.
Strategically, the switch to SGML was the correct move to make at a critical point in the migration process. At Nortel, SGML and the internally developed SQml provide a complete environment that bridges the gap between the company's legacy repository of information and the targeted publishing media. Along the way, several lessons have been learned. Most notable of these are:
What's Next
The catalog publishing process in SGML with the internally developed SQml database integration tools met the team's and Nortel's intermediate goals to provide product catalogs on paper, CD-ROM, and Nortel's Intranet. In the near future, WWW Internet access will be provided. However, there is more to providing Internet access than exposing the documents currently available. Product catalog accessibility over the Internet might pose challenges in the areas of performance and usability. Because the speed of Internet access varies across the network, large catalog file downloads may be slow for some users. In addition, downloading an entire catalog is an inefficient use of Internet bandwidth for a user who is interested in only a small portion of a product line catalog.
To address such needs, Agave has developed a JDBC Server to be used in conjunction with an SQml Client and Java Applets. Use of these tools together can provide enhanced interactive capabilities. Internet users with Java-enabled browsers can run local applets that perform smart sessions with the product database, allowing users to quickly find information of interest. Form-driven applets will help the user focus an interactive query on only the information desired, which in turn reduces the amount of information downloaded.
Additional Information
Additional information or interaction with the authors can be obtained as follows:
Janet Hyatt, Manager of Electronic Publishing Solutions, Northern Telecom Ltd.
© copyright 1997 Agave Software Design, Inc. - All rights reserved.