[Mirrored from: http://stdsbbs.ieee.org/aboutSPA/handbook/SPAovue.html]

SPAsystem overview

Jay Iorio
June 1994

The Standards Process Automation System (SPAsystem(tm)) is a program to computerize the standards-development and -dissemination process. Because the community of standards developers and users is diverse, in terms of both equipment and needs, the SPAsystem is intended to provide a range of services that can be used in many different ways. Development of the SPAsystem is ongoing, and the current capabilities constitute a flexible basis for services to be added over the next few years.

In order to explain what the SPAsystem is, it is useful to summarize the thinking that led to its conception.

The IEEE standardization process involves many thousands of volunteers, organized into Working Groups, who collaboratively write standards. These volunteers use the entire spectrum of computer hardware, and some do not use computers at all. Some have Internet electronic-mail (e-mail) access, some have full Internet access, and some have none. Some use modems, and some do not. The entire spectrum of word-processing and document-creation software is represented.

On the other end of the process are the users of IEEE Standards -- an equally diverse community. This group, also numbering many thousands, has the pressing need to access the body of IEEE Standards in a truly electronic form. Users have expressed interest in optical media, networked databases, and various other delivery forms. Many users also (or instead) prefer paper.

As a final complexity, the electronic files from the Working Groups must be handled by IEEE staff Project Editors, who, after approval of the document by the IEEE Standards Board, return the complete electronic files to the Working Group so that work can begin on the next iteration of the Standard.

The set of IEEE Standards, as well as the associated administrative and procedural information, constitutes an interrelated, cross-referenced, highly structured body of information -- ideally suited to a fully electronic treatment.

One of the strengths of the IEEE standardization process is its openness, and so the IEEE cannot present any technological obstacles to participation. And because the creators of the information are volunteers, it is impossible and inappropriate for demands to be made on the authors with regard to computer and network capabilities.

Three major challenges emerge from these facts.

First, an electronic file of an IEEE Standard made available to users must contain, at a minimum, the same information as the paper version. Otherwise, it is not "official" in a legal sense, and users will eschew it for the less flexible but more complete hard copy. Furthermore, it would be a shame to go through the huge effort of computerizing the process and its results without taking advantage of the information enhancements possible only in the electronic realm. Electronic page images, for example, would technically "make IEEE Standards available electronically," but would shortchange users who are interested in applying modern searching and navigation techniques to this body of information, not in having a picture of a printed page on their screens. The spirit of the law, so to speak, would be violated.

The second challenge is that contemporary document-creation software is almost invariably designed to produce a printout. From an information standpoint, today's word processors and page-design programs are electronic tools only in a limited sense. All software "marks up" a file with added information; at issue is the nature of that information. Industry observers often wonder why the "paperless office" has become a laughably elusive goal, but there is no mystery to it: the software we use to create information is designed to control a printer. The proliferation of fast laser printers has, in effect, confirmed the assumptions of the software. As many have noted, there is more paper now than ever. In a sense, computers have been used predominantly to undermine their true potential as universal information appliances.

The third challenge is, given the broad spectrum of hardware and software used by the creators and users of the information, how does one add useful electronic value to the documents without regard to platform? Plain-text (ASCII) files are universal, but cannot convey the full content even of hard copy, much less the added value users demand. Facsimile pages capture all the hard-copy information but nothing else. PostScript and other delivery formats provide the ability to add platform-independent rendering (printer or screen) information to a file, which solves the compatibility problem but ignores the need for truly electronic added value. Proprietary hypertext and electronic-publishing software might provide the desired added value but without cross-platform compatibility. Yet without a single electronic file that can be exploited by any user with any equipment, a standards-developing organization is obligated to produce not only a paper version but also multiple electronic versions of every title, which is economically unfeasible.

So, in summary, an electronic version of an IEEE Standard must have added value beyond what paper can convey, that added value must be platform- and vendor-independent, and it must be significantly different from that added by most contemporary software. In addition, no changes in terms of software and additional labor can be demanded of those who create the information. Finally, no currently available services or products can be eliminated (hard copy, for example).

There is a solution to this set of challenges -- actually, a set of solutions. The SPAsystem, most accurately portrayed, is this set of solutions.

Central to the SPAsystem is the philosophy of open systems. This phrase means many things to many people, and this article is not intended to be a tract on open systems. The general point, for our purposes, is that there are ways to modularize the various components of a complex computer system to enable a high level of interoperability among computer equipment from different manufacturers. Standardized formats and protocols are public, and meeting certain design criteria will assure interoperability. One of the key principles of the SPAsystem is that the data -- the Standards themselves, as well as associated information -- be decoupled both from the platform and from specific software vendors.

One open-systems means to meet this goal, and to solve the three major challenges just described, is to adopt the Standard Generalized Markup Language (SGML, ISO 8879:1986) and related consensus standards as the way to encode the data. In short, SGML is a language designed to describe (and enforce) the structure of information. To use SGML, one creates a file called a Document Type Definition (DTD), which is a formalized description of a specific type of document -- a memo, a letter, a novel, a draft standard, and so forth. The DTD defines each "element" (a title, a paragraph, an ordered list, etc.) that can (or must) occur in a document type, gives each element a name, specifies how it may and may not interact with other elements, and defines any characteristics that element may or may not have. When a document is created, the writer assigns to it a specific DTD (use the "memo" DTD for writing a memo, etc.). SGML editing software, which exists for all computer types, "learns" from the DTD associated with a document what is and what is not possible at every juncture in the writing process, thereby assuring a structurally correct result as well as offering an authoring aid -- the electronic embodiment of a style guide.
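
As a rough illustration of how a DTD constrains structure, the following Python sketch models a hypothetical "memo" document type as a table of allowed child elements and checks documents against it. The element names (memo, to, from, body, para), the occurrence codes, and the tuple representation of a document are all assumptions made for this example. A real DTD is written in SGML's own declaration syntax and is considerably more expressive.

```python
# Hypothetical, much-simplified stand-in for a "memo" DTD.
# A container element maps to an ordered list of (child name, occurrence)
# pairs: "1" means exactly one, "+" means one or more.
# "#text" marks a leaf element that holds character data only.
MEMO_DTD = {
    "memo": [("to", "1"), ("from", "1"), ("body", "1")],
    "to": "#text",
    "from": "#text",
    "body": [("para", "+")],
    "para": "#text",
}


def validate(element, dtd):
    """Return a list of structural errors for element = (name, content)."""
    name, content = element
    if name not in dtd:
        return [f"unknown element <{name}>"]
    model = dtd[name]
    if model == "#text":
        return [] if isinstance(content, str) else [f"<{name}> must contain text only"]
    if isinstance(content, str):
        return [f"<{name}> must contain child elements"]

    errors = []
    children = list(content)
    pos = 0
    for child_name, occurrence in model:
        count = 0
        # Consume the children that match the next slot in the content model.
        while pos < len(children) and children[pos][0] == child_name:
            errors.extend(validate(children[pos], dtd))
            pos += 1
            count += 1
            if occurrence == "1":
                break
        if count == 0:
            errors.append(f"<{name}> is missing required <{child_name}>")
    # Anything left over does not fit the content model.
    for extra in children[pos:]:
        errors.append(f"<{extra[0]}> is not allowed here in <{name}>")
    return errors


good_memo = ("memo", [
    ("to", "Working Group"),
    ("from", "Project Editor"),
    ("body", [("para", "First paragraph."), ("para", "Second paragraph.")]),
])
bad_memo = ("memo", [("to", "Working Group"), ("body", [])])

print(validate(good_memo, MEMO_DTD))
print(validate(bad_memo, MEMO_DTD))
```

The same check that rejects the second memo after the fact is what lets an editing program guide the writer in advance: at each point the content model says which elements may come next.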

Underscoring this emphasis on structure, SGML separates information from the way(s) it might be rendered on a screen or printer. That is, much like a relational database, the information has no intrinsic visual existence; rather, how various pieces of information will be rendered in various situations is a matter separate from the underlying information itself.

For purposes of illustration, imagine that a corporate database user asks the IS Department for a printout of the customer database. The database administrator would likely respond with the look of a mathematician who has been asked to divide by zero. A printout? Of which piece? In what form? Indexed and organized how? The database, huge and liquid, has no specific visual form. If the user were to request instead a report containing a list of persons from California who had purchased a widget in the last six months, then a printout becomes a reasonable request. The database-management software will respond to such a query by creating a subset of the total database based on the stated criteria, alphabetizing it by the last name of the customer, and printing (or displaying) the results as a useful list.

This information model, although quite different in many ways from the needs of a database involving full text, graphics, and other data types, nonetheless illustrates the general shape of a useful approach to structured data of any type. With this analogy in mind, SGML provides a practical way to achieve this kind of functionality for the kind of information we are concerned with. Not only should a user of IEEE Standards be able to look at, say, IEEE Std 802.3-1990 as a discrete unit of information, but this user should also be able to query this massive body of information much as one would a relational database: "show me a list of all standards since 1990 that deal with local-area networks," or "I am designing a power plant. Which IEEE documents are germane to my needs?"

There are many in the standards community who are convinced that wider adoption of consensus standards might well be encouraged by making potential users aware that many of their needs have already been addressed by the community of experts in the field. That is, often people are not aware that they could benefit from adopting standards, and a multifaceted means of access to the relevant information could bring many new users into the community, to the benefit of all.
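
The database-style query mentioned above ("all standards since 1990 that deal with local-area networks") can be sketched in a few lines of Python. This is only a sketch: the record layout, the topic labels, and the three catalog entries are invented placeholders, not real catalog data, and "since 1990" is assumed to include 1990 itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardRecord:
    """Hypothetical catalog entry; the field names are assumptions."""
    designation: str
    year: int
    topics: frozenset


# Invented placeholder records for illustration only.
CATALOG = [
    StandardRecord("Example Std A", 1988, frozenset({"local-area networks"})),
    StandardRecord("Example Std B", 1991,
                   frozenset({"local-area networks", "fiber optics"})),
    StandardRecord("Example Std C", 1992, frozenset({"power plants"})),
]


def query(catalog, topic, since):
    """Designations of records about `topic` dated `since` or later, sorted."""
    return sorted(
        record.designation
        for record in catalog
        if topic in record.topics and record.year >= since
    )


print(query(CATALOG, "local-area networks", 1990))
```

The point of the sketch is that the answer is assembled from structured fields on demand; nothing about the result depends on how any individual standard would look when printed.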

From the writer's point of view, the results of SGML-aided authoring are far-reaching. First, unbeknownst to the author, he or she in effect has added automatically to a database instead of creating an atomized parcel of information. Second, via SGML and related standards, platform-independent hypertext links (as well as video, audio, and other data types) can be added to the mix, thereby radically increasing the value and utility of the information. Third, the author can focus on the content, appropriately, and let the machinery take care of the rest. And finally, once a rich, neutral data file with electronically useful added value has been created, less complex formats for specific purposes -- database browsing, printout, etc. -- can be derived from it relatively easily; complex-to-simple translations can be made automatic, while simple-to-complex translations are harder and almost invariably require human intervention.
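
To make the complex-to-simple point concrete, here is a minimal Python sketch that derives a plain-text rendering from a structured source. The document representation (a list of element-name and text pairs), the element names, and the rendering rules are assumptions chosen for this illustration and do not describe the SPAsystem's actual conversion routines.

```python
import textwrap

# Hypothetical structured source: (element name, text) pairs.
DOCUMENT = [
    ("title", "Sample Draft"),
    ("para", "Structure is recorded once, at authoring time."),
    ("item", "first point"),
    ("item", "second point"),
]


def to_plain_text(document, width=60):
    """Render the structured document as plain text (structure is discarded)."""
    lines = []
    for name, text in document:
        if name == "title":
            lines.append(text.upper())
            lines.append("=" * len(text))
        elif name == "item":
            lines.append("  * " + text)
        else:
            # Ordinary paragraphs are simply wrapped to the line width.
            lines.extend(textwrap.wrap(text, width))
    return "\n".join(lines)


print(to_plain_text(DOCUMENT))
```

Going the other way, a program handed only the plain-text output would have to guess which lines were titles, list items, or paragraphs, which is why the reverse translation usually needs a person in the loop.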

Adherence to consensus standards for platform-independent and vendor-neutral data formats -- including data types that will come on line in the future -- is the best insurance that the data will remain usable for a long time to come. Since these standards are written with respect to each other, adherence to the suite of related standards ensures both durability of data and the ability to take advantage of technological advances much more quickly than would be possible if the data were entombed in this year's preferred proprietary format. This is not to say that users might not want to have IEEE Standards delivered in any number of formats; the point is that with a rich SGML database as the source, other formats can be derived relatively easily.

So the vision of the ideal state of IEEE Standards and related information is a full-text, multimedia database that is amenable to a variety of navigation and searching techniques, a database whose structure mimics the process of standards development very closely and affords the user a window into the entire standardization process rather than a fixed view of the printed-out labors of a particular Working Group. Such a database will be available to all computer users, running any of today's (or tomorrow's) equipment. It will provide users with an interactive, modern, graphical interface to the information, again without regard to the hardware and software specifics. In short, it will provide users with everything they have come to expect from state-of-the-art proprietary products, but without the problems and exclusivity of proprietary systems and without reducing the experience to a primitive character-based terminal session on a mainframe. This database would be available via dial-in phone lines, the Internet, hard copy, or a variety of electronic formats, either downloadable or supplied on physical media of some sort.

The SPAsystem staff members have been participating actively in international efforts launched by standards-developing organizations (SDOs) to achieve consensus as to how these goals should be achieved. There is a requirement on the part of many, if not most, users of standards that there not be artificial boundaries among the databases of the various SDOs. Users tell us repeatedly that they need a single and straightforward means by which to access every organization's information. Rarely does a person use only IEEE Standards, for example. Documents from other SDOs, governments, and other information providers must appear integrated to the user, and queries must be possible over the entire body of information. In addition, there is no way (or reason) to have all this information in the same physical location. The technical specifics of the SPAsystem are beyond the scope of this description, but it is worth mentioning that these system requirements necessitate a distributed architecture, where pieces of a huge interrelated database exist in an arbitrary number of physical locations and are integrated (to the user's eyes) through software. A user should be able to call one number, so to speak, generate a query against the entire body of SGML-encoded information, and receive the information relevant to the query in response. This approach is desirable for a range of reasons, not the least of which is that by adhering to an agreed-upon set of protocols, formats, and procedures, any information provider can, in effect, join the party without a great deal of trouble. This harmonization of approach to electronic standards development is the key to a truly useful system for the spectrum of standards developers and users. The SPAsystem staff are focusing on what we consider the missing link in this SDO information flow: how to get the information into the ideal electronic form.
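
The "call one number" idea, a single query fanned out to several independently held collections and merged for the user, can be sketched as follows. This is a toy model under stated assumptions: the Repository class, its search method, the organization labels, and the titles are all hypothetical, and the repositories here live in one process, whereas in the architecture described above they would sit at separate sites and be reached over agreed protocols.

```python
class Repository:
    """Hypothetical stand-in for one organization's document store."""

    def __init__(self, owner, titles):
        self.owner = owner
        self.titles = titles

    def search(self, term):
        """Return (owner, title) pairs whose title contains the term."""
        term = term.lower()
        return [(self.owner, title) for title in self.titles
                if term in title.lower()]


def federated_search(repositories, term):
    """Send one query to every repository and merge the answers by title."""
    results = []
    for repository in repositories:
        results.extend(repository.search(term))
    return sorted(results, key=lambda pair: pair[1])


# Invented placeholder collections for illustration only.
repositories = [
    Repository("SDO-A", ["Network Cabling Guide", "Relay Testing"]),
    Repository("SDO-B", ["Wireless Network Glossary"]),
]

for owner, title in federated_search(repositories, "network"):
    print(owner, "-", title)
```

Because every repository answers the same search call, adding another provider is a matter of appending it to the list, which is the practical meaning of joining "without a great deal of trouble."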

The SPAsystem hinges on the idea that the best time to add value to an electronic file is when it is created. If there should be a hypertext link from one paragraph to another, or from a footnote in one book to a graphic in another, then this information should be added as part of the writing process. Under the most straightforward scenario, all authors would use SGML text-editing software, use the DTDs created by IEEE staff, create their documents off-line, and send the results to one of the IEEE computers according to certain procedures. Someday, this might come to pass. But today's reality is that the vast majority of standards writers use popular word-processing software whose native output, for reasons described earlier, is unsuited to the larger task at hand.

Therefore, the SPAsystem staff have focused much energy on resculpting the authoring process so that certain criteria can be met:

Meeting these criteria has taken many forms, including setting up conversion routines and creating procedures for writing a standard or piece thereof. As it turns out, it is the printer commands inserted by every word-processor user as a matter of course that provide the means to meet the criteria. Authors already have been adding value to their text. With procedures and conversion programs, this limited and proprietary value can be replaced with more universal and enduring value while still allowing for formatted printing. IEEE staff will provide Working Groups with procedures for their specific word processors that will insert (without their explicitly knowing) markup cues in the text that will give them a correct printout and provide the SPAsystem with information automatically mappable to the database world. Staff Project Editors will be involved with Working Groups from their project's inception, assuring the integrity of the information flow. As these procedures (which, as of this writing, are under test) are refined with input from Working Groups, they will ultimately constitute a universal "authoring machine" that automatically produces flexible electronic database files from any number of less "intelligent" sources. When a document is approved as an IEEE Standard, a flag will change in the associated relational database, the system will then recognize this information as publicly available, and users will immediately have access to the new standard, not only as an isolated document but also as an integrated piece of a much larger body of information.
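
The approval step described above, in which a flag changes in the associated relational database and the document becomes publicly visible, might look roughly like the following Python sketch using the standard sqlite3 module. The table name, column names, and draft designations are assumptions made for the example; the actual SPAsystem schema is not described in this article.

```python
import sqlite3

# In-memory database with a hypothetical one-table schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " designation TEXT PRIMARY KEY,"
    " approved INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO documents (designation) VALUES (?)",
    [("Draft X",), ("Draft Y",)],
)


def public_documents(conn):
    """Only documents whose approval flag is set are visible to users."""
    rows = conn.execute(
        "SELECT designation FROM documents WHERE approved = 1 ORDER BY designation"
    )
    return [row[0] for row in rows]


def approve(conn, designation):
    """Flip the approval flag; no separate publication step is needed."""
    conn.execute(
        "UPDATE documents SET approved = 1 WHERE designation = ?",
        (designation,),
    )
    conn.commit()


before = public_documents(conn)
approve(conn, "Draft X")
after = public_documents(conn)
print(before, after)
```

In this model, publication is just a change of state in the database: the same stored document is served before and after approval, and only the visibility rule differs.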

If the first half of the SPAsystem is the form of information, the second half is how to get to it. Networks play a crucial role in the SPAsystem and its future. For standards developers, the benefits of basic network/phone capabilities -- high-speed file transfer, e-mail, bulletin-board services -- are undeniable. In fact, even as high-speed networks are being extended universally, these basic operations -- in some guise -- will probably remain central to a standards developer's life. Even in a future that contains three-dimensional immersion conferencing systems, files would still be moved around.

For the user of standards, networks are an inevitable part of the future. Because the collection of information is potentially so large, a user restricted to physical media might often be in the position of having access to everything except what is needed -- cross-references that go nowhere, for example, next to megabytes of unneeded information. Furthermore, the dynamic quality of the database cannot be reflected by periodic CD-ROM updates. A database available by telephone and Internet offers the user access to the full body of information, always up to date, without having to pay for unneeded bytes.

So this is the planned shape of the SPAsystem as it evolves over the next three years. The focus is on getting the information into a maximally useful form, by a variety of means and as automatically as possible, with the awareness that such a form will provide users with whatever view into this body of information they happen to need at the moment.

Because of the SPAsystem's emphasis on information flow and form rather than on specific machinery, the system's specifics will change constantly as technologies advance, but the underlying terrain will prevail. This is perhaps the SPAsystem's most central goal: the complete severance of information from equipment, in order to maximize use of both.

The major (following) portion of this Handbook is a guide to currently available services. As the SPAsystem evolves, new versions of the Handbook will be written. Authoring services that will assure consistent SGML input to the standardization process are scheduled to be available to all Working Groups by the middle of 1995, and the fruits of these labors, in the form of standards available in various electronic forms, will arrive about a year after that. Over time, what is described in this article will be mirrored in the Access & Services section of this Handbook, in the form of specific instructions.