An Investigation Into The Role of SGML In An Electronic Forms Environment

A Study Prepared for the

Treasury Board Secretariat


Microstar Software Limited

With Observations and Recommendations

Formulated by the

Project Advisory Group

March 31, 1995

Executive Summary

This project is part of an on-going effort of the Treasury Board Information Technology Standards (TBITS) program to define and to promote a standards-based electronic document environment that fulfills government policies on information technology and information management.

The objective of the project was to investigate the potential for developing an electronic forms specification, processing and interchange environment that would comply with applicable international standards: ISO 8879, the Standard Generalized Markup Language (SGML), and ISO 10744, the Hypermedia and Time Based Structuring Language (HyTime). Since standards facilitate intersystem communications and forms data interchange, the investigation summarized the inherent characteristics of open systems and SGML in order to contrast the standards-based approach with more traditional automated forms systems. To meet the needs of a complete electronic forms environment, it was assumed that the impact of SGML and HyTime had to be considered with respect to: forms design; edit rule specification; forms filling; forms printing; forms interchange; and forms information storage.

Since implementation of electronic forms should not be regarded simply as an exercise in replacing paper forms with their electronic counterparts, this project used a formal methodology and approach to investigate the electronic forms environment. A more comprehensive approach was used to analyze the semantics (or "meaning") of the data captured within forms, as well as to understand how users see, access, use and manage the form's contents. The Document Development Life Cycle (DDLC) framework provided the context for examining the roles and responsibilities that contribute to efficient and effective use of electronic forms based on SGML.

This project examined some of the unique properties that the SGML and HyTime standards can contribute to an electronic forms environment. It included an analysis of the Government of Canada Training Application and Authorization form within an SGML demonstration implementation. The demonstration implementation consisted of multiple SGML products running on heterogeneous hardware platforms connected by several telecommunications networks to illustrate that an electronic forms environment can be supported through standards-based technology.

The investigation results indicate that a formal approach to electronic forms implementation offers opportunities for rationalizing a form's contents and structure based on the meaning of the data. In addition, available SGML-compliant products can support, to varying degrees and effectiveness, the requirements for forms design, forms filling, forms data formatting and distribution as well as forms data storage and management.

In light of the demonstration environment and the contractor's findings, the Project Advisory Group made a number of observations and recommendations regarding additional initiatives which could be undertaken to address issues that fell outside the scope of this study. In conclusion, the vendor-, platform- and network-independent environment assembled for this demonstration suggests sufficient benefits to justify a joint government/private-sector pilot implementation of an SGML-compliant electronic forms project.

1.0 Introduction

This section provides a context for the entire report by stating the project objectives, indicating the government standards background, giving the terms of reference, and identifying the Project Advisory Group members.

1.1 Project Objectives

The objectives of this project are to investigate and demonstrate the potential for developing an electronic forms specification, processing and interchange environment based on ISO 8879, the Standard Generalized Markup Language (SGML), and ISO 10744, the Hypermedia and Time Based Structuring Language (HyTime). Such an environment would include a forms design and edit rule specification mechanism, a forms filling application, a printing application and an interchange mechanism. This project focuses on the facilities that SGML-based technology brings to the complete solution and does not attempt to define or address all requirements or issues associated with a full-function, mature electronic forms implementation.

1.2 Project Background

This project builds on the results of, and represents part of, an on-going effort of the Treasury Board Information Technology Standards (TBITS) program to define and to promote a standards-based electronic document environment that fulfills government policies on information technology and information management. Facilitating the implementation of SGML within government departments is a major responsibility of the Electronic Document Standards Working Group (EDSWG), which was established under the TBITS mandate to define and foster implementation of government approved standards. Electronic forms represent one category of documents that need to be standardized and integrated with other types of government information in order to achieve seamless access to information as envisaged in the Blueprint for Renewing Government Services Using Information Technology.

More specifically, this project provides an opportunity to demonstrate how forms-based applications could be renewed and re-engineered to deliver government services through the innovative use of standards-compliant information technology.

1.3 Project Terms of Reference

This project comprises a "Proof of Concept" analysis and sample implementation of innovative technology. As such, its results will be useful for setting general directions and for isolating implementation issues that may arise. It is not designed to provide a detailed implementation plan.

The demonstration system configuration, which will be based on an analysis of two representative forms (1) and a knowledge of current electronic forms technology, is meant to illustrate the nature of processes that could be performed on forms data within an SGML-based environment. Although representative of forms processing, it is not meant to reflect a full-featured electronic forms user environment.

The project scope, methodology, demonstration system strategy, and findings were discussed with the Project Advisory Group, an EDSWG sub-group, to incorporate advice and reflect issues identified in earlier electronic forms studies and pilots undertaken by government departments.

(1) The Training Application and Authorization and the Travel Expense Claim forms were used for this analysis. Appendix A provides a detailed analysis of the training form.

1.4 Project Advisory Group Members

	Ben Bloommesteijn		TBS
	Robert Boissonneault		PWGSC
	Daniel Boucher			TC
	Ed Buchinski			TBS
	Martin Flood			TBS
	Nadene Grattan			RCMP
	Andrea Keele			DND
	Lorraine Léger		Canadian Heritage
	Andy Morgan			DND
	Lise Potvin			TBS
	Jean Pruneau			HC
	Michael Pyefinch		PWGSC
	Robert Rouleau			NatResCan
	Russell Thomas			NRC
	John Whelan			INAC

2.0 The Environment for Electronic Forms

Since electronic forms applications are relatively new, this section provides a brief definition of a forms application, identifies the relevance of forms in government systems and discusses the international standards approach to coding document contents. Because the available options for specifying document content, or markup, have a major impact on forms design and data interchange, it is essential to understand the underlying principles of markup in order to appreciate their significance for government-wide handling of electronic forms.

2.1 The Forms Application Defined

The typical forms application is designed to provide a stable storage facility for selected information and to offer forms users a user-friendly interface through which they can enter, view and update forms information.

A forms application consists of a form layout or definition; the set of processes and procedures used to enter, validate and interchange the information contained on the form; and copies of the filled-in forms (each also referred to as an instance of the form). Typically, the information on a form is well defined and highly structured; for example, codes are widely used to represent textual values that are displayed to the user. The information is also frequently organized in such a way that exceptional situations are highlighted for special attention and processing.
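To make the parts of a forms application more concrete, the following is a minimal Python sketch. It assumes a hypothetical training form with one coded field; the names `FormDefinition`, `course_type` and the code values are illustrative only and are not taken from the Training Application and Authorization form. The definition holds the field list and a code table, each filled-in instance stores only codes, and the display text is looked up when the instance is shown to a user.

```python
from dataclasses import dataclass, field


@dataclass
class FormDefinition:
    """Hypothetical form definition: the field list plus a code table per coded field."""
    name: str
    fields: list[str]
    code_tables: dict[str, dict[str, str]] = field(default_factory=dict)

    def display(self, instance: dict[str, str]) -> dict[str, str]:
        """Return a filled-in instance with stored codes replaced by display text."""
        shown = {}
        for name in self.fields:
            value = instance.get(name, "")
            table = self.code_tables.get(name)
            # Coded fields are looked up; free-text fields are shown as stored.
            shown[name] = table.get(value, value) if table else value
        return shown


# Hypothetical training form: "course_type" stores a one-letter code.
training = FormDefinition(
    name="training-application",
    fields=["employee", "course_type"],
    code_tables={"course_type": {"L": "Language training", "T": "Technical course"}},
)

# One filled-in instance of the form stores the code, not the text.
instance = {"employee": "John Smith", "course_type": "T"}
print(training.display(instance))
```

Keeping codes in the instance and text in the definition means the display wording can change without touching any stored instance.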

Forms applications usually involve more than one person, with each person contributing information, validating or revising existing content, authorizing a payment, a request or follow-up activity, etc.

Historically, forms applications were designed for a paper environment, and the paper paradigm is still evident in most of the automated forms systems that have been implemented thus far.

2.2 Forms in the Federal Government

The Federal Government makes extensive use of forms to support program delivery, to execute internal administrative functions, to gather information and to disseminate it to authorized users. Thousands of different forms are created annually and millions of individual instances of forms are distributed, collected, processed, filed and compiled. Many computer systems and software packages are in place for capturing the data on the forms and for managing that data. Each system uses a proprietary scheme for storing the information captured through forms applications (2). Although numerous forms have government-wide applicability, the Project Advisory Group observed that specialized departmental needs require customized forms to be used from time to time. These specialized needs, when combined with vendor-proprietary electronic forms solutions, compromise the effectiveness and efficiency that could be realized in handling forms information on a government-wide scale.

(2) Treasury Board of Canada Secretariat. Administrative Branch. Forms Automation within the federal government. Sept., 1993.

2.3 International Standards and Open Systems

The International Organization for Standardization (ISO) publishes and maintains international standards for Information Technology as well as many other industry and technology areas, including document encoding and processing. These International Standards are voluntary and are developed through member country participation and balloting. They represent vendor-independent, open specifications that are typically adopted for government-wide use following departmental endorsement through the TBITS program.

Open systems are enabled by adherence to international standards. Users can achieve an open system environment by demanding vendor fidelity to standards specifications and choosing technology solutions to meet explicit user needs without compromising data interchange and system portability. In so doing, the user community can be protected from proprietary data encoding inherent in vendor software/hardware implementations and thereby ensure data usability beyond the commercial life span of individual vendors or their products.

Although arguments have been made to the contrary, vendor adherence to international standards does not inhibit feature differentiation in commercial products, but rather, enhances the competitive environment. Vendors can compete for "best of breed" status without sacrificing innovation, data interchange or system inter-connectivity capability. Government and industry have an opportunity to demonstrate their commitment to open systems policies by using applicable international standards within electronic forms-based applications. Specific areas within a forms environment to which standards could be applied include:

- forms design;

- edit rule specification;

- forms filling;

- forms printing;

- forms interchange; and

- forms information storage.

The international standard for SGML provides the potential to reduce costs, decrease processing time and increase the value of forms-based data. By using the descriptive markup capabilities of SGML, processing systems may be designed in a more generic fashion and may be deployed on a wider scale. This would result in cost reductions for both development and maintenance processes. More emphasis could be placed on information content rather than on procedural issues and thereby contribute to job enrichment opportunities. To illustrate how these benefits may be achieved, the following paragraphs describe the concepts of document coding or markup and document structuring that are supported by SGML.

2.4 Markup

Within the publishing industry, the term "markup" has traditionally been used to denote handwritten proofreaders' marks - comments or additions that were added by a proofreader to a manuscript. Collectively, these marks are referred to as the markup and they have been used primarily to provide page layout instructions to compositors. Used as a noun, "markup" has come to mean "additional information interspersed with the content information" that may be placed in an electronic document or on a piece of paper.

All electronic documents contain some form of markup. Documents that are created by word processing programs typically contain additional instructions that indicate formatting controls which are to be applied to the data content whenever it is presented on a screen or a piece of paper. This type of markup, which can contain a wide range of process control instructions, is referred to as Procedural Markup.

An alternate form of markup, named Descriptive Markup, is used to identify the semantics (i.e. the meaning or purpose) of the content to which the markup is applied. Descriptive markup can be hierarchically structured, identifying those pieces of information that are components of other pieces of information. For example, a paragraph is contained within a section which in turn may be contained within a chapter. This hierarchy, or "tree structure" (also referred to as arborescence), is a top to bottom breakdown of a document into its constituent units.

The following paragraphs highlight the characteristics of systems that support each of these two types of markup and the ways in which these systems influence how the encoded information is processed and used.
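The "tree structure" just described can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical chapter made of sections and paragraphs; each component is represented as a (name, children) pair, and the walk prints every component indented beneath the component that contains it, so the top-to-bottom breakdown is visible.

```python
# Each component is (element_name, list_of_child_components); leaves have no children.
chapter = ("chapter", [
    ("section", [("paragraph", []), ("paragraph", [])]),
    ("section", [("paragraph", [])]),
])


def outline(node, depth=0):
    """Return one indented line per component, each parent before its children."""
    name, children = node
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(chapter)))
```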

When utilizing descriptive markup, procedural instructions (e.g.: display formatting commands) are normally applied by separate processes to the marked up information components. This allows the content that is descriptively marked up to be accessible, validatable, reusable and variably processable.
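As a minimal sketch of procedural instructions being "applied by separate processes", the Python below keeps the memorandum content as descriptively named components and applies two independent, hypothetical style sheets to it. The bracketed codes in `print_style` only imitate the style of Figure 1 and are not actual WordPerfect® codes; `screen_style` is likewise invented for illustration.

```python
# Descriptively marked-up content: (element name, text) pairs, with no formatting.
memo = [
    ("to", "John Smith"),
    ("from", "Noël René"),
    ("subject", "Future Memoranda Use"),
    ("para", "Please ensure that memos are concise."),
]

# Two independent, hypothetical formatting processes ("style sheets").
# Each maps an element name to a function that formats that element's content.
print_style = {
    "to": lambda t: f"To:\t[Bold]{t}[bold]",
    "from": lambda t: f"From:\t[Bold]{t}[bold]",
    "subject": lambda t: f"Subject:\t[Bold][UND]{t}[und][bold]",
    "para": lambda t: t,
}
screen_style = {
    "to": lambda t: f"TO: {t}",
    "from": lambda t: f"FROM: {t}",
    "subject": lambda t: f"RE: {t.upper()}",
    "para": lambda t: "    " + t,
}


def render(content, style):
    """Apply a style sheet to the content; the content itself is never modified."""
    return [style[name](text) for name, text in content]


for line in render(memo, screen_style):
    print(line)
```

Adding a third process (for example, loading the same components into a database) would mean writing another mapping, not re-keying the memo.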

While typical word processing and desktop publishing systems utilize proprietary procedural markup, SGML provides a descriptive markup capability that is platform and product independent, enabling a large variety of tools from different vendors for information creation and manipulation.

2.4.1 Procedural Markup

Since procedural markup is typically controlled by a single company or consortium, it is invariably both proprietary to that vendor or consortium and optimized for specialized processes to be supported by that markup. The proprietary nature results from each vendor devising a specific set of application functions. Even where the functionality may be identical, each vendor's product will have unique codes and methods of accomplishing the function, based on individual convictions as to the best way to identify and execute a particular function. As the product evolves, new markup will need to be specified for each new feature or process and the diversity between the vendor-defined processes will only increase.

Procedural markup does not, typically, address or attempt to eliminate platform dependencies such as the machine character set. For example, the encoding of accented letters will vary among systems, and a file created on an Apple® Macintosh or UNIX® system will not be directly processable on an IBM-PC®, due to character set incompatibilities.
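The character set problem can be shown with Python's standard codecs. The sketch below assumes three 8-bit character sets as representative of the platforms named: IBM-PC code page 437, Macintosh Roman, and ISO 8859-1 (Latin-1, one common choice on UNIX systems). It prints the byte used for "é" in each, then reads Macintosh bytes as if they were IBM-PC text.

```python
# The same accented letter encoded in three 8-bit character sets.
# Codec names are those of Python's standard library.
for codec in ("cp437", "mac_roman", "latin-1"):
    print(f"{codec:10} é -> 0x{'é'.encode(codec).hex()}")

# Reading Macintosh bytes as if they were IBM-PC (code page 437) text:
mac_bytes = "Noël René".encode("mac_roman")
print(mac_bytes.decode("cp437"))  # the accented letters come out as different characters
```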

Figure 1 provides an example of the kind of procedural markup that may be embedded in an office memorandum. Note how the markup within the document, as defined by the vendor, is oriented towards the presentation or the processing (hence "procedural") of the information content contained between the markup.

[Centre] MEMORANDUM [Hrt][Hrt]

To:[Tab][Tab][Bold]John Smith[bold][Hrt][Hrt]

From:[Tab][Bold]Noël René[bold][Hrt][Hrt]

Subject:[Tab][Bold][UND]Future Memoranda Use[und][bold]

[Hrt][Hrt]This is a note regarding the future creation of

memoranda.[Hrt][Hrt]Please ensure that memos are concise.

Figure 1 - An Example of WordPerfect® Reveal Codes

Problems will arise when an organization's information is encoded using procedural markup. If it becomes necessary to change vendors or to process legacy data with different applications, one cannot be assured that vendor-specific markup will be correctly interpreted by the new or different product. Similarly, if a revised process must be used with legacy data created for one particular purpose, one cannot be assured that the process-specific markup will be correctly interpreted.

2.4.2 Descriptive Markup

Since descriptive markup is standardized and content based, it is independent of the processes that will be applied to marked up data. When a standard descriptive markup is utilized, dependence on a particular vendor with respect to data format is removed. Other external factors (e.g.: platform choice, feature selection, etc.) may, of course, tie a user to a particular vendor, but the data will conform to an accepted standard markup.

Descriptive markup allows information to be identified by its purpose, thereby making the information accessible and available for processing as intended by the information creator. Document components that are accessible can be re-used by multiple processes and applications without invoking data conversion routines.

Processes may differ due to user preference (e.g.: one user desires lists numbered with digits while another prefers roman numerals), due to medium (e.g.: paper printing, interactive screen delivery, or CD-ROM distribution), due to purpose (e.g.: database loading or report printing), or due to some other characteristic.
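The numbering example above can be sketched in Python: one list whose items carry no numbering information, formatted by a process that numbers it with digits or with roman numerals on request. The item texts and the `number_list` helper are hypothetical illustrations.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to lower-case roman numerals."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# One descriptively marked-up list: the items say nothing about numbering.
items = ["Be concise", "Use plain language", "Sign the memo", "File a copy"]


def number_list(items, style):
    """Format the same list items with digits or roman numerals, per user preference."""
    label = str if style == "digits" else to_roman
    return [f"{label(i)}. {text}" for i, text in enumerate(items, start=1)]


print(number_list(items, "digits"))
print(number_list(items, "roman"))
```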

Figure 2 illustrates how an office memorandum could be marked up descriptively. Note how the markup identifies the semantics of the content that is contained between each pair of markup codes. Note also that the system dependencies are removed. For example, character set encoding conflicts may be eliminated through descriptive markup (i.e. the substitution of character entity references for characters outside 7-bit ASCII). In the example, the accented character "é" is represented as "&eacute;" in standard SGML coding. It is the responsibility of the processing system to interpret such substitutions into the forms required by the local system.

<memo>
<header>
<to>John Smith</to>
<from>No&euml;l Ren&eacute;</from>
<subject>Future Memoranda Use</subject>
</header>
<body>
<para>This is a note regarding the future creation of
memoranda.
<para>Please ensure that memos are concise.
</body>
</memo>

Figure 2 - A Descriptively Marked Up Memorandum Document Instance
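The interpretation step left to the processing system can be sketched as follows. This minimal Python example assumes a small excerpt of the ISO Latin-1 entity set (ISOlat1) used with SGML; a real system would load the complete public entity set. Known entity references are replaced by local characters, unknown ones are left untouched, and the result can then be encoded in whatever character set the local platform uses.

```python
import re

# A small excerpt of the ISO Latin-1 entity set (ISOlat1) used with SGML.
# A real system would load the complete public entity set.
ISOLAT1_EXCERPT = {
    "eacute": "é", "egrave": "è", "euml": "ë",
    "agrave": "à", "ccedil": "ç", "ocirc": "ô",
}

# An entity reference: "&", a name, and an optional ";" reference close.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")


def resolve_entities(text: str, table: dict[str, str] = ISOLAT1_EXCERPT) -> str:
    """Replace known entity references with local characters; leave others alone."""
    return ENTITY_REF.sub(lambda m: table.get(m.group(1), m.group(0)), text)


local = resolve_entities("No&euml;l Ren&eacute;")
print(local)  # Noël René
# Each platform then stores those characters in its own character set.
print(local.encode("cp437"), local.encode("mac_roman"))
```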

The information contained within a document is defined in a top-down manner using descriptive markup. The resulting definition portrays the structure of a document as a hierarchy of document components. For example, the first set of tags (i.e. <memo> and </memo>) applies to the entire document. It is followed by tags for the next level of components (i.e. <header> and <body>), which are in turn subdivided into their constituent information items. This process of specifying successive components is applied during the analysis phase for each type of document. Figure 3 provides a graphical representation of the information structure that is applicable to a descriptively marked up sample memorandum.

Figure 3 - The Hierarchy of a Memorandum Document Type Definition

2.4.3 SGML Overview

SGML, standardized in 1986 by the International Organization for Standardization as ISO 8879, is a standard means for structuring and identifying the information contained in electronic documents. Based on government-wide consensus, the Treasury Board Secretariat has endorsed its use within government documents (3).

Automated formatting was originally performed by using specific coding that mimicked typesetting commands (e.g.: "space 3 lines", "use Times font", etc.). In the late 1960s, generic coding was developed for the publishing industry by the Graphic Communications Association's GENCODE project. This project created a suite of generic tags for use in documents (e.g.: "heading", "paragraph", etc.). In 1982 the ISO sub-committee responsible for Document Processing and Related Communication began to develop the specification for SGML in order to overcome the limitations of GENCODE (e.g.: restricted tag set, etc.). By abandoning the suite of generic tags established by GENCODE, SGML allows users to define customized tagging structures.

Two language specifications make up the standard, one to specify the markup language for a class (or type) of documents, and another to markup instances of a class of documents. The document class is described by the Document Type Definition (DTD). Individual content files that follow a Document Type are called document instances.
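The relationship between a DTD and its document instances can be sketched in Python. Because the DTD behind Figure 3 is not reproduced, the element declarations in the comments below are an assumption, chosen only to agree with the Figure 2 instance. SGML content models behave much like regular expressions over element names (", " means "followed by", "+" means "one or more"), so the sketch rewrites each model as a regular expression and checks an instance's child elements against it. A real SGML parser does considerably more (tag minimization, attributes, entities).

```python
import re

# Assumed DTD for the memorandum (the original Figure 3 is not reproduced):
#   <!ELEMENT memo   - - (header, body)>
#   <!ELEMENT header - - (to, from, subject)>
#   <!ELEMENT (to|from|subject) - - (#PCDATA)>
#   <!ELEMENT body   - - (para+)>
#   <!ELEMENT para   - O (#PCDATA)>
# The same content models, rewritten as regular expressions over child tags.
CONTENT_MODELS = {
    "memo": r"<header><body>",
    "header": r"<to><from><subject>",
    "body": r"(<para>)+",
}


def conforms(element: str, children: list[str]) -> bool:
    """Return True if the child elements of `element` satisfy its content model."""
    model = CONTENT_MODELS.get(element)
    if model is None:
        # Elements without a model here hold only character data (#PCDATA).
        return not children
    sequence = "".join(f"<{name}>" for name in children)
    return re.fullmatch(model, sequence) is not None


# The Figure 2 instance, reduced to each element's list of child elements.
instance = [
    ("memo", ["header", "body"]),
    ("header", ["to", "from", "subject"]),
    ("body", ["para", "para"]),
]
print(all(conforms(name, kids) for name, kids in instance))  # True
print(conforms("header", ["from", "to", "subject"]))          # False: wrong order
```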

HyTime, standardized in 1993 as ISO 10744, is the Hypermedia and Time-Based Structuring Language, and is based on the SGML language. HyTime is used to express certain semantic relationships between information components. Of interest to forms are constructs related to lexical specification (i.e. the ability to specify specific content values or ranges for a given information item) and location addressing (i.e. the ability to indicate a specific location within a document).
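HyTime's own notation is not reproduced here. As a rough analogy only, the Python sketch below shows the two ideas as they might apply to forms: each field is paired with a lexical pattern and an optional numeric range, and a field is located by a path of element names. The field names (`start`, `days`, `cost`), patterns, limits and the `locate` helper are all hypothetical.

```python
import re

# Hypothetical lexical rules for three training-form fields:
# (pattern the text must match, optional (low, high) range for its numeric value)
LEXICAL_RULES = {
    "start": (r"\d{4}-\d{2}-\d{2}", None),
    "days": (r"\d+", (1, 60)),
    "cost": (r"\d+\.\d{2}", (0.0, 5000.0)),
}


def check_field(name: str, value: str) -> bool:
    """Check a field value against its lexical pattern and, if given, its range."""
    pattern, bounds = LEXICAL_RULES[name]
    if not re.fullmatch(pattern, value):
        return False
    if bounds is not None:
        low, high = bounds
        return low <= float(value) <= high
    return True


# Location addressing, much simplified: reach a component by its path of element names.
form = {"trainapp": {"course": {"start": "1995-04-10", "days": "3"}}}


def locate(tree: dict, path: str):
    """Follow a "/"-separated path of element names down the form structure."""
    node = tree
    for step in path.split("/"):
        node = node[step]
    return node


print(check_field("days", locate(form, "trainapp/course/days")))
```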

(3) Treasury Board Information Technology Standard Number 14. Standard Generalized Markup Language.

3.0 Traditional vs. Structured Approaches

To enhance understanding of the potential of an SGML context for forms, it is useful to compare the traditional approach to documents and forms automation with a structured approach. This comparison helps to isolate the areas of electronic processing that are amenable to standardization.

3.1 Traditional Approaches

3.1.1 Document Systems

In the traditional approach to document systems, vendors have over many years derived many customized formats for the storage of information that met specific processing concerns. These concerns include performance, ease of manipulation, or type of processing being performed (e.g. sorting, display styles, etc.). For example, a CD-ROM product vendor would choose a format that was optimal for this physical medium and/or the corresponding retrieval engine and associated user interface.

Similarly, within the word processing and desktop publishing sector, commercial systems use customized formats geared to efficient formatting and layout of information. Each vendor has typically chosen a format of its own that combines content and layout in an efficient way. Documents conforming to these formats are procedurally marked up and are typically flat and unstructured. The gradual adoption of styles may convey some notion of semantic meaning, but styles are not hierarchical (one cannot typically have styles within styles) and are very often used merely to capture packages of formatting instructions.

As illustrated in Figure 4, each use or process applied to data is very often tied to the data formats optimized for that process. Each vendor that adopts a process-biased data format, in turn, provides a unique document creation environment to the user.

Figure 4 - The Traditional Approach to Documents

An example of this type of proprietary coding (or procedural markup) is provided by the WordPerfect® codes which can be displayed by invoking the "Reveal Codes" function. This markup is geared uniquely to the processes specified by this software developer.

While commercial software is available to interpret the customized codes embedded in other packages, the data conversion is never guaranteed and is rarely totally successful. Typically the source information comprises indivisible format and content data, and the conversion software lacks the ability to isolate and separately exchange either of the two components.

The successful interchange of content and format between different hardware platforms or vendors' packages relies on the ability of vendors to successfully interpret the procedural markup defined by others. At times, platform dependencies are built into the formats, most notably with regard to character set encoding. An example of incompatibility is the coding of an accented letter on an IBM PC platform. This differs from that on either a Macintosh or UNIX platform. Successful translation must accommodate platform as well as vendor created data incompatibilities.
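Why such conversion is "rarely totally successful" can be sketched with a code-to-code translation table. In this minimal Python example both vendors' code sets are hypothetical: vendor A's codes imitate Figure 1, vendor B's are invented, and vendor B is assumed to have no equivalent for centring. Codes with no counterpart are silently lost unless the converter reports them.

```python
import re

# Hypothetical mapping from vendor A's procedural codes (imitating Figure 1)
# to vendor B's codes. Vendor B is assumed to have no "centre" code.
A_TO_B = {"[Bold]": "<B+>", "[bold]": "<B->", "[UND]": "<U+>", "[und]": "<U->",
          "[Hrt]": "\n", "[Tab]": "\t"}

CODE = re.compile(r"\[[A-Za-z]+\]")


def convert(text: str):
    """Translate known codes; drop unknown ones and report what was lost."""
    lost = []

    def swap(match):
        code = match.group(0)
        if code in A_TO_B:
            return A_TO_B[code]
        lost.append(code)
        return ""

    return CODE.sub(swap, text), lost


converted, lost = convert("[Centre] MEMORANDUM [Hrt]To:[Tab][Bold]John Smith[bold]")
print(repr(converted))
print("not converted:", lost)  # not converted: ['[Centre]']
```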

The benefits of the traditional approach to document systems are rooted in the dependencies of vendor-specific procedural markup. The packages that support this approach can make the focused task of the user very simple to execute and quick to perform. As a result, the vendor can move their package into the mainstream and support the resulting large installation base well.

The drawbacks of these systems are also rooted in the application dependencies, in that the markup is too focused on the task at hand to be flexible for other purposes. The content and procedural information are too tightly combined to be separated easily. The "shelf life" of the information may be directly related to the vendor remaining in business or even the length of time that a specific version of software remains supported by the vendor. Vendor demise or extensive version changes can result in costly product upgrades or data conversions.

3.1.2 Forms Systems

Traditional forms systems were created to fulfill the urgent need for organizations to migrate their paper environments to newly acquired electronic systems. The underlying designs reflect the basic needs of forms users to replicate and automate the paper form contents, but they do not meet an important (and ever growing) need for data interchange between communities of forms users.

The contents defined and contained within an organization's collection of paper forms are typically defined and managed within a traditional forms application by using a database system. Flexible forms packages allow the user to select a database system from many vendors. The form filling component provides the user with the means for creating content and presenting that content in a paradigm familiar to the user of paper forms. Figure 5 depicts two communities of users and the form filler programs supporting two traditional forms environments.

Figure 5 - A Traditional Approach to Forms

Each community of users must use a single vendor's database, shown as DB Type 1 and DB Type 2, as the central store of content information because of the proprietary index and storage formats used with these databases. Any interchange of information between these communities of users involves the interchange of database records between potentially different vendors' database packages. Such interchange is syntactic in nature (i.e. based on record file formats) rather than semantic (i.e. one in which the meaning of the information represented in each field is explicitly described). Therefore, the internal database record format of one community must be identical to that used by another community to achieve interchange without translation. When translation is required, it cannot be automated, because the semantics of the information are not inherent in the data; it must be addressed by direct intervention.

To accommodate user-specific access to form contents, forms systems can separate layout from content. The form layouts can be customized to meet the needs of different users in creating, editing or viewing forms contents. These layout specifications are vendor specific in much the same way that proprietary markup is used in traditional document systems. Each vendor utilizes a format best suited to its view of processing and its software packages. Consequently, there is no guarantee that the layout of forms can be successfully interchanged between packages created by different vendors. Typically, one community of users using one vendor's software cannot share form layout information with a community of users using a different forms package or hardware platform.
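The difference between syntactic and semantic interchange can be sketched in Python. The two fixed-width record layouts below are hypothetical stand-ins for DB Type 1 and DB Type 2: a record is interpreted purely by character position, so a community that assumes a different layout reads the data wrongly without any error. The same data with descriptive tags carries its own field names and can be read by name.

```python
import re

# Two hypothetical record layouts for the same form data:
# each is a list of (field name, start, end) character positions.
LAYOUT_1 = [("surname", 0, 10), ("given", 10, 20), ("dept", 20, 24)]
LAYOUT_2 = [("given", 0, 10), ("surname", 10, 20), ("dept", 20, 24)]


def read_record(record: str, layout):
    """Interpret a fixed-width record purely by position (syntactic interchange)."""
    return {name: record[start:end].strip() for name, start, end in layout}


# A record written by community 1 using LAYOUT_1.
record = "Smith".ljust(10) + "John".ljust(10) + "TBS".ljust(4)
print(read_record(record, LAYOUT_1))  # read correctly
print(read_record(record, LAYOUT_2))  # names silently swapped, no error raised

# The same data with descriptive markup carries its own field names.
tagged = "<surname>Smith</surname><given>John</given><dept>TBS</dept>"


def read_tagged(text: str):
    """Interpret tagged data by element name, independent of field order."""
    return dict(re.findall(r"<(\w+)>([^<]*)</\1>", text))


print(read_tagged(tagged))
```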

The benefits of traditional forms systems centre around the ability for a community of forms users to successfully accomplish the task at hand. The homogeneity of the environment within the community ensures successful creation, use and interchange of forms data within that community.

Drawbacks are coming to light as users recognize the need to interchange the contents of the forms and to share the layout of forms between communities of users. Differing database implementations of form content, differing community formats of database records, and differing vendor implementations of form layout present obstacles to successful interchange of this information. It is becoming increasingly evident that vendor and system dependencies can require total redesign of content and form layout information.

3.2 Structured Approaches

3.2.1 Document Systems

Structured document systems divorce document content from the specifications of possible processes that could be applied to that content, as shown in Figure 6. By separating the processes, different vendors or even different users can independently specify processes that could act on data created by others.

Figure 6 - The Structured Approach to Documents

A structured document system uses the descriptive markup to determine which data should be presented to a process. Descriptively marked up data includes a specification for the structure of the document (i.e.: the model) as well as the data within that document (i.e.: the content) that conforms to the structure specification.

The model for a document is expressed in a markup grammar. When that expression mechanism is standardized, vendor applications that endorse the standard are obliged to conform to and understand this markup in a consistent and predictable fashion. Standards compliant software may run on different platforms and free data from platform dependencies (e.g.: character sets). By adopting a common denominator for data representation, a standard syntax allows platform dependencies to be accommodated (e.g. accented characters are encoded independent of the character set implementation used on specific platforms).

Users of standardized markup are free to choose software products from different vendors for creation and subsequent processing of data. Communities of users that choose to share specific document models can realize added benefits by being able to interchange documents directly between their respective systems. Legacy documents will remain usable as the standard generalized markup will ensure that new and enhanced systems will process the old conformant data.

As users' requirements evolve, it may become necessary to change vendors or hardware platforms. User information investments will be safeguarded by the standard syntaxes and they will be able to take advantage of enhanced suites of features and user interfaces built upon commonly used technology.

The specifications of processes that act on structured data may or may not themselves be standardized. Users can accelerate this process and realize added benefits by encouraging vendors to adopt standardized processing specifications such as the recently finalized layout specification - the ISO standard for Document Style, Semantics Specification Language (DSSSL).

The benefits of structured document systems are closely linked to the flexibility provided by the standardized expression of document structure and document content. This flexibility protects users' investment in their information resources while giving them a choice of vendors' products and hardware platforms.

The drawbacks that exist today for these systems center on the infancy of the technology. Higher "front-end" costs are usually borne by early adopters of international standards. Since the standard syntaxes and associated processes are necessarily complex, it may take some time before products with user interfaces designed to hide this complexity become available. In the meantime, user training and consulting services may be required to achieve a successful implementation. While an innovative vendor base for these technologies is growing, mainstream vendors of proprietary document management systems have been slow to adopt structured document standards. As a result, implementation costs are higher than those associated with popular word processing or desktop publishing systems. These disadvantages will diminish as the markets grow and the technologies mature.

3.2.2 Forms Systems

Figure 7 depicts a possible approach to the implementation of a structured forms system. A form is a particular kind of document used within an organization and can be viewed as sharing many of the attributes of other company documents. Thus the benefits found in structured document systems that are attributable to the separation of structure, content, and the processes that act on that content, can also be realized in structured forms systems.

Figure 7 - A Structured Approach to Forms

In order to realize these benefits, structured forms systems must use a standard syntax to create, display and interchange the forms information as indicated in the following paragraphs.

The structure of forms information will have to be expressed in the same way as the content structure for documents. These specifications will enable users to capture the content through a process analogous to the one currently followed in filling out proprietary electronic forms. However, they will be able to do so using software from different vendors running on various platforms.

A critical success factor for the widespread use of electronic forms is the appearance or layout of information within a form. Successful implementations of electronic forms must be able to replace paper forms without unduly compromising established forms processing and creation practices. The ability of users or entire communities to customize the layout design is a key capability for an electronic system. The need to customize the layout, in turn, creates a need to interchange layout designs or specifications within and across user communities.

The many processes to be executed on forms content can also be specified to be independent of vendors or systems. Form editing, printing, and database loading are examples of different processes that act on forms content, which could potentially be supplied by different vendors and which could run on different hardware platforms.

To ensure interchangeability of layout design and process specification, common mechanisms will need to be employed. Figure 7 depicts this concept through a common layout model shared by the layout descriptions of two different communities. Since standards for layout or process specifications are lacking for forms users and systems, there is an opportunity to contribute to such a standardization effort.
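The following Python sketch shows the idea in Figure 7. It assumes that a layout description can be reduced to an ordered list of field names with labels. Two communities keep different layouts, here an English one and a French one, over the same form content, and both layouts refer to the same field names. The field names, labels and values are hypothetical.

```python
# Hypothetical form content, held once and independently of any layout.
CONTENT = {"name": "Al Applicant", "course": "SGML Basics", "cost": "450.00"}

# Each layout is an ordered list of (field, label) pairs. Both layouts use the
# same field names, which plays the role of the shared "common layout model".
LAYOUT_EN = [("name", "Name"), ("course", "Course"), ("cost", "Cost")]
LAYOUT_FR = [("course", "Cours"), ("name", "Nom"), ("cost", "Coût")]


def render(content: dict[str, str], layout: list[tuple[str, str]]) -> str:
    """Render the content as labelled lines in the order the layout gives."""
    return "\n".join(f"{label}: {content[field]}" for field, label in layout)
```

Either community can change its own layout without touching the content, and a layout can be sent to another community as data.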

The benefits of a structured forms environment are analogous to all the benefits of structured documents and should exceed the paybacks of traditional forms systems, owing to the flexibility provided by the standards approach and the reusability of the forms information. Central databases of structured information can be created, forms content can be easily interchanged within a community and between communities, printing can be made flexible, and layout specifications can be interchanged both within and between communities. Other benefits of a descriptive markup approach to electronic forms will likely become apparent as implementations become more advanced.

The drawbacks of this approach also parallel those of structured document implementations, since they stem from the same infancy of the technology. Users and vendors may need to collaborate to ensure timely and effective implementation of standards-based technology in electronic forms applications.

4.0 Methodology

Application design methodologies (e.g.: Systems Development Life Cycle or SDLC) have been used to ensure that work is focused and that essential steps are executed in realizing the application objectives. Historically, methodologies of this type have been used for many systems, including forms applications.

It has been recognized that traditional systems address as little as 10% of an organization's total information holdings - that 10% being the very structured or tabular (i.e.: database) information. Much of the remainder is buried in documents, and is difficult to access using traditional systems approaches. Within the past decade, new efforts have been initiated towards making use of information that is buried in documents.

A Document Development Life Cycle exemplifies these new initiatives and provides a useful methodology for analyzing, identifying and organizing the information contained in documents.

4.1 The Document Development Life Cycle (DDLC)

The Document Development Life Cycle (DDLC) addresses documents, and the information contained in them, from conception through to managing and archiving. Different steps within the DDLC differentiate the definition, creation and utilization of documents.

One perspective of the DDLC is as a framework composed of six phases: planning, analysis, design, creation, distribution, and management. Such a framework is depicted in the Computer Aided Document Engineering (CADE) document management framework.

The following describes the nature of the steps of the DDLC in the context of an organization's documents. For each phase the first paragraph explains the essential aspects of the DDLC methodology while the second indicates the type of tools that are available to perform the essential tasks.

Planning

Planning ensures that document projects are built on a firm foundation. It encompasses the scope, objectives, resources, timelines, budgets, responsibilities, and deliverables of the project. As well, Planning includes decision-making about business re-engineering opportunities, business-case development, pilot projects, tool selection, and critical success factors.

The output of Planning is a set of requirements, goals, budgets, and tasks. The tools available for the Planning phase include groupware and office-automation products (like word processors and spreadsheet programs).

Analysis

In Analysis the business requirements discovered by Planning are used to identify required information objects. Objects can be collected and categorized, using a collaborative, consensus-building approach to provide an electronic central repository of objects and decisions that is used as a "data dictionary" for enterprise-wide use and for later DDLC phases.

This approach allows subject-matter experts to participate in the analysis without requiring them to know such underlying technologies as SGML. Analysis can use groupware software tools to facilitate decision making without relying on face-to-face meetings.

Document Design

In Document Design, objects identified in Analysis are arranged into document models - summaries of document structure. Alternative models are considered and the design is finalized.

Tools used include the repository established in the Analysis phase and an interface which allows maintenance of ASCII files with structure described by SGML document type definitions.

Instance Creation

Instance Creation uses models created in Document Design to guide the creation of new document instances, or to convert existing information into document instances.

Structured editors and legacy-document conversion products are commercially available from various vendors and could be used effectively and efficiently within this phase. The outputs of this phase are structured documents, such as SGML document instances.

Instance Distribution

The Instance Distribution phase addresses the storage, retrieval and publishing of structured information. Structured documents can be used to develop a single-source repository from which multiple forms of output may be generated.

This phase uses tools such as printing systems, electronic viewers, and CD-ROM publishers. Central to the Distribution phase are information repositories and text retrieval facilities.

Management

Management completes the knowledge accumulation and dissemination process that is embodied in the DDLC. Examples of management activities are the determination of which types of documents will be created and which purchased, who will be responsible for acquiring and managing the documents, and which documents shall be discarded and which archived.

Tools for Management control information access, workflow, and archiving. Carefully managed repositories ensure that the knowledge an organization gains continuously feeds back into the DDLC. The iterative nature of the DDLC results in an ongoing process of improvement.

4.2 As a Template for the Form Development Life Cycle (FDLC)

A form is a type of document. The differences between forms and other types of document are basically (1) forms generally include a very well defined delineation of information content and (2) forms involve a very specific presentation of information.

Because a form is a type of document, the Document Development Life Cycle is also applicable to the development and use of forms, with minor adjustments to reflect the more specialized forms environment.

Therefore, in the context of forms, the Form Development Life Cycle (FDLC) would comprise the following phases: Form Planning, Form Analysis, Form Design, Form Instance Creation, Form Instance Distribution, and Forms Management.

This brings to the forms world the advantages that the DDLC brings to the document world. For example, the Form Analysis phase results in a shared element repository of information objects related to forms. Thus, in the design of any new forms, these objects could be re-used, with a corresponding reduction in form creation effort and cost.

5.0 The Environment Required to Support SGML

There are a number of components that must be in place in order to support efficient and effective use of SGML-based standards.

5.1 Departmental Infrastructure

The departmental infrastructure will be largely determined by management's appreciation for policies regarding the management of government information holdings and the opportunities that evolving document management technologies and standards provide for enhancing document management services using available resources.

Infrastructure is the underlying foundation which supports the daily activities of an organization or system. It includes:

- the organization and the way it is structured;

- the functions, processes and procedures through which work is performed;

- the people and the way they are assigned to roles;

- the technical environment; and

- organizational standards relating to these items.

It is recognized that the on-going government-wide efforts to restructure government and to re-engineer service delivery, while maintaining a technologically adept civil service, will have a major but as yet unspecified impact on the first three elements of infrastructure. However, these elements are outside the focus of this analysis.

Examples of components of the technical environment include building and workstation layout, communications tools (telephone, FAX, computer network hardware and software), and central computing hardware and software.

Those components that are germane to the use of SGML in electronic forms relate primarily to the selection or development of software tools for the FDLC (all other issues are orthogonal). The platform independence and data transparency of SGML allow forms encoded using SGML to be communicated easily within existing networks and communications infrastructure. System resources such as information repositories or database look-up tables may also be brought to bear on the FDLC, for example by facilitating document creation, editing and content validation.

5.2 Hardware and Software

The environment required to support SGML includes the hardware and software components that interface with the user in performing activities associated with application systems concerned with the management and use of forms throughout the FDLC.

SGML enables information to be represented in a platform and vendor independent fashion, thereby ensuring long term utility and transportability across environments.

It should be noted, however, that the product feature differentiation (e.g.: ease of use, hardware platforms supported, etc.) available from vendors is an important consideration when selecting SGML-enabled software to utilize within a particular environment. Not all products that support SGML are necessarily applicable or desirable in automated forms. SGML is not a panacea for electronic forms, but only a technology that offers certain critical benefits for an electronic forms environment. How vendor products or custom solutions deliver those benefits will impact on the software selection.

5.3 Information Repositories

Information repositories can provide an environment for supporting SGML throughout the FDLC, including activities such as form definition, form filling, and form storage, as well as resolving the issues of version control and security.

These repositories could also hold, maintain and disseminate the following types of information in a shared environment at a departmental or government-wide level:

- document type definition components or fragments;

- layout and presentation descriptions;

- departmental standard information sets (valid field values, look-up tables, etc.) and

- form instances.

An information repository may be maintained centrally or in a distributed fashion.

The Treasury Board Secretariat report Requirements Definition for Structured Document Registry and Repository addresses some of the issues related to the implementation of libraries of structured documentation, especially SGML documents, and the provision of open access to the libraries for the purpose of electronic dissemination.

6.0 Investigation Results

This section summarizes the results of this investigation.

6.1 The Form - Training Application and Authorization

The Government of Canada form for Training Application and Authorization was selected as an example of current form usage and procedures within the Canadian Federal Government. As noted by the Project Advisory Group, this example is not a definitive statement of all forms handling in government departments, but merely a representative example of the kind of information that is maintained and processed by forms applications.

A training application and authorization form must be completed whenever an employee is provided with formal training totaling one day or more. Various individuals, including those who take, administer and provide the training, are required to supply the information identified in the form. These individuals include training service suppliers such as the Public Service Commission (PSC), and the staff in government departments who are responsible for staff training and development within those departments.

Copies of this training form are appended as Attachments A & B to this report.

- Attachment A consists of a reproduction of a blank Training Application and Authorization form plus the corresponding instruction sheets;

- Attachment B provides a sample of a completed form to illustrate the kind of information that would normally be supplied by a user.

The structure, contents and use of the Training Application and Authorization Form were analyzed and the results are summarized in Appendix A.

This form is used to record information concerning the identity of a training applicant, the course for which the application was made, and the authorizations that are required to administer the training process. This information is recorded in blocks on the form in sequence. In SGML terminology this sequence represents the document's structure and the individual items of information within these blocks are considered the document content. The following information blocks are presented on the form:

	(a) Unique identifier and status of the training application 
	    (unnumbered fields and field 1);
	(b) Identification of the applicant, including supervisor's 
	    authorization (fields 2 to 16);
	(c) Description of the training course  (fields 17 to 28);
	(d) Financial information, including the cost(s), and associated 
	    expenditure authorizations for the training course (fields 29 to 32);
	(e) Training control information (fields 33, 34 and fields on 
	    the back of the form) that is provided by 
	    departmental staff responsible for training program administration;
	(f) Applicant's evaluation of the training (unnumbered field on 
	    the back of the form); and
	(g) Departmental use codes to be devised and applied as 
	    necessary within individual departments. 
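Because the blocks follow one another in a fixed order, the numbered fields can be assigned to blocks mechanically. The following Python sketch encodes the field ranges given in items (a) to (e). Unnumbered fields and blocks (f) and (g) are left out because they carry no field numbers. The names `BLOCK_RANGES` and `block_for_field` are hypothetical.

```python
# Numbered fields of the Training Application and Authorization form,
# grouped by the information blocks (a) to (e) listed above.
BLOCK_RANGES: dict[str, range] = {
    "a": range(1, 2),    # field 1 (plus unnumbered fields)
    "b": range(2, 17),   # fields 2 to 16
    "c": range(17, 29),  # fields 17 to 28
    "d": range(29, 33),  # fields 29 to 32
    "e": range(33, 35),  # fields 33 and 34 (plus fields on the back)
}


def block_for_field(number: int) -> str:
    """Return the letter of the block that contains a numbered field."""
    for block, numbers in BLOCK_RANGES.items():
        if number in numbers:
            return block
    raise ValueError(f"field {number} is not a numbered field on the form")
```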

6.2 Form Handling Activities

The "Departmental Use Codes" which appear at the base of the Training Application and Authorization form were devised to accommodate department-specific forms handling needs. Factors which could influence the way a form is handled include the nature of the training, the size of the department and the desired degree of control. These factors were accommodated in the demonstration project by examining the specific activities which must occur, the information which must be provided to complete each activity, and the next step in the forms handling process that had to be executed.

These activities can be summarized as determining who has to provide what information in order to advance the process to the next step. They represent a typical set of forms handling activities that can be generalized to any kind of form. To represent these activities within an SGML-based electronic forms environment, this demonstration was organized according to the Forms Development Life Cycle methodology (as derived from the Document Development Life Cycle described earlier in this report). The use of this methodology is illustrated in the following paragraphs.

6.2.1 Forms Planning

The planning undertaken for implementing the demonstration environment involved an orderly execution of the following tasks:

The end result of this planning process was a defined demonstration environment which could support a realistic forms handling scenario (i.e. one which incorporated a number of desktop platforms and commercial SGML products that could capture the required data and transfer it between the respective systems connected by local and wide area networks). This planning phase is critical to all successful implementations of the full life cycle for electronic documents and forms.

6.2.2 Forms Analysis

The forms analysis phase was devoted to identifying the kinds of information (i.e. information objects) that appeared on the form and documenting them in a central repository titled "TBS02 - Information Objects". This step involved careful examination of the form and its completion instructions. Each information object was given a name and a definition, and any constraints concerning the maximum number of characters or the range of valid values were noted.
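The repository entries described above consist of a name, a definition, and constraints on length or on valid values. The following Python sketch models such an entry as a small record with a validation method. It is an illustrative assumption about how an entry might be structured, not the format of the "TBS02 - Information Objects" repository. The two sample objects and their constraints are invented, and a closed list of values stands in for a range of valid values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InfoObject:
    """A hypothetical repository entry: name, definition and constraints."""
    name: str
    definition: str
    max_chars: Optional[int] = None                # maximum number of characters
    valid_values: Optional[frozenset[str]] = None  # closed list of allowed values

    def check(self, value: str) -> list[str]:
        """Return a list of constraint violations for a candidate value."""
        problems = []
        if self.max_chars is not None and len(value) > self.max_chars:
            problems.append(f"{self.name}: longer than {self.max_chars} characters")
        if self.valid_values is not None and value not in self.valid_values:
            problems.append(f"{self.name}: {value!r} is not a valid value")
        return problems


# A tiny repository keyed by object name, so other forms can reuse entries.
REPOSITORY = {
    obj.name: obj
    for obj in [
        InfoObject("surname", "Applicant's family name", max_chars=30),
        InfoObject("language", "Language of training", valid_values=frozenset({"E", "F"})),
    ]
}
```

Keying the repository by name reflects the report's point that shared, standardized names let the same object definition serve several forms.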

Based on the inherent capabilities of the analysis tools selected for this activity, the resulting information objects could be easily accessed by subject experts and revised as necessary. The analysis methodology, enforced by the selected tools, should normally result in a set of information objects that are simpler to understand and manage than their counterparts in most existing paper and automated forms environments.

The end product of the analysis process was a central repository, implemented as a database of object declarations and model descriptions. This database could be used to support the development of other forms and to promote the use of standardized names, values and definitions for information objects that are common to various forms.

6.2.3 Forms Design

The design phase focused on producing an SGML-compliant specification of the corresponding paper form. Using a graphically-oriented design tool, information objects, identified in the analysis phase, were rearranged into logical groups (see Attachment C) while safeguarding the basic structure of the original paper form identified in section 6.1 above. Similarly, the content in the redesigned training form corresponds closely to the content of the paper form but is controlled by the definitions and values established in the analysis phase.

Ultimately, the seven blocks on the original form were reduced to the following five in the electronic version:

The end product of the design process was an SGML Document Type Definition (DTD) shown in Attachment D. It was generated automatically, utilizing the information objects from the central repository, and demonstrates that forms users are no longer required to understand the underlying SGML syntax to participate in electronic forms design and standardization. However, some familiarity with data modeling is certainly an asset.
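This report does not describe how the design tool generates the DTD. The following Python function is a hedged sketch of the general idea. It turns a simple repository model, in which each element maps to the ordered list of elements it contains, into SGML element declarations. Elements with no children are declared as character data. The model shown is hypothetical and much simpler than the DTD in Attachment D.

```python
def element_declarations(model: dict[str, list[str]]) -> str:
    """Generate one SGML <!ELEMENT ...> declaration per model entry."""
    lines = []
    for name, children in model.items():
        # A comma-separated group means the children must appear in sequence.
        content = ("(" + ", ".join(children) + ")") if children else "(#PCDATA)"
        # "- -" means that neither the start tag nor the end tag may be omitted.
        lines.append(f"<!ELEMENT {name} - - {content}>")
    return "\n".join(lines)


# Hypothetical model; leaf elements are listed with no children.
MODEL = {
    "training": ["applicant", "course"],
    "applicant": ["name"],
    "course": ["title"],
    "name": [],
    "title": [],
}
```

Calling `element_declarations(MODEL)` returns one declaration per line, starting with `<!ELEMENT training - - (applicant, course)>`.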

6.2.4 Forms Instance Creation

To support forms information capture (i.e. instance creation), the DTD prepared in the design phase was installed on all the hardware platforms for the demonstration project and prepared for use by the commercial SGML software packages. For each vendor's product, an editing environment was created within which the operator could create or manipulate form contents. Since the available packages are not designed to support forms filling, their respective user interfaces present different approaches to user data entry and content tagging. These different approaches are reflected in the screen and report layout capabilities supported by the respective SGML packages.

Once each editing environment was created, the respective SGML software packages could be used to create, to display and to modify form contents.

The result of this process was a demonstration site that could be used to capture, display, modify and store forms-based information and to encode this information as fully compliant SGML documents. In normal operation, instance creation is carried out in turn by each of several participants.
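The following Python sketch shows the simplest form of encoding information as SGML documents: it serializes nested field values into tagged text. The sketch makes several simplifying assumptions. It does not consult the DTD, it emits every start and end tag, and it escapes only `&` and `<`, which is enough for the sample values shown. The element names are hypothetical.

```python
from typing import Union


def escape(text: str) -> str:
    """Escape the characters that would otherwise be read as markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def to_instance(name: str, value: Union[dict, str], indent: int = 0) -> str:
    """Serialize nested dicts as elements and strings as character data."""
    pad = "  " * indent
    if isinstance(value, dict):
        inner = "\n".join(to_instance(k, v, indent + 1) for k, v in value.items())
        return f"{pad}<{name}>\n{inner}\n{pad}</{name}>"
    return f"{pad}<{name}>{escape(value)}</{name}>"
```

For example, `to_instance("applicant", {"name": "Al Applicant"})` produces a three-line instance with `<name>Al Applicant</name>` nested inside `<applicant>`.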

6.2.5 Forms Instance Distribution

The demonstration environment was configured to illustrate production of both electronic and paper versions of the form content. As noted earlier, existing SGML software packages are not adapted to generating the stylized display that is common to most forms (i.e. numerous boxes of various size and shape with bilingual text in each box to identify the contents of that box).

To accommodate the distribution requirements, the demonstration environment was configured to store SGML-encoded forms and associated contents as data files and to exchange them among the forms users using electronic mail applications that communicated over local and wide area networks.

The result of this process was a multi-vendor software and hardware facility that could successfully interchange SGML-based forms and also produce physical copies of the forms on paper.

6.2.6 Forms Management

The concluding phase of the forms development life cycle strategy should make it apparent that management has been empowered to control access, use and archiving of forms information by applying this methodology. As such the management phase must rely on the preceding phases to demonstrate the end result.

This methodology was used in this project to illustrate that the strategy was useful in planning and controlling the implementation of the demonstration environment as well as to support creation, distribution and archiving of copies of the forms contents. The result of this process is the successful operation of the entire demonstration environment.

6.3 The Feasibility Demonstration

This section identifies various characteristics associated with the demonstration environment and highlights some of the constraints and benefits which were demonstrated. It is presented as a series of steps in the following paragraphs.

Figure 8 is a graphical summary of the steps that were followed in establishing the demonstration environment and in processing a sample training form through an entire life cycle. These steps are also summarized in Appendix B - Summary of Feasibility Demonstration. This summary names each step, identifies the type of participant that performed the task, lists the tools chosen for each step and gives a reference to attachments displaying the information relating to the step (for example, the information contained on the form following completion of the step is presented, both in the tool's layout format and through a listing of the SGML instance as an ASCII file).

Figure 8 - Overview of Feasibility Demonstration

The demonstration illustrates those benefits brought to an environment through the use of SGML. Off-the-shelf software from different vendors, executing on different platforms, utilizing different communication networks (LANs and WANs) was used in the demonstration.

6.3.1 Planning the Demonstration

In order to demonstrate that SGML-encoded forms offered users the opportunity to work in a hardware and software environment that was vendor independent, the systems and communications facilities had to be provided by various vendors, operate on various hardware platforms and communicate over local and wide area networks. Appendix B identifies the facilities used to fulfill this constraint.

To illustrate how the workflow associated with the training application and authorization form could be managed, the demonstration defined a scenario with associated roles to be performed by the following four types of people or groups:

	-  a Team, consisting of systems, SGML and subject matter 
	   (i.e. training) experts, to develop the forms application;
	-  Al Applicant, an applicant for training;
	-  Sue Supervisor, who is Al's boss; and
	-  Chris Coordinator, who manages the organization's training program.

The team defines the environment in which the workflow takes place. In summary, the workflow consists of the applicant filling out a form and forwarding it to the supervisor, the supervisor authorizing it and forwarding it to a submittal area for training applications, the training coordinator approving it and returning it to the applicant for final evaluation and archiving.
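The workflow just summarized can be expressed as a small state machine. In this Python sketch, the states and action names are hypothetical labels for the stages described above. The "submitted" state stands for the submittal area, where the training coordinator approves the application.

```python
# Each (state, action) pair names who holds the form and what they do;
# the value is the state the form moves to next.
TRANSITIONS = {
    ("applicant_filling", "forward"): "supervisor_review",    # Al fills out, forwards
    ("supervisor_review", "authorize"): "submitted",          # Sue authorizes
    ("submitted", "approve"): "applicant_evaluation",         # Chris approves, returns
    ("applicant_evaluation", "evaluate"): "archived",         # Al evaluates, archives
}


def advance(state: str, action: str) -> str:
    """Return the next state, refusing actions not allowed in this state."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}") from None
```

Keeping the routing rules in a table makes them easy to change for another form, which matches the report's view that these activities generalize to any kind of form.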

The demonstration did not employ any kind of mechanisms to validate digital signatures. Adequate authorization was deemed to be provided simply by typing a name within the appropriate signature field.

To support exchange of the training information between the three role players, four fictitious accounts were established to identify various e-mail addresses. Al Applicant was assigned the user ID "AAPPLIC" and Sue Supervisor the user ID "SSUPER" for use in the Windows environment. Chris Coordinator was given an account on the UNIX system, and the UNIX account "TASubmit" (for Training Application Submission) was created as a pseudo-address. This communications infrastructure required applicants to be informed of the address to which training applications were to be forwarded via e-mail. An automated e-mail handler accepted the mail message and the accompanying, properly completed, training form and deposited the application in a directory common to those responsible for training. Chris Coordinator was given access to this directory.
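The following sketch shows what the automated e-mail handler might do, using Python's standard `email` package. The handler parses the incoming message, selects attachments that look like SGML form instances, and writes them into the shared training directory. The `.sgm` filename convention and the function names are assumptions, because the report does not describe the handler used in the demonstration. The sketch also omits the check that a form is "properly completed", such as validation against the DTD.

```python
import email
import email.policy
from pathlib import Path


def extract_applications(raw_message: bytes) -> dict[str, bytes]:
    """Return {filename: data} for each attachment whose name ends in .sgm."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    found: dict[str, bytes] = {}
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename or not filename.lower().endswith(".sgm"):
            continue  # ignore anything that is not an SGML form instance
        # Keep only the final path component so a sender cannot write elsewhere.
        found[Path(filename).name] = part.get_payload(decode=True)
    return found


def deposit_applications(raw_message: bytes, shared_dir: Path) -> list[Path]:
    """Write each extracted form instance into the shared training directory."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in extract_applications(raw_message).items():
        target = shared_dir / name
        target.write_bytes(data)
        written.append(target)
    return written
```

The extraction step is kept separate from the file writing so that it can be tested, or replaced by a validating version, without touching the shared directory.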

6.3.2 Analyzing the Form's Information Content

This step involved the perusal of the form and its completion instructions, the identification of information objects and their categorization into the types of objects used in a document modeling environment. The tool employed allows authorized subject matter experts to work in a distributed workgroup setting and to contribute to and comment on the set of information objects. Tools such as this could be used to support collaboration among systems, SGML and forms experts on a government-wide basis.

6.3.3 Designing the Form

A model of the Training Application and Authorization form was developed, based on the analysis of the form. The design tool used in the project allows subject specialists, rather than SGML experts, to prepare a graphical specification of the form, from which the equivalent SGML syntax conforming to the international standard is generated automatically.

The model is displayed in Attachments C and D (the graphical display in Attachment C corresponds exactly to the SGML coding in Attachment D), SGML elements and named groups are listed in Attachment E and SGML attributes are listed in Attachment F.

6.3.4 Form Instance Creation

This step supports the creation and revision of form contents. An input layout was devised to guide the applicant in entering all the required information: the applicant's particulars, the course, the estimated costs, the objective of the training, and the signature.

Al Applicant performed this step using Microsoft Word on a Windows platform. The layout of the form is presented in Attachments G (blank) and H (completed) and the resulting SGML instance is listed in Attachment I.

6.3.5 Instance Interchange through a Network

When Al Applicant forwarded the form to his supervisor, Sue Supervisor, only the content of the form, an SGML document instance conforming to the model of form content, was transferred.

This transmission took place over the local area network which was used by both Al Applicant and Sue Supervisor. The form instance, created by Al, was directly accessed by Sue.

6.3.6 Instance Editing in a Windows Environment

When Sue Supervisor accessed the form and authorized the training by providing a sign-off in the objective field, a different input layout was presented to her, to illustrate that a customized view could be provided at each stage of form processing, if necessary.

This is an editing step which was performed using InContext, a different brand of software than that used for creating the initial instance. This illustrates the vendor independence of SGML-based forms applications.

A screen copy of part of the input layout of the form is presented in Attachment J, Attachment K is a report of the information from InContext, and the SGML instance resulting from the editing is listed in Attachment L.

6.3.7 Instance Interchange from Windows to UNIX through Electronic Mail

In this step, the form was forwarded to the training application submission pseudo-account (TASubmit), where it was read by Chris Coordinator. This transmission was performed over the Internet to illustrate the use of electronic mail systems. Equivalent facilities could be provided by other electronic mail systems such as X.400 or custom systems.

6.3.8 Instance Editing in a UNIX Environment

In this step, Chris Coordinator added some remarks and authorized the training request by adding his signature (i.e. input his name in the appropriate space).

This editing step was performed on a UNIX platform (a different hardware platform) using ADEPT Publisher (yet another different brand of software). This desktop facility was chosen to illustrate both the platform independence and the vendor independence provided by the use of SGML-based technology.

Attachment M is a view of the editing environment provided by ADEPT Publisher. Attachment N is a report of the information from ADEPT, and the SGML instance resulting from the editing is listed in Attachment O.

6.3.9 Instance Interchange from UNIX to Windows

The form was returned to Al Applicant and the transmission took place over the Internet once again.

6.3.10 Instance Editing in a Macintosh Environment

In this step, Al Applicant was depicted as working on a home computer, a Macintosh, with dial-up communications facilities that enabled Al to access the training application pseudo-account (TASubmit), retrieve the information and load it into another SGML editing package, Author/Editor.

This is an editing step in which the applicant enters an evaluation of the course. It illustrates that the data, having been modified on other platforms and by different software packages, can still be transferred and modified by yet another package running in a completely different hardware and operating system environment.

6.3.11 Instance Archiving

For purposes of the demonstration, this step was illustrated by printing a copy of the completed form. This copy can then be stored using existing paper storage methods and facilities.

6.4 Findings of the Feasibility Experimentation

The findings of the experimentation are categorized following the Forms Development Life Cycle, with emphasis on those issues that are related to SGML and its applicability to a forms environment.

6.4.1 Forms Planning

All expected aspects of planning were successfully executed in this demonstration.

6.4.2 Forms Analysis

The database of information objects was successfully created. In a forms environment fully compliant with HyTime constructs, the set of information objects would be more complete, and the objects themselves could take advantage of HyTime's more powerful validation facilities to ensure that only valid values appear in specific parts of a form.

6.4.3 Forms Design

It was observed that no changes to the syntax of the DTD were required for it to be installed successfully in each environment.

Each vendor package, however, handled the DTD in a different fashion. One package allowed the DTD to be wholly contained within the prologue of the document instance, while the others required the DTD to be a separately processed, prepared and compiled entity within the system before being able to manipulate instances of the model.

In an open electronic forms environment, where users may not have a form's DTD before they need to create or modify instances of that form, it will be crucial for the form model to travel with the document itself. Two mechanisms allow this: including the DTD declarations within the prologue of the instance, and packaging the DTD as a separate file with the instance. The SGML Document Interchange Format (SDIF), ISO 9069, could potentially be used to support the latter option.
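The first of these mechanisms can be sketched as follows. This minimal Python illustration is not part of the demonstration: the helper name with_internal_subset and the small training-form DTD and instance are hypothetical. It places the DTD declarations in the internal subset of the document type declaration, so that the prologue carries the form model along with the instance.

```python
def with_internal_subset(doctype_name: str, dtd_declarations: str, instance: str) -> str:
    """Return a self-describing SGML document whose prologue carries the
    DTD declarations inline (internal declaration subset)."""
    return (
        f"<!DOCTYPE {doctype_name} [\n"
        f"{dtd_declarations.strip()}\n"
        "]>\n"
        f"{instance.strip()}\n"
    )


# Hypothetical training-form model and instance, for illustration only.
dtd = """
<!ELEMENT trainapp - - (applicant, course)>
<!ELEMENT (applicant | course) - - (#PCDATA)>
"""
body = "<trainapp><applicant>Al Applicant</applicant><course>SGML Basics</course></trainapp>"

print(with_internal_subset("trainapp", dtd, body))
```

A package that accepts the DTD within the prologue, like the first package described above, could open such a file directly; packages that require a separately compiled DTD would still need an import step.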

6.4.4 Forms Instance Creation

The input layouts of the user environments were not portable between packages. Accommodating such portability between environments has been addressed in an earlier section of this report.

It should also be noted that one of the editing environments successfully presented the content fields in a manner approaching the appearance of the actual printed version of the Government of Canada form. Further, it was established that the processes required in a forms environment differ from those of a typical SGML environment. A typical document environment presents the SGML concepts of optionality, repeatability and alternation (choice) within a sequential document-writing paradigm (e.g. "which of the following valid elements do you wish to edit at this point in the document?"), whereas in a forms environment the user controls context more explicitly (e.g. by pointing and clicking on a particular form field). In addition, a forms environment offers the opportunity to present optionality, repeatability and alternation through dialogue constructs familiar to users of graphical user interfaces (e.g. radio buttons and check boxes).
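One way to realize the last point is a rule that chooses a dialogue construct for each token of an element's content model. The Python sketch below assumes a simple, hypothetical mapping: no occurrence indicator gives a required entry field, "?" a check box that includes or omits the item, "*" or "+" a list with Add and Remove buttons, and a choice group radio buttons. The token names are invented, only top-level tokens are handled, and nested groups or combined forms such as "(a|b)?" are outside its scope.

```python
from enum import Enum


class Control(Enum):
    ENTRY_FIELD = "required entry field"
    CHECK_BOX = "check box to include or omit the item"
    ADD_REMOVE_LIST = "list with Add and Remove buttons"
    RADIO_BUTTONS = "radio-button group"


def control_for(token: str) -> Control:
    """Pick a dialogue construct for one top-level content-model token,
    e.g. "remarks?", "cost+" or "(fulltime|parttime)"."""
    if token.startswith("(") and "|" in token:
        return Control.RADIO_BUTTONS        # alternation: exactly one choice
    if token.endswith("?"):
        return Control.CHECK_BOX            # optionality: present or absent
    if token.endswith(("*", "+")):
        return Control.ADD_REMOVE_LIST      # repeatability: several occurrences
    return Control.ENTRY_FIELD              # no indicator: mandatory, once


# Hypothetical tokens from a training-form content model.
for token in ["name", "(fulltime|parttime)", "remarks?", "cost+"]:
    print(token, "->", control_for(token).value)
```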

6.4.5 Forms Instance Distribution

To date, output layouts of the user environments are not portable between packages. One of the packages supports a military standard for presentation specifications (FOSI), while two others use their own proprietary mechanisms. The standard ISO/IEC 10179, Document Style Semantics and Specification Language (DSSSL), is a candidate for future portable presentation specifications.

6.4.6 Forms Management

All expected aspects of workflow management were successfully executed in the experiment.

6.5 Experimental Results Regarding the use of HyTime

Because none of the vendor packages available for the demonstration environment supported HyTime constructs, it was not possible to experiment with them. The following concepts, however, could be addressed in a future examination of electronic forms.

6.5.1 Lexical Analysis

Lexical analysis is the validation of content based upon models of the makeup of that content. For example, a lexical model named "GoClex" (for Government of Canada Lexical Models) can be created using HyTime syntax to define lexical types:

<!NOTATION HyLex PUBLIC "+//ISO/IEC 10744:1993//NOTATION HyTime Lexical Model Notation//EN">

<!ATTLIST GoClex HyTime   NAME #FIXED lexmodel
                 notation NAME #FIXED HyLex
                 tokens   (tokens|notokens) notokens>

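Using the above declarations, lexical types are defined as GoClex elements. The following defines three building blocks, "d" (a digit), "a" (a letter) and "an" (a letter or digit), and from them the two lexical types "pri" (for a 9 digit Personal Record Identifier) and "postcode" (for Canadian-styled postal codes):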

<GoClex ltn=d>([0-9])</GoClex>
<GoClex ltn=a>([a-z]|[A-Z])</GoClex>
<GoClex ltn=an>([0-9]|[a-z]|[A-Z])</GoClex>
<GoClex ltn=pri>(d,d,d,d,d,d,d,d,d)</GoClex>
<GoClex ltn=postcode>(a,d,a,d,a,d)</GoClex>

Using these lexical constructs it is possible to provide the following specification for a postal code element:

<!ELEMENT POSTCODE (#PCDATA) --<title>Postal Code-->
<!ATTLIST POSTCODE --<title>Postal Code Attributes--
          lextype  CDATA #FIXED "#CONTENT postcode"
          lexmodel NAMES #FIXED GoClex>

The use of this form of lexical modeling requires a management infrastructure with repository facilities that make the associated models easy to access and use. The lexical model specification is itself an instance of a simple DTD fragment and, therefore, cannot be specified wholly within the form's DTD. To allow for simple document instantiation (the creation of a document from scratch), the user cannot be expected to include the lexical model within each new instance.

There is no requirement for the lexical model instance to be part of the user's document instance; therefore, the lexical information can be kept elsewhere in the "system".
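Since no HyTime engine was available in the demonstration, the following Python sketch only illustrates what a HyTime-aware validator would do with the GoClex models. It translates the five models above into regular expressions and checks field content against them. The translation rules ("[..]" as a character class, "|" as choice, "," as sequence, a bare name as a reference to an earlier type) are an assumed reading of the example notation, not an implementation of the HyTime lexical model notation, and the function names are hypothetical.

```python
import re

# The GoClex lexical types shown above, as (ltn, model) pairs in definition order.
GOCLEX = [
    ("d", "([0-9])"),
    ("a", "([a-z]|[A-Z])"),
    ("an", "([0-9]|[a-z]|[A-Z])"),
    ("pri", "(d,d,d,d,d,d,d,d,d)"),
    ("postcode", "(a,d,a,d,a,d)"),
]

TYPE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def compile_models(models):
    """Translate each model into a Python regular expression string."""
    compiled = {}
    for name, model in models:
        parts, i = [], 0
        while i < len(model):
            ch = model[i]
            if ch == "[":                     # copy a character class unchanged
                end = model.index("]", i)
                parts.append(model[i:end + 1])
                i = end + 1
            elif ch == ",":                   # sequence: simple concatenation
                i += 1
            elif ch == "(":                   # open a non-capturing group
                parts.append("(?:")
                i += 1
            elif ch in ")|":                  # group close and choice pass through
                parts.append(ch)
                i += 1
            else:                             # reference to an earlier lexical type
                ref = TYPE_NAME.match(model, i)
                if ref is None:
                    raise ValueError(f"unexpected {ch!r} in model {name}")
                parts.append("(?:" + compiled[ref.group()] + ")")
                i = ref.end()
        compiled[name] = "".join(parts)
    return compiled


PATTERNS = {n: re.compile(rx) for n, rx in compile_models(GOCLEX).items()}


def is_valid(lextype, content):
    """True if the whole content string conforms to the named lexical type."""
    return PATTERNS[lextype].fullmatch(content) is not None
```

With these models, is_valid("postcode", "K1A0B1") succeeds while "K1A 0B1" fails: the postcode model as written makes no provision for the space seen in the printed form of a Canadian postal code, so whether the space belongs to the stored content is a decision the model would need to state.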

6.5.2 Component Dependencies

It is possible using HyTime to specify the dependencies that exist between components. These dependencies can reflect semantic information between different components of a form. The many kinds of dependencies can be specified utilizing the standard HyTime mechanisms for reference type name space limiting and location model specification.

An example of two objects requiring the specification of a dependence follows:

Objects GROUPB (Group B) and CMPB1 (Component B1) are mandatory and objects GROUPA and GROUPC are conditional; however, the existence of GROUPC requires the existence of GROUPA (though GROUPA does not require the existence of GROUPC). Objects CMPA1, CMPA3, CMPC1 and CMPC3 are all mandatory and objects CMPA2 and CMPC2 are conditional; however, the existence of CMPC2 requires the existence of CMPA2. These examples show how dependence can be specified for any level of granularity.

Note, in the following SGML syntax, how the attributes of GROUPC and CMPC2 point to GROUPA and CMPA2, respectively.

refrange CDATA #FIXED "linkwith B"
reflevel CDATA #FIXED "linkwith 2"
reftype  CDATA #FIXED "linkwith GROUPA" >

refrange CDATA #FIXED "linkwith B"
reflevel CDATA #FIXED "linkwith 3"
reftype  CDATA #FIXED "linkwith CMPA2" >

HyTime-aware validation processes run on instances of form content can report non-compliance with the specified dependencies. As well, at form creation time, HyTime-aware editing facilities can enforce compliance with dependencies to ensure that forms are complete before being saved.
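The kind of check such a validator would perform can be sketched without HyTime. The Python example below is a hypothetical stand-in, not HyTime processing: it encodes the GROUP/CMP example above as a table of rules and reports every mandatory object that is missing and every object whose required partner is absent. Placing each CMPxn object inside the corresponding GROUPx, and treating a component as mandatory only when its group is present, are assumptions made for the illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Rule(NamedTuple):
    parent: Optional[str]            # enclosing object (None = top of the form)
    mandatory: bool                  # must appear whenever its parent appears
    requires: Optional[str] = None   # another object whose presence it needs


# Hypothetical encoding of the example: GROUPC needs GROUPA, CMPC2 needs CMPA2.
RULES = {
    "GROUPA": Rule(None, False),
    "GROUPB": Rule(None, True),
    "GROUPC": Rule(None, False, requires="GROUPA"),
    "CMPA1": Rule("GROUPA", True),
    "CMPA2": Rule("GROUPA", False),
    "CMPA3": Rule("GROUPA", True),
    "CMPB1": Rule("GROUPB", True),
    "CMPC1": Rule("GROUPC", True),
    "CMPC2": Rule("GROUPC", False, requires="CMPA2"),
    "CMPC3": Rule("GROUPC", True),
}


def check(present: Set[str]) -> List[str]:
    """Describe every rule broken by the set of objects present in a form."""
    problems = []
    for name, rule in RULES.items():
        parent_present = rule.parent is None or rule.parent in present
        if rule.mandatory and parent_present and name not in present:
            problems.append(f"{name} is mandatory but missing")
        if name in present and rule.requires and rule.requires not in present:
            problems.append(f"{name} requires {rule.requires}")
    return problems
```

Run against a form containing GROUPB, CMPB1, GROUPC, CMPC1 and CMPC3 only, check reports that GROUPC requires GROUPA; a form with GROUPA but no GROUPC passes, because the dependency runs in one direction only.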

7. Observations and Recommendations

The demonstration environment for SGML-based electronic forms plus the accompanying consultant report provided a context for the Project Advisory Group to discuss and assess the implications of this evolving technology for forms applications. The following observations and recommendations focus on forms standardization as well as the integration of forms with other applications.

7.1 Forms related standardization

As noted in section 2.3, this investigation began with an assertion that standards could be used in a number of areas to support an electronic forms environment. Based on the discussions of traditional and structured systems, given in section 3, and the investigation results provided in section 6, the Project Advisory Group concluded that opportunities exist for implementing electronic forms using existing standards and an "open systems" approach to realize corresponding benefits in data capture, interchange and archiving. The following paragraphs identify a number of these opportunities and offer recommendations for follow-up activities.

7.1.1 Forms Design

A cursory examination of existing government forms reveals that identical content could be identified in various ways. For example, an individual applying for training is required to provide a "family name" and a "given name", whereas the same individual would be asked to give the "name of traveler" on a travel expense claim. These variant ways of identifying identical information across different forms can make it difficult to integrate, process and re-use the information.
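A shared data dictionary would let such variant labels be resolved to common element names. In the minimal Python sketch below, the element names (PERSON.FAMILY_NAME and so on) and the crosswalk table are hypothetical. The example also shows that the mapping is not always one-to-one: "name of traveler" holds what the training form splits into two fields, so it maps to a separate, composite element.

```python
# Hypothetical crosswalk from each form's field labels to common
# data dictionary element names.
CROSSWALK = {
    "training application": {
        "family name": "PERSON.FAMILY_NAME",
        "given name": "PERSON.GIVEN_NAME",
    },
    "travel expense claim": {
        # One field here holds what the training form splits in two.
        "name of traveler": "PERSON.FULL_NAME",
    },
}


def to_common(form, entries):
    """Re-key one form's entries by common element name.

    Raises KeyError for a label the dictionary does not define, so that
    unmapped fields are noticed rather than silently dropped.
    """
    mapping = CROSSWALK[form]
    return {mapping[label]: value for label, value in entries.items()}
```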

One issue which deserves resolution is whether and to what extent uniform naming and coding of identical information across multiple forms is required or beneficial. A related issue is whether standard and proprietary systems alike could effectively use these common naming and coding conventions to make systems easier to learn and use while facilitating information exchange between independent applications and among separate organizations whenever the information needs to be exchanged.

Although various benefits might be realized through uniform naming and coding conventions, it is essential to recognize that this information is being gathered and specified in accordance with program needs rather than systems efficiencies. (4)

In addition, electronic form designs will have to meet all statutory and policy requirements, including those related to the Privacy Act, the Official Languages Act, the Communications policy, the Federal Identity Program policy, the Government Security policy and the information collection requirements of the MGIH policy.

Version control is another aspect of forms design and management that poses a challenge, especially in decentralized systems and geographically dispersed departments. Explicit identifiers and controls are needed to ensure that everyone is using the proper version of a form at any given time. Older design specifications and revision justifications must be retained to ensure that archival data can be interpreted correctly when needed.

Data dictionaries associated with proprietary electronic forms systems allow forms managers to define and control the names assigned to specific content areas on the form. However, these designs cannot be shared across systems, since the proprietary data dictionaries cannot receive or transmit their contents. The definitions embedded in the forms applications may also duplicate those in data dictionaries supporting other departmental applications.

As noted in the analysis phase, the layout of information on paper forms is influenced partly by the medium and partly by workflow considerations. Both of these aspects change in an electronic environment, and the move to electronic forms could be used as an opportunity to examine the significance of, and need for, certain information.

Recommendation: The potential for using common guidelines and standards for defining and sharing data dictionary content for forms should be explored and aligned with related initiatives on common reference data definitions and data dictionary standardization within the federal government, while respecting existing policies relevant to forms design, such as the one on official languages. This potential should also be explored with forms software vendors in order to arrive at the optimal solution for designing standard electronic forms and for exchanging these design specifications.

(4) It is normally within the context of forms design that privacy principles are considered, such as only asking for personal information that is directly relevant to the program, including the required statement concerning whether completion of the form is voluntary or mandatory, how the information will be used and where the information will be held, etc.

7.1.2 Edit Rule Specification

Edit rules pertaining to a specific form may be provided by a variety of sources. In the case of paper forms these may appear on the verso of the form or in administrative manuals. Within automated applications they may be included in data dictionaries or linked applications. Valid values for a specific item of information may represent standardized codes or they could be unique to a given form.

Available commercial forms applications provide a user-friendly interface to assist the user in recognizing what information is required and in supplying accurate values. In addition to the familiar forms layout, user help routines, pop-up menus of valid data values and post entry edits may be provided to support requisite and accurate data entry. The applicable values and associated explanatory information may be held in the supporting data dictionary. Various kinds of explanatory information could also be supplied through database look-ups.

By comparison, an SGML DTD provides a means of designating forms content as mandatory, repeatable and optional information. The presence of forms contents can be enforced by SGML-aware forms applications. However, it will be impossible to validate specific values for fields on the form until software is commercially available which supports the HyTime standard and which will enforce values defined in the DTD for each form. SGML applications software could be customized to invoke the appropriate validation rules whether these are located in the document type definition, in a database or a document.

Signature validation and verification remains a major challenge within electronic forms systems, and vendors are devising unique solutions to fulfill this need. Various options for data encryption are available to ensure that a signature is valid (e.g. bit checksums on existing fields, challenge/response key methods, etc.). Selection of the best option may also be complicated by the need for multiple sign-offs and by the requirement to delegate authorization authority to subordinates. Depending on requirements set by law or regulation, signatures must remain auditable for many years. The algorithms, procedures and techniques must be robust enough to ensure non-repudiation by the signing parties.
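The "check sums on existing fields" option can be illustrated with Python's standard hmac and hashlib modules. In this sketch the field names and the key are hypothetical, and JSON with sorted keys is assumed as the canonical form of the field values. A keyed checksum of this kind detects any alteration of the sealed fields, but because the verifier must hold the same secret key as the signer, it cannot by itself provide the non-repudiation required above; that calls for public-key techniques such as the crypto-key infrastructure discussed below.

```python
import hashlib
import hmac
import json


def canonical(fields):
    """Serialize field name/value pairs identically regardless of order."""
    return json.dumps(fields, sort_keys=True, ensure_ascii=False).encode("utf-8")


def seal(fields, key):
    """Keyed SHA-256 checksum over the fields present at sign-off."""
    return hmac.new(key, canonical(fields), hashlib.sha256).hexdigest()


def verify(fields, key, tag):
    """True only if none of the sealed fields has changed since sign-off."""
    return hmac.compare_digest(seal(fields, key), tag)


# Hypothetical usage: the supervisor's sign-off seals the fields filled so far.
key = b"hypothetical-shared-key"
signed = {"applicant": "Al Applicant", "supervisor_signoff": "Sue Supervisor"}
tag = seal(signed, key)
```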

The "e-forms" should have data entry points to indicate the designation or classification of the information contained therein. Only certain entries, such as Protected-A, Confidential and Secret, should be permitted. The security measures required depend on the sensitivity of the information processed and on the environment in which the data is stored and/or transmitted. Hence a threat and risk assessment should be conducted for any implementation of e-forms containing data that is sensitive with respect to disclosure, integrity or availability.

A signature encoding and authentication solution that may become common to government departments is being prepared by Government Telecommunications and Informatics Services (GTIS). The prospective solution includes a public crypto-key infrastructure to support assignment and management of "public keys" and "private keys" within the electronic directory system. Such services will become indispensable as government initiatives expand to include electronic filing of confidential documents such as income tax forms. Since the GTIS service is designed to encrypt and decrypt data for interchange purposes, it should also be capable of authenticating signatures on forms.

In addition, various working groups established by the CAR ITS Steering Committee are presently addressing issues associated with electronic authorization and authentication, firewalls and gateways, public key infrastructure, smart cards, privacy and confidentiality, legal issues and accountability, all of which affect electronic forms. A final report by the working groups, slated for June 1995, is expected to define a common security solution for federal government departments, which should apply equally to forms applications.

Recommendation: The need and opportunity to utilize common mechanisms to validate values and signatures in forms applications should be shared with government groups concerned with common reference data definitions, data dictionary standardization and security. In addition, the authorities for the electronic directory pilot should be encouraged to explore the potential support that a public crypto-key infrastructure and associated service could provide to forms applications.

7.1.3 Forms Filling

Commercial forms applications include forms filling modules that are optimized to work with proprietary user interfaces. These user-friendly interfaces provide a familiar form layout. To make these interfaces even easier to use, efforts could be undertaken to rationalize forms structure, legibility and task flows.

Whereas open systems supporting SGML do not impose a specific "look-and-feel" on the user interface, the opportunity exists to develop a standard means of prompting the user for the required forms data. In effect the user might be provided with the electronic equivalent of a blank paper form or preferably with prompts managed by an "expert system" interface. The standard specification would be analogous to a document layout which is currently provided by a Format Output Specification Instance. The major difference is that the layout would consist of empty "boxes" which remain to be filled out by the user and this specification might be based on the newly developed ISO standard for document layout specification.

Regardless of the technological option that is used it will be essential to ensure that the "forms" are clearly presented and that they are easy to understand and use as required by government Communications policy. This policy applies especially to forms where the quality of the communication will invariably impact on the quality of the input and the overall transaction. The factors to be considered include the terminology used, the flow and structure of the information, the presentation of the various components and the clarity of the instructions, or prompts.

Recommendation: A pilot project should be developed to investigate possible options, including the use of the recently standardized Document Style Semantics and Specification Language (DSSSL), for guiding and assisting users in providing the required information. Regardless of the technological option that is used, it will be essential to ensure that the "forms" are clearly presented and that they are easy to understand and use as required by government Communications policy. Electronic forms vendors should be encouraged to participate in this investigation and to validate that such specifications could be used effectively without unduly compromising existing user interfaces.

7.1.4 Forms Display & Printing

Currently, forms layout and display is determined by proprietary system procedures and facilities. Using existing forms products a forms designer is given a certain degree of latitude in selecting line styles, colour and fonts to control the video display. Selected issues and recommended choices for forms layout are being addressed in a draft specification prepared by a working group of the Canadian General Standards Board.(5) An outstanding question is what is the best method to identify the originating department of a "filled-in" form as required by the Federal Identity Program. To resolve this matter and to guarantee accuracy, it may be necessary to encrypt the corporate logo or signature data.

The ability to specify document layout in a standards-compliant way is defined by the ISO standards for the Document Style Semantics and Specification Language and Standard Page Layout. An added requirement may include the ability to specify fonts and graphic elements using corresponding standard specifications. In addition, it seems appropriate that forms displays be customizable to show only those fields that contain data and to omit those that are empty.

Since there are no pressing requirements to interchange common print or screen display specifications for electronic forms, commercial applications are free to provide attractive, proprietary displays in various media. However, the ability to reproduce archived electronic forms on paper and on the screen may become an issue, since the proprietary print and display codes in use today are unlikely to run on future hardware platforms or forms applications. Archived forms data may have to be converted, automatically or manually, to ensure its usability in the future.

Recommendation: The need for consistent display of forms on electronic and hardcopy media needs to be examined further. Opportunities for specifying these types of displays in an applications and hardware independent manner should be explored with the appropriate standards bodies and the private sector.

(5) Canadian General Standards Board. Committee on Standardization of Forms. Electronic Forms Design Standards Working Group. Standards for the Design and Development of Electronic Forms. (Draft 94-12)

7.1.5 Forms Interchange

As in other cases, data captured and encoded by proprietary applications can be interchanged only among organizations that use the same software package or customized data conversion routines. Forms interchange is a minor consideration for small departments that can implement a single forms package, or where the information remains within a single organization or division. However, if a department cannot impose a single system on the entire organization, as is the case in large government departments, or when some of the information may need to be transmitted to outside organizations (e.g. private or public sector training providers, as in the training application and authorization scenario), then the ability to interchange forms data is a substantive requirement.

As verified by the demonstration project, SGML provides a vendor and application independent means of specifying and encoding the contents of electronic forms. The underlying open systems strategy facilitates the interchange of forms information among all systems capable of processing the SGML syntax and offers commercial forms vendors a specific interchange format.

To implement SGML-based forms data interchange, standard document type definitions (DTD's) would have to be developed for each form that is in common use. Conventions would also need to be specified to ensure that the corresponding DTD was available to all the interchange participants.

Recommendation: Document type definitions (DTD's) should be developed for a representative sample of government forms to verify that departments could effectively meet their forms interchange requirements and that commercial e-forms systems could efficiently generate the SGML-encoded data.

7.1.6 Form Information Storage

Like other international standards, SGML places no constraints on how the standards are implemented. Implementors could use SGML for internal data storage or exclusively for data interchange purposes. Although data might be compressed or customized to optimize form information storage or improve system response, it may also need to be maintained in an application independent manner for archival reasons or for cross applications processing (e.g. decision support systems based on a data warehouse paradigm).

The data storage provisions must also meet government guidelines on information access and privacy. Thus the information in an entire form or selected portions may have to be locked using security codes, encryption and other access controls. SGML coding could also be supplied to indicate the conditions under which data was collected and under which it may be accessed.

Other requirements related to archival retention and disposition (e.g. selective purging of entire forms or portions thereof over time) add complexity. To meet retention needs, electronic forms-based information can be output to microfilm, but this option complicates selective purging and the legal requirement to delete personal information over time. The other alternative may be to store this data on optical disk and to remaster the disk contents periodically while deleting the outdated information.

Recommendation: A major requirement for forms data is that it be maintained in a medium independent information storage format. This format must be maintained throughout the entire forms life cycle in accordance with government access, privacy and archival legislation.

7.2 Integration with Other Applications

It is apparent that some of the data captured on the training form could be supplied and/or validated using information contained in existing databases or documents. The extent to which this will be possible depends on the degree to which an organization integrates its forms, documents and administrative applications. Whereas a single database could store information from all three types of applications, the ability to re-use the data will be directly linked to the care that is exercised in structuring and coding the information in an application independent manner.

For example, the information that identifies a training applicant could reside in a personnel management system or in an electronic directory. Furthermore, the regulations defining employee entitlements, obligations and the conditions under which training may be provided are typically specified in administrative manuals issued by the Treasury Board Secretariat and qualified, if necessary, by department training programs. These instructions could be made accessible to the applicant, the supervisor and the program coordinator in a seamless manner using systems that integrate forms and documents. This information could be entered once and reused as necessary to support forms data entry, validation and information interchange.

Recommendation: Since the Treasury Board Manual includes various details concerning employee benefits and entitlements plus the forms to support the administration of various programs, there is an opportunity to coordinate the definition of forms contents with corresponding instructions and constraints given in administrative manuals. The benefits and challenges of doing so should be investigated closely as the Treasury Board Secretariat proceeds with its effort to convert its manuals to SGML.

7.3 Alternate Non-proprietary Forms Standards (HTML & ODA)

The increasing popularity and rapid evolution of Internet applications offer an "e-forms" option within the Hypertext Markup Language (HTML) - an application that uses a subset of SGML. In addition the on-going support of the Open Document Architecture (ODA) standard by the European Economic Community has also created the opportunity to use generic ODA applications for forms creation, data management and interchange.

Originally, the HTML-based forms were devised to prompt users for terms to be submitted to an information retrieval application. More recently this capability has been extended to support ordering of documents and services. Nevertheless, this functionality falls short of the e-form functionality investigated in the demonstration.

Although ODA might offer a more efficient coding mechanism than SGML, while also providing a means of specifying form layout, commercial ODA-compliant applications are not readily available at this time. This situation may change, and the possible use of ODA may have to be examined further at a later date.

Recommendation: Both Internet forms and Open Document Architecture appear inappropriate and inadequate to support the evolving electronic forms environment. Therefore, neither option is recommended for serious consideration for government-wide implementation at this time.

7.4 Workflow Management

Workflow management is a generic function which is used to support various types of forms and document management applications. This support function must be flexible and capable of meeting the control requirements of individual departments as illustrated by the reserved boxes at the bottom of the paper training application and authorization form. Although various commercial packages are available to provide workflow services all are vendor specific. Standards are not being developed in this area and there is a wide range of approaches to choose from.

Recommendation: The best way to introduce a degree of commonality in managing forms workflow is to ensure that the electronic forms replicate the required control points. For example, the training application and authorization form requires specific information from various individuals plus their signatures. The requisite information plus the associated signatures enable the workflow to be controlled and managed.

7.5 Summary

Electronic forms applications implemented by government departments should be aligned with applicable open systems standards to ensure that the information which is created, managed, accessed and archived will comply with government information technology and information management policies. To validate that these standards and policies can be implemented for electronic forms, the Treasury Board Secretariat should coordinate a pilot forms implementation that would be undertaken by a limited number of departments and commercial forms software vendors.

Appendix A - Analysis of Training Application and Authorization Form

A.1 Introduction to the Table

This table addresses all the fields on the Training Application and Authorization form. It is important to note that the information presented in this table is not to be taken as a Government of Canada specification concerning the proper use of this form. The table columns are used as follows:

- Field and Content - The purpose of the values in these columns is to unambiguously identify the field and/or location which is being addressed;

- Lexicon - This describes the structure and length (if it has been defined) of the information to be entered into the field;

- Values - This is a list or reference to a list of valid values (if available) for the field content;

- Opt - Optionality - This shows whether entry of information in the field is mandatory (M), optional (O) and/or repeatable (R); and

- Comments - This contains other information about the field, including reference to special validation requirements.

Note that the information presented in the lexicon and comments columns of the table reflects assumptions made during the analysis of the form.

A.2 Validation Requirements

This table will be of help in determining validation requirements for the form. There are several types of validations that may be performed on information in the Training Application and Authorization form. Some of these are listed.

- Lexical Analysis - The structure of the information to be entered into a specific field is described. The following symbols will be used to represent valid character patterns:

X - any character data (a 12-character string is represented as "X(12)")

9 - numerical data (a 5-digit number is represented as "9(5)" or "99999")

A - alphabetic data (a 2-character alphabetic string is represented as "A(2)" or "AA")

An example of lexical analysis is the validation of a Canadian postal code - valid codes must appear as "letter number letter space number letter number", or "A9A 9A9".

- Multiple Fixed Values - There is a list of allowable values for the contents of a field that are known at design time. An example is the pair of values Male and Female used for a person's sex.

- Table Look-ups - There is a list of allowable values for the contents of a field that are known at the time the information is entered or accessed. Examples are the Special Needs coded values and the Province coded values.

- Reasonability Checks - This is especially appropriate in validating dates and times. An example involves the "from" and "to" dates of a training course. In this case, the check would be that the "from" date is the same as or earlier than the "to" date.

- Formulae - This type of validation deals with relationships between values in different fields on the form. For example, a total cost would be validated as the true sum of the itemized costs that make it up.

- Validation of Authority - This applies to the sign-offs required in processing the form. The person signing off must hold a position with sufficient authority to do so. This could be established by a cross-reference (e.g., a list of names or positions specifying the authority of each) or by a set of rules. A set of rules is the more likely case, with rules defined for each possible sign-off on the form; an example of such a rule is "anyone whose position is this classification or higher and belongs to the same organization may sign this". This example would involve links to information related to the classification and position areas.
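The lexical notation described above maps naturally onto regular expressions. The following Python sketch translates a picture string into a regular expression and uses it to validate an entry. The function names are hypothetical; the sketch assumes that any character other than X, 9 and A (such as the space in "A9A 9A9" or the hyphens in a telephone number) stands for itself, and it does not handle lexicon entries such as "text", "date" or "money", which would need separate checks.

```python
import re

# Symbols of the lexical notation: X = any character, 9 = digit, A = letter.
SYMBOLS = {"X": ".", "9": "[0-9]", "A": "[A-Za-z]"}
REPEAT = re.compile(r"\((\d+)\)")   # an "(n)" suffix, as in "X(12)"


def picture_to_regex(picture: str) -> re.Pattern:
    """Translate a picture string such as "A9(7)" into a compiled pattern."""
    parts = []
    pos = 0
    while pos < len(picture):
        char = picture[pos]
        token = SYMBOLS.get(char, re.escape(char))   # other characters are literal
        pos += 1
        count = REPEAT.match(picture, pos)           # optional repeat count
        if count:
            token += "{" + count.group(1) + "}"
            pos = count.end()
        parts.append(token)
    return re.compile("".join(parts))


def lexically_valid(picture: str, value: str) -> bool:
    """True if the whole value matches the picture string."""
    return picture_to_regex(picture).fullmatch(value) is not None


print(lexically_valid("A9A 9A9", "K1A 0B1"))   # True  (postal code)
print(lexically_valid("A9A 9A9", "K1A0B1"))    # False (space missing)
print(lexically_valid("A9(7)", "A1906212"))    # True  (file number, field 1)
```

Because "9(5)" and "99999" produce equivalent patterns, both forms of the notation shown above can be used interchangeably in the Lexicon column.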

Field Content             Lexicon    Values        Opt  Comments               

n/a   (distribution                  PUBLIC        M                           
      instructions)                  SERVICE                                   
      - pre-printed on               COMMISSION,                               
      the bottom of each             DEPARTMENT                                
      part of five-part                                                        

n/a   (FOR PSC USE ONLY)  X(12),                   O    - administrative use   
      - at top of form    X(3),                                                

n/a   (Application                   Original,     M    - if original, must    
      Status)                        Amendment,         not refer to existing  
      - check boxes near             Cancellation       file number            
      top of form                                       - if amendment or      
                                                        cancellation, must     
                                                        refer to an active     
                                                        file number            
                                                        - if there are         
                                                        changes, it must be    
                                                        an amendment           

1     File number         A9(7)                    M    - pre-printed,         
                                                        example A1906212       

2     Special needs       99.00      01, 02, 03,   O    01-Blind, 02-Deaf,     
                                     99                 03-Physically          
                                                        Disabled, 99-Other     

3     Family name,        text                     O                           
      Given name and                                                           

4     P.R.I.              X(10)                    O                           

5     Sex                            Male, Female  O                           

6     Classification      X(2),      samples CS,   M                           
                                     SM, AS, EL

7     First official                 English,      O                           
      language                       French                                    

8     Position Title      text                     O                           

9     Employee's office   999-999-9999             O
      telephone number

10    Department name     text                     O                           

11    Department Code     XXX                      M                           

12    Branch/Division     text                     O                           

13    Office,             text                     O                           
      mailing address                                                          

13    City                text                     O                           

13    Postal code         A9A 9A9                  O                           

14    Supervisor's name   text                     O    - supervisor must be   
      and title                                         employee of same       
                                                        organization as        


14    Supervisor's        999-999-9999             O
      Telephone number

15    Supervisor's        text                     O                           
      mailing address                                                          

15    Supervisor's City   text                     O                           

15    Supervisor's        A9A 9A9                  O                           
      Postal code                                                              

16    Objective of        text                     O                           

16    Supervisor's        sign-off                 O                           

16    Signature date      date                                                 

16    Employee's          sign-off                 O                           

16    Signature date      date                                                 

17    Course code         X(7)                     O    - if source of         
                                                        training (field 25)    
                                                        is Training Program    
                                                        Branch of the Public   
                                                        Service Commission,    
                                                        then this code must    
                                                        be from the Schedule   
                                                        of Courses             

18    Course title        text                     O    - if source of         
                                                        training (field 25)    
                                                        is Training Program    
                                                        Branch of the Public   
                                                        Service Commission,    
                                                        then this title must
                                                        be from the Schedule   
                                                        of Courses             

19    Location of         text                     O                           

20    Date of course      date,                    O                           
      (from, to)          date                                                 

21    Departmental        999.00     as listed on  M    - allowable courses    
      training program               instruction        depend on HR class of  
      code                           sheet              applicant              

22    Time of training               Yes, No       M                           

23    Duration of         999.00     greater than  M    - must be consistent   
      training                       0                  with from/to dates of  
      (person-days)                                     course                 

24    Language of course             English,      O                           

25    Source of training             TPB/PSC,      M                           

26    Transit time        999.00     greater than  O                           
      (person-days)                  0                                         

27    Province (code)     XX         as listed on  O                           

28    Location            text                     O                           

29    Tuition fee /                  None,         O                           
      Reimbursement                  Half,                                     

29    Tuition Fee                    valid         OR
      - Financial code               responsibility
                                     centre

29    Tuition fee         money      greater than  O                           
      - Estimated cost               0                                         

29    Tuition Fee         money      greater than  M                           
      - Actual cost                  0                                         

29    Travel / Living                valid         OR
      - Financial code               responsibility
                                     centre

29    Travel / Living     money      greater than  O                           
      - Estimated cost               0                                         

29    Travel / Living     money      greater than  M                           
      - Actual cost                  0                                         

29    Other                          valid         OR
      - Financial code               responsibility
                                     centre

29    Other               money      greater than  O                           
      - Estimated cost               0                                         

29    Other               money      greater than  M                           
      - Actual cost                  0                                         

29    Total               money      greater than  O    - derivable from       
      - Estimated cost               0                  itemized costs         

29    Total               money      greater than  M    - derivable from
      - Actual cost                  0                  itemized costs         

30    Responsibility                 valid         O    - RC either here or
      centre (collator)              responsibility     in particular cost
      code                           centre             type

31    Financial           sign-off                 O                           

31    Financial           date                     O                           
      authority Date                                                           

32    Manager's approval  sign-off                 O                           

32    Manager's approval  date                     O                           

33    DEPARTMENTAL        text                     O                           

33    DEPARTMENTAL        sign-off                 O                           

33    DEPARTMENTAL        date                     O                           
      Signature  Date                                                          

34    DEPARTMENTAL USE    X(10) - 6                OR                          
      CODES               fields                                               
                          X(1) - 9                                             

n/a   FOR USE BY                     Application   OR                          
      DEPARTMENTAL                   received,                                 
      TRAINING DIVISION              Candidate                                 
      Action                         enrolled,                                 

n/a   FOR USE BY          text                     OR                          
      TRAINING DIVISION                                                        

n/a   FOR USE BY          date                     OR                          
      TRAINING DIVISION                                                        

n/a   FOR USE BY          sign-off                 OR                          
      TRAINING DIVISION                                                        

n/a   Evaluation of       text                     O                           

n/a   Evaluation of       sign-off                 O                           

n/a   Evaluation of       date                     O                           
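Several comments in the table describe reasonability checks and formulae that relate one field to another. The following Python sketch expresses three of them: the Application Status rules, the course dates in field 20 together with the duration in field 23, and the derivation of the totals in field 29 from the itemized costs. The function names and parameters are hypothetical. In particular, the sketch assumes that "consistent with from/to dates of course" means the duration may not exceed the number of calendar days the course spans, which is only one possible reading of that comment.

```python
from datetime import date
from decimal import Decimal


def check_status(status: str, file_number: str, existing: set, active: set,
                 changed: bool) -> list:
    """Application Status rules from the comments at the top of the table."""
    errors = []
    if status == "Original" and file_number in existing:
        errors.append("an original application must not refer to an existing file number")
    if status in ("Amendment", "Cancellation") and file_number not in active:
        errors.append(f"{status} must refer to an active file number")
    if changed and status != "Amendment":
        errors.append("an application with changes must be an amendment")
    return errors


def check_course_dates(start: date, end: date, person_days: Decimal) -> list:
    """Fields 20 and 23: 'from' no later than 'to'; duration greater than 0."""
    if start > end:
        return ["field 20: the 'from' date is later than the 'to' date"]
    span = (end - start).days + 1   # calendar days, inclusive (assumed reading)
    if not 0 < person_days <= span:
        return [f"field 23: duration {person_days} is inconsistent with a {span}-day course"]
    return []


def check_total(items: list, total: Decimal, label: str) -> list:
    """Field 29: a total must equal the sum of its itemized costs."""
    expected = sum(items, Decimal("0"))   # Decimal avoids binary rounding of money
    if total != expected:
        return [f"field 29: {label} total {total} differs from itemized sum {expected}"]
    return []
```

Each function returns a list of messages rather than stopping at the first error, so that all problems on a completed form could be reported to the person filling it in at once.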

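The sign-off rows in the table (for example fields 16, 31, 32 and 33) are where the validation of authority described in A.2 would apply. The following Python sketch encodes the example rule "anyone whose position is this classification or higher and belongs to the same organization may sign this" as data. The Position and SigningRule types, the rule table and the classification levels shown are hypothetical, and the sketch assumes that "higher" means a higher level within the same classification group, since the analysis does not define an ordering across groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    organization: str   # e.g. the department code of field 11 (hypothetical value)
    group: str          # classification group, e.g. "AS"
    level: int          # level within the group (hypothetical encoding)


@dataclass(frozen=True)
class SigningRule:
    """'Anyone whose position is this classification or higher and belongs
    to the same organization may sign this.'"""
    group: str
    min_level: int

    def permits(self, signer: Position, applicant: Position) -> bool:
        return (signer.organization == applicant.organization
                and signer.group == self.group
                and signer.level >= self.min_level)


# Hypothetical rule set: each sign-off on the form has one or more rules,
# and a signer is accepted if any one of them is satisfied.
RULES = {
    "32 Manager's approval": [SigningRule("AS", 5), SigningRule("EX", 1)],
}


def may_sign(signoff: str, signer: Position, applicant: Position) -> bool:
    return any(rule.permits(signer, applicant) for rule in RULES.get(signoff, []))


applicant = Position("ABC", "CS", 2)
print(may_sign("32 Manager's approval", Position("ABC", "AS", 6), applicant))  # True
print(may_sign("32 Manager's approval", Position("XYZ", "AS", 6), applicant))  # False
```

Keeping the rules as data rather than code would allow them to be revised when positions or delegations change, without altering the form itself.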
Appendix B - Summary of Feasibility Demonstration

Step Description       By           Tool Type           Tool              Platform   Attachment

1    Plan              Project

2    Analyze           Project      Groupware           CADE® Groupware   Windows    C

3    Design Content    Project Team DTD Editor          NEAR & FAR®       Windows    C, D, E, F

4    Author Instance   Applicant    Instance Editor     Microsoft Word    Windows    G, H, I

5    Send Instance     Applicant    Network                               Novell

6    Edit Instance     Supervisor   Instance            InContext         Windows    J, K, L

7    Send Instance     Supervisor   E-mail                                Internet

8    Edit Instance     Coordinator  Instance Editor     ADEPT Publisher   UNIX       M, N, O

9    Send Instance     Coordinator  E-mail              Lotus Notes       Internet

10   Edit Instance     Applicant    Instance            Author/Editor     Macintosh  P, Q, R

11   Archive Instance  Applicant    Printing Mechanism                    Windows    P