[This local archive copy is from the official and canonical URL, http://internet.adb.gu.se/publications/14/conflict.html; please refer to the canonical source document if possible.]


Conflicts between the possibilities and the reality in the field of structured electronic documents

Experiences from a large-scale SGML-project

Astrid E. Jenssen* & Tone Irene Sandahl**

*Center for Information Technology Services (USIT), Box 1059 Blindern, N - 0316 Oslo, Norway
**Department of Informatics, Box 1080 Blindern, N - 0316 Oslo, Norway
University of Oslo,
Email: Astrid.Jenssen@usit.uio.no
Email: Tone.Sandahl@ifi.uio.no

Abstract

The paper presents experiences from a study of a pilot project that integrated an SGML-based document processing system at the University of Oslo, Norway. The experiences are examined from three perspectives, each addressing a different aspect of the system: the use situation, the organizational benefits and challenges, and the technological requirements. An improvement to the system motivated by experiences within one perspective may conflict with improvements motivated by experiences within the other perspectives. The paper states and discusses some of these conflicts in SGML-based document systems. It concludes with challenges in the development and use of SGML-based document systems, and identifies some issues for further research.

Introduction

Since early in the computer age, there has been a need to classify, compute, combine and recombine, count, sort, and manipulate information. From the late 60s this has been done by cutting the information into little pieces (data) and putting it into databases. Databases are fine for discrete, predictable pieces of information, but they do not work very well for information like stanzas, scientific descriptions, receipts, maintenance procedures etc. (Alschuler, 95). A database system is basically a computerized record keeping system (Date, 86). As stated by Reinhard (94), at least 80% of electronic information in organizations is in the form of documents, as opposed to database records.

Traditionally, documents have been static, represented as files on disks. Until PCs were networked, these files usually belonged to only one user and were passed from one person to another in printed form. It is a challenge to make these files/documents more dynamic, so that they can be accessed, searched, used and reused, retrieved, presented, exchanged and distributed without loss of information (Reinhard, 94).

Standard Generalized Markup Language (SGML) makes it possible to apply structured concepts to text in general. SGML is a descriptive (content) markup language (Goldfarb, 92). The basic idea is very simple: a document is described in terms of its structural components rather than its presentation on a single medium. In addition, the structure and the occurrence of the components in documents are defined in a "Document Type Definition" (DTD), and SGML itself is independent of any one system (ibid.). SGML narrows the gap between documents and databases, because content and structure are predefined and the presentation is kept separate. The nature of the data remains a difference, however. Unlike database records, text is not regular: it can have many levels of nesting or recursion, it can vary in length, and it may require much more sophisticated formatting and rendering capabilities in order to keep all information (Travis and Waldt, 95). Nevertheless, from a technological point of view, SGML is an appropriate solution to the problem of making documents more dynamic and accessible. An enthusiastic SGML implementor put it this way (Alschuler, 95, p. 1):

[.. SGML] puts the computing power of information technology behind the all-encompassing descriptive power of human language.

The aim of this paper is to present some conflicts between technological possibilities and practical use, based on research on a pilot project that integrated an SGML-based document system at the University of Oslo. The research focused on the experiences and impacts of the introduction and first use of the SGML-based document system. Three perspectives were applied in analysing the project: the use perspective, the organizational perspective, and the technological perspective. These focus, respectively, on aspects of using the SGML-based document system; on organizational gains and costs, possibilities and restrictions; and on the technological aspects of the computer system. Our findings point to aspects that organizations involved in developing and integrating SGML-based solutions should consider. Some of them point to dilemmas and problems related to standardized document handling, as well as to issues for further research within the field of electronic documents in general. Before presenting the results of the study, we give a short description of SGML and the pilot project.

Standard Generalized Markup Language

Charles Goldfarb and his team at IBM created a method (the "Generalized Markup Language", GML) in the 1960s to let text editing, formatting, and information subsystems share documents. Over the course of nearly two decades, and through the efforts of many people and groups, GML gave rise to the Standard Generalized Markup Language (SGML). In 1986, SGML was adopted as a standard of the International Organization for Standardization (ISO 8879:86). It is designed to enable text interchange and is intended for use in the publishing field (Smith, 92). Since then, it has been increasingly adopted as the international standard for data and document interchange in open system environments, including the automotive, defense, commercial aerospace, pharmaceutical, electronics, and telecommunications industries.

SGML has three characteristics which distinguish it from other markup languages: its emphasis on descriptive rather than procedural markup; its document type concept; and its independence of representation in any one system.

First, SGML is used to describe the structure of a document (descriptive markup), not its appearance (procedural markup). A descriptive markup system uses markup codes which provide names to categorize parts of a document. Markup codes such as <course> identify a portion of a document and assert of it that "the following item is a course". SGML eases the interchange of text across platforms, because there is no need to "translate" between machine-dependent formats. The same document can readily be processed by many different pieces of software, each of which can apply different processing instructions to those parts of it which are considered relevant, and different sorts of processing instructions can be associated with the same parts of the file. Since only the structure or content of a document is marked, any given viewer of that document can decide what the "look" will be. The structures can be associated with a particular font style or size, and assigned spacing or layout characteristics when viewed with a particular product. The markup of the document never changes, only the way it is interpreted.

Second, SGML introduces the notion of a document type, and hence a document type definition (DTD). An SGML document always has an associated DTD that specifies the rules for the structure of the document. For example, the DTD for a course catalog might specify that the document type (catalog) must contain information about one or more courses ("one or more" is represented by "+"). Furthermore, each course must have a description (descrip), followed by zero or more combinations of day, time and place ("zero or more" is represented by "*"). Information about at least one teacher is required, and so forth. The type of a document is formally defined by its parts (course, day, time, place, ...) and their structure in the DTD. Below is an example of parts of a DTD.

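:
<!-- a catalog contains one or more courses -->
<!ELEMENT - - catalog (course+)>
<!-- a description, then zero or more day/time/place groups, then one or more teachers -->
<!ELEMENT - - course  (descrip, (day, time, place)*, teacher+)>
<!-- "?" marks an optional element -->
<!ELEMENT - - teacher (fname, sname, email?, phone*, fax?)>
: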

Figure 1. Part of a catalog DTD
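The content models in Figure 1 use the same operators as regular expressions ("," for sequence, "?" for optional, "*" for zero or more, "+" for one or more), which is what allows a parser to check a document against its DTD. The Python sketch below illustrates this under stated assumptions: it handles only ","-connected models like those in Figure 1 (no "|" or "&" connectors), it works on lists of child element names rather than on marked-up text, and the names `CONTENT_MODELS`, `model_to_regex` and `valid_children` are hypothetical, not part of any SGML tool.

```python
import re

# Content models from Figure 1, with every token joined by the "," connector.
CONTENT_MODELS = {
    "catalog": "(course+)",
    "course": "(descrip, (day, time, place)*, teacher+)",
    "teacher": "(fname, sname, email?, phone*, fax?)",
}

NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def model_to_regex(model: str):
    """Translate a ","-only content model into a regex over child names.

    Each element name becomes the token <name>; dropping the commas turns
    sequence into concatenation, while (), ?, * and + keep their meaning.
    """
    pattern = NAME.sub(lambda m: f"(?:<{m.group(0)}>)", model.replace(",", ""))
    return re.compile(re.sub(r"\s+", "", pattern))


def valid_children(element: str, children: list[str]) -> bool:
    """Check the sequence of child element names against the element's model."""
    regex = model_to_regex(CONTENT_MODELS[element])
    return regex.fullmatch("".join(f"<{c}>" for c in children)) is not None


print(valid_children("course", ["descrip", "day", "time", "place", "teacher"]))  # True
print(valid_children("course", ["descrip", "day", "time", "teacher"]))  # False
```

The second call fails because a day/time/place group must be complete; this is the kind of structural error an SGML-editor with a validating parser reports.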
A basic design goal of SGML was to ensure that documents encoded according to its provisions should be transportable from one hardware and software environment to another without loss of information. The third feature addresses this goal at the level of the strings of bytes (characters) of which documents are composed. SGML provides a general-purpose mechanism for string substitution, that is, a simple machine-independent way of stating that a particular string of characters in the document should be replaced by some other string when the document is processed. For more information about the SGML standard itself, see e.g. (Goldfarb, 92).
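In SGML, this substitution mechanism is the entity: a name is declared once in the DTD, and each reference to it (written `&name;`) is replaced by the declared text when the document is processed. The Python sketch below assumes only simple internal entities declared as `<!ENTITY name "text">` and references that always end with ";". SGML also allows other reference terminators, external entities and nested references, none of which are handled here. The entity names, replacement texts and function names are hypothetical.

```python
import re

# Hypothetical declarations of two internal general entities.
DECLARATIONS = """
<!ENTITY uio  "University of Oslo">
<!ENTITY usit "Center for Information Technology Services">
"""

ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)">')
ENTITY_REF = re.compile(r"&(\w+);")


def parse_entities(declarations: str) -> dict[str, str]:
    """Map each declared entity name to its replacement text."""
    return dict(ENTITY_DECL.findall(declarations))


def expand(text: str, entities: dict[str, str]) -> str:
    """Replace every &name; reference with its declared text.

    An undeclared name raises KeyError instead of being silently kept.
    Replacement text is not itself re-scanned for further references.
    """
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], text)


entities = parse_entities(DECLARATIONS)
print(expand("Course catalog, &uio;", entities))  # Course catalog, University of Oslo
```

Because only the name appears in the text, the replacement can be changed in one place, or mapped to a different character encoding on another system, without touching the documents.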

The University Catalog as a Pilot Project

At the end of 1992, a project was initiated at the University Center for Information Technology Services (USIT) to determine what type of electronic infrastructure could deal successfully with electronic documents and other forms of information at the university. The infrastructure must address the whole life cycle of a document, i.e. production, updating, filing, administration, distribution and presentation. SGML came up as a possible key tool for describing the documents and their content.

To gain experience with practical use of SGML on some of the information produced at the university, USIT established a pilot project in 1993 to develop an open and flexible solution for the production, exchange and distribution of the university's course catalog. The pilot project was to develop a technical infrastructure and administrative routines for handling the catalog, which contains dynamic information for all students and staff. The catalog is about 450 pages long, 50,000 copies are printed twice a year, and it is available through the World Wide Web (WWW). During the pilot project, about 40 writers were involved in maintaining the information.

The goal of the pilot project

The main goals of the pilot project (as distinct from the goals of this study) were to produce a better (and, in the long run, cheaper) catalog, to make it easier to update and maintain, and to gain practical experience in introducing SGML at the university. Producing a better catalog meant developing an electronically accessible, dynamic catalog and making it more readable through a better structure and layout. When the pilot was initiated, there was little practical experience with SGML at the university.

Why use the catalog as a pilot project

The university catalog is well-known to the students and staff at the university. Not everyone reads all parts of the catalog, but it contains information about all the different units and gives a comprehensive overview of the possibilities offered at the university. The catalog contains information about student services, courses, registration, important dates, administrative and teaching staff, and numerous other topics. The catalog was chosen as a pilot project for several reasons:

The information is delivered from different units (faculties, departments and central administration) at the university.

A number of writers are responsible for updating different parts of the catalog.

Project organisation and the people involved

The pilot project was organised with a board and a project group, both consisting mainly of staff from USIT. The project group consisted of 3-5 IT people, 1-2 of them working full-time and the others part-time on the pilot. The group was responsible for the system development process. The central administration was responsible for editing the catalog and for some of the writing. There were about 40 writers from the different university faculties and departments.

The catalog is divided into separate parts for each faculty, describing the courses offered at that faculty, and further parts for other kinds of courses (e.g. distance education), information about student services, cooperating institutions, etc. The writers at the central administration maintain all parts except the information from the faculties, and have to coordinate with other units at the university to collect the information to be presented in the catalog. The writers at faculty level maintain information common to a faculty, and the writers at the underlying departments maintain information mainly about the courses offered at the department. All are responsible for their own parts of the catalog: collecting the information and distributing updated information to students and staff. For instance, the writers at department level are in contact with the lecturers to find out which courses are to be offered each semester. They have to coordinate the allocation of lecture rooms, taking into account the lecturers' preferences for day, time and place, in cooperation with the writers at faculty level to avoid overbooking. They do the updates and distribute the results to the lecturers for proofreading, and then update the information again as many times as necessary.

The development in short

The system development project included a number of separate tasks. Document analysis was necessary for building an appropriate DTD and suitable applications. The information had to be structured and encoded according to the DTD. Scripts were written to convert the SGML-encoded information to HTML, so that it would be available through the WWW. Scripts were also written to convert it to TeX/LaTeX for typesetting the catalog with an appropriate layout. For more information about TeX/LaTeX, see e.g. (Knuth, 84) (Leslie, 86). A print-on-demand solution was established so that the whole catalog, or any part of it, could be printed on any printer connected to the university network. This process was not linear but evolutionary, and different stages were repeated several times; for each edition of the catalog, the solutions to the various tasks were developed further. Table 1 summarises most of the work done in the different phases of the system development process. The main phases are described further in the text below.

| Work period | Work done | By whom |
|---|---|---|
| Catalog 1, 93: March-June | Document analysis | USIT/writers |
| | Developing the first version of the DTD | USIT/writers |
| | Manually encoding the information for the autumn catalog | USIT |
| | Developing the printed version | USIT |
| | Developing scripts for conversion to HTML | USIT |
| | Setting up the directories, files and access in the Unix file system | USIT |
| | Organising the work flow | USIT |
| | User training | USIT |
| | User support | USIT |
| | Evaluation | USIT/writers |
| Catalog 2, 93: July-December | Developing the second version of the DTD | USIT/writers |
| | Merging already encoded information into the new version of the DTD | USIT |
| | Encoding the information through templates in word processors | Writers |
| | Conversion of new information to SGML | USIT |
| | Improving the printed version | USIT |
| | Printout possibilities for the writers through Unix | USIT |
| | Organising the Unix file system | USIT |
| | Organising the work flow | USIT/writers |
| | User training | USIT |
| | User support | USIT |
| | Evaluation | USIT/writers |
| Catalog 3, 94: January-June | Improving the second version of the DTD | USIT/writers |
| | Improving the electronic version | USIT |
| | Using the SGML-editor for updating the information | Writers |
| | Developing schemes to be used with the SGML-editor | USIT |
| | Improving the conversion to HTML | USIT |
| | Organising the Unix file system | USIT |
| | Organising the work flow | USIT/writers |
| | User training | USIT |
| | User support | USIT |
| | Evaluation | USIT/writers |
| Catalog 4, 94: July-December | Further improvements of the second version of the DTD | USIT/writers |
| | Printout possibilities through World Wide Web | USIT |
| | Using the SGML-editor for updating the information | Writers |
| | Improving the style sheet used by the SGML-editor | USIT |
| | Organising the Unix file system | USIT |
| | Organising the work flow | USIT/writers |
| | User training | USIT |
| | User support | USIT |
| | Evaluation (interviewing) | USIT/Dep. of informatics/writers |
| 95: from January | The system in ordinary use | USIT/writers |
| | Evaluation | USIT/Dep. of informatics/writers |

Table 1. The work done in the different phases of the system development process

The first version of the DTD was developed with continuous changes, based on improved knowledge of the structure and content gained through dialogue with the writers, and on new requirements that emerged when encoding the information and developing other parts of the SGML system. For the second catalog, the DTD was reorganised and improved before the writers started to work with the information. The DTD was developed mainly through the work with the first and second catalogs; since then, there have only been minor improvements.

The plan was to create a DTD and then have the different writers encode their own information using an SGML-editor. For the first catalog, this expectation proved premature, and a decision was made to have people at USIT, who were familiar with computers, software and the use of the network, do the encoding. During the work with the second catalog, some of the information could be reused from the first catalog and updated. For the rest of the information, the writers filled in the catalog information using templates in word processors. They did not have to deal with the actual conversion into SGML; this was done by automated procedures combined with some manual encoding at USIT. Since the third version of the catalog, the writers have been responsible for encoding the information for later updates and corrections, using an SGML-editor.

The university has developed a print spooling system (PRISS) which makes it possible to print any file from any computer (Mac, PC, Unix workstation) to any printer on the network. One of the aims of the pilot was to develop a print-on-demand facility for the SGML-encoded material: the writers needed printouts of their own parts of the catalog on their local printer, with the same layout as the final catalog. TeX/LaTeX was used to generate the PostScript files. To accomplish this, a script for converting the SGML-encoded catalog to TeX/LaTeX was developed, and this in turn was integrated into PRISS. For the first catalog, the layout was continuously developed, and the writers had to proof-read printouts from the SGML-editor. For the second catalog, the writers could order printouts with the "final look" to any printer from the Unix server. For the third catalog, printouts were ordered through a WWW interface.

The catalog is available through the WWW, giving hypermedia-based access to the information. For this, a script for converting the SGML-encoded catalog to HTML was developed. Because the SGML information is continuously updated, up-to-date information is available to students and staff through the WWW.

Unix file servers are used to manage the different files and access to them. Every writer has access from their Macintosh or PC to read and write their own file(s) and to read the other files of the university catalog. From their desktop machines, they connect to their directory on the common Unix server, and they use an SGML-editor to update the information in the files.

Prior to the pilot project, the work with the catalog was coordinated by people from the central administration. With the catalog encoded in SGML, the work was coordinated by USIT. The dates for user training, the updating period and the last revisions were set by USIT in cooperation with people at central and faculty level. Through courses set up by USIT, the writers were introduced to the basics of SGML, the structure of the catalog and how to work with the information. Courses were arranged for each edition of the catalog.
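The conversion scripts for HTML and TeX/LaTeX are not reproduced in the paper. The Python sketch below only illustrates the principle behind them: one encoded course, rendered once as HTML for the WWW and once as LaTeX for typesetting, with every layout decision made in the converter rather than in the markup. It assumes that the SGML text can be read by an XML parser (all start and end tags present, as the "- -" flags in a DTD like Figure 1 require, and no other SGML-only syntax). The sample course and teacher are invented, the function names are hypothetical, and LaTeX special characters such as & or % are not escaped.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample course, using element names from Figure 1.
SOURCE = """
<course>
  <descrip>Introduction to structured documents</descrip>
  <day>Monday</day><time>10-12</time><place>Room 1</place>
  <day>Thursday</day><time>14-16</time><place>Room 2</place>
  <teacher><fname>Kari</fname><sname>Nordmann</sname></teacher>
</course>
"""


def sessions(course):
    """Yield (day, time, place) triples; the DTD fixes this order."""
    parts = [c.text for c in course if c.tag in ("day", "time", "place")]
    for i in range(0, len(parts), 3):
        yield parts[i], parts[i + 1], parts[i + 2]


def teachers(course):
    return ", ".join(f"{t.findtext('fname')} {t.findtext('sname')}"
                     for t in course.findall("teacher"))


def to_html(course):
    """Rendering for the WWW version."""
    out = [f"<h3>{html.escape(course.findtext('descrip'))}</h3>", "<ul>"]
    for day, time, place in sessions(course):
        out.append(f"<li>{html.escape(day)} {html.escape(time)}, {html.escape(place)}</li>")
    out.append("</ul>")
    out.append(f"<p>Teacher: {html.escape(teachers(course))}</p>")
    return "\n".join(out)


def to_latex(course):
    """Rendering for the printed version (LaTeX specials are not escaped)."""
    out = [f"\\subsection*{{{course.findtext('descrip')}}}", "\\begin{itemize}"]
    for day, time, place in sessions(course):
        out.append(f"\\item {day} {time}, {place}")
    out.append("\\end{itemize}")
    out.append(f"Teacher: {teachers(course)}")
    return "\n".join(out)


course = ET.fromstring(SOURCE.strip())
print(to_html(course))
print(to_latex(course))
```

Changing the printed layout means editing `to_latex` only; the encoded catalog, and therefore the writers' work, stays the same.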

Research approach

The SGML-based document system was developed by a group of 3-5 developers, two of whom are the authors of this paper.

The evaluation of the system is based on the experiences of the writers, the system developers and the management. Empirical data about the system in use were collected in several ways. We interviewed 22 people involved in the project, including writers and managers at USIT and the central administration unit. The writers used a distribution list for asking questions about the SGML-based document processing system, and we analysed 393 emails sent to this list. We wrote down the questions and problems raised in hundreds of telephone calls and direct mails. We analysed the minutes of all the meetings we had with the writers, as well as of internal meetings dealing with more technical problems. In addition, we analysed project reports. The empirical data from interviews, emails, reports, telephone calls, direct mails and meetings form the basis for this paper. The analysis also draws on participant observation, since we were both developers and researchers on the pilot project. In doing the study, we remained aware that our double roles as researchers and developers might discourage the writers from telling "the whole truth" about their work situation. Our impression is that there were no problems of this kind with the interviews. On the contrary, we felt that the people we interviewed were pleased to talk to us. As one put it:

Thank you for interviewing me. It was good for me to tell you about my thoughts and attitudes related to the project. You work with these things, and I know that you understand what I am talking about.

The empirical data were analysed by applying Braa's (95) framework for information system quality. The framework consists of three perspectives: the use perspective, the organizational perspective and the technological perspective. The use perspective emphasizes the writers' experience with the SGML-based document system. The organizational perspective emphasizes the organizational benefits, consequences and potential of the use of the system. The technological perspective frames the technological competence, requirements, etc. This framework was chosen to present the data from the research project because of its holistic view of the system. Other descriptions of SGML projects, found in the proceedings of SGML conferences or in SGML books (e.g. (Goldfarb, 92), (Herwijnen, 93), (Alschuler, 95), (Travis and Waldt, 95)), tend to emphasize the technological possibilities of SGML and, to some extent, the organizational benefits and costs. However, experience from the field of information systems shows that the quality of the users' (in this context, the writers') work situation is significant for system quality (Schuler & Namioka, 93). This is no different for SGML-based document systems.

We have carried out some literature studies, looking for related research. There are books on the market that present and discuss some experiences of introducing SGML-based document systems into organizations (e.g. (Alschuler, 95), (Travis and Waldt, 95)), but we have not been able to find other research on the topic. As far as we have been able to determine, there is little research literature on document handling/management in general.

Experiences from introducing SGML into the organization

The experiences are presented within three different perspectives: the use perspective, the organizational perspective, and the technological perspective. These perspectives are taken from Braa's framework for quality discussions (Braa, 95).

The writers' use of the SGML-based document system

In this section we present experiences and problems related to the use of the SGML-based document system. The writers are the main source of information.

Loss of freedom

The use of SGML requires discipline in the way text is written. Structuring the information according to a DTD creates constraints on how to deal with it. Usually, people can present their information in their own way by using the tools they prefer. Dealing with SGML, this freedom is restricted.

It is problematic with SGML, because you have to be so damned correct, otherwise you get problems with your printouts. A few "typos", and then chaos. This is no problem in other word-processors that I know. Ok, you see the misprint on the paper, but you can read it, and use it!

Some writers pointed out that the freedom of using a well-known word processor, and of making your own presentation of the information, is gone. One said that he got the feeling of going back 10 years in time, to text markup in formatting programs like RUNOFF. The SGML-editor used does not have the same functionality as word processors such as MS Word and WordPerfect. An SGML-editor assists the writer in doing markup according to a DTD. It may incorporate a validating parser that makes it possible to avoid markup errors and guarantee that the document is structurally correct. Some writers stated that when working with the SGML-editor they had to concentrate more on the technology and the structure than on the text itself, even though for most of the text they only needed to fill in the information in the right places.

When using Word you almost forget that you use a computer, it is only there - a tool which is incorporated in my work. When using the SGML-editor I have to think about how to use it - how to include which element, and so on. But, I believe I will get used to it (laugh).
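The writer's remark that a few typos lead to chaos has a concrete technical basis: a misspelled end tag leaves an element open, so everything after it is mis-nested, whereas a word processor simply shows the misprint. The Python sketch below shows only the simplest part of what a validating parser checks, matching start and end tags with a stack. It assumes plain `<name>` and `</name>` tags with no attributes, comments or SGML tag omission, and `first_tag_error` is a hypothetical name, not a function of the SGML-editor used in the project.

```python
import re
from typing import Optional

# Plain start tags <name> and end tags </name>; attributes are not supported.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def first_tag_error(text: str) -> Optional[str]:
    """Return a message for the first unbalanced tag, or None if all tags match."""
    open_tags = []  # stack of (element name, line number) still waiting for an end tag
    for match in TAG.finditer(text):
        is_end, name = match.group(1) == "/", match.group(2)
        line = text.count("\n", 0, match.start()) + 1
        if not is_end:
            open_tags.append((name, line))
        elif not open_tags:
            return f"line {line}: </{name}> has no matching start tag"
        elif open_tags[-1][0] != name:
            open_name, open_line = open_tags[-1]
            return (f"line {line}: </{name}> found while <{open_name}> "
                    f"(line {open_line}) is still open")
        else:
            open_tags.pop()
    if open_tags:
        name, line = open_tags[-1]
        return f"line {line}: <{name}> is never closed"
    return None


typo = "<course>\n<descrip>Intro</descrpi>\n<teacher>...</teacher>\n</course>"
print(first_tag_error(typo))
# line 2: </descrpi> found while <descrip> (line 2) is still open
```

An editor that runs such checks while the writer types can point at the error immediately, instead of leaving it to surface later as a broken printout.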

Using logical markup for layout

Some writers felt it was confusing not to know how the catalog would look on paper. They were confused by the difference between the logical structure (represented by markup in the text) and the physical structure (the layout presenting the catalog on paper). We received many questions related to the use of the logical structure. In the beginning almost every writer related the logical markup directly to the printed catalog. Knowing how a specific markup in a given context would look on paper, they used this markup for layout rather than for its logical meaning. For example, they wanted to use the element <bold> to mark up a title instead of using the element <title> in the appropriate context. The problems related to the difference between logical and physical structure were felt strongly at the beginning of the project. Emails concerned with these questions became considerably fewer after the second edition of the catalog; by this time the writers had been through a learning process.
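Why it matters whether a title is marked up as <title> or as <bold> can be shown with a small Python sketch. On paper the two may look alike, but only the logical element tells software what the text is, so a program collecting headings (for a table of contents, say, or an index page on the WWW) finds one and misses the other. The fragment, the `part` element, the style table and the function names are hypothetical; only <title>, <bold> and <descrip> appear in the paper, and the sketch assumes the markup can be read by an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment: one heading marked up logically (<title>),
# one marked up for its appearance (<bold>).
SOURCE = """
<catalog>
  <part><title>Faculty of Law</title><descrip>Courses in law.</descrip></part>
  <part><bold>Faculty of Medicine</bold><descrip>Courses in medicine.</descrip></part>
</catalog>
"""

# Hypothetical style sheet: the printed look is attached to element names,
# so both headings come out in bold type on paper.
STYLE = {"title": {"bold": True, "size": 14},
         "bold": {"bold": True, "size": 10},
         "descrip": {"bold": False, "size": 10}}


def prints_bold(tag: str) -> bool:
    return STYLE[tag]["bold"]


def headings(root: ET.Element) -> list[str]:
    """Collect headings for a table of contents: only <title> qualifies."""
    return [t.text for t in root.iter("title")]


root = ET.fromstring(SOURCE.strip())
print(prints_bold("title"), prints_bold("bold"))  # True True
print(headings(root))  # ['Faculty of Law']
```

Both headings print in bold, but the one marked up with <bold> is invisible to any processing that relies on the logical structure.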

Change in work

Some writers pointed out that the work with the catalog has changed. The catalog is still the product, but the use of SGML has changed the process. First, producing the catalog is more time-consuming for the writers than before the pilot project was initiated. They spend more time writing and editing the catalog because of the new tool and the structuring. The SGML-editor was said not to be very user-friendly, although it was hard to tell whether the difficulties were related to the editor's functions themselves or to the principle of encoding the information. Second, because of the time-consuming process, some writers were given fewer other tasks and ended up as "experts" on the catalog, while others got a longer working day because of the extra load. Third, others got new work tasks, concerned with producing different kinds of information for distribution.

I do another job now. I have been on courses, and used a lot of time to become qualified to do my job. In fact, I should be paid more now (laugh).

Later, but still within the period of the pilot project, we got a strong impression that the writers' work was less time-consuming than at the beginning. When working with the fifth edition of the catalog, almost all of the writers said that they had started to get used to the new production process, and that they now spent less time producing an edition of their part(s) of the catalog than earlier in the pilot project. The emails also show that questions related to the editor and the structure became fewer after the third edition of the catalog. Still, the process is seen as "harder" and more time-consuming than before SGML was introduced.

A need for understanding

All of the writers had to participate in training programs and they all needed time to understand the underlying structure (DTD), to learn the SGML-editor, and to get an understanding of how the integration of SGML may influence their work situation. The writers stated that they needed to learn and understand the SGML-based system to work with it.

I see the SGML-people as a kind of doctor for my information. They say that I have to mark it up to gain some new functionality. Of course, I do that if I know why I have to do it. Compare it to medicine: I take my medicine if my doctor tells me why I have to do so. I do not take medicine if the doctor cannot give me an appropriate reason. Obvious!

They also wanted to know the benefits of using SGML. They emphasized the need to know the main structure of the DTD, and where and how to add new information to the document. Knowing the structure of the DTD requires some understanding of what a logical structure is, and the ability to distinguish between logical and physical structure. This understanding took time to achieve, but it developed gradually.

Some organizational impacts

In this section we focus on the organizational perspective, emphasizing the organizational efforts, consequences and potential of use. The writers, the managers and the system developers all contributed data and information to this presentation of organizational impacts.

Groups with a new kind of domain knowledge

The knowledge of how to work with the catalog led to some writers also getting responsibility for handling other kinds of information to be presented in the university's WWW-based information system. Experience from the pilot project shows that the writers need training and user support to work with the catalog. This further leads to groups with new kinds of domain knowledge: one group of specialized writers and one group of SGML experts. Even though training was planned, the writers needed access to some form of help at all times. They needed help to solve technical problems and to figure out what to do with the different parts of the information, how to encode it and where to put the markup. Results from the interviews show that the writers see the training and support as highly important and necessary.

The catalog as a contract

Writers emphasized that for years the catalog has been a kind of contract between the departments and the students. The departments require the students to read (parts of) the catalog and to follow the information given there. On the other hand, the students use the catalog as documentation of what they need to know.

Before, the catalog was a kind of a contract between us and the students, and we wanted it to be like this. How will this be when the catalog changes all the time?

Some writers as well as some managers were concerned about which presentation of the catalog should be the one to refer to: the paper version or the WWW version. During the pilot project, the view of the WWW version held by some writers and managers changed. In the beginning, the writers were most concerned with the printed version of the catalog. Very few were familiar with the presentation of information through the WWW, and the project group had to give demonstrations to show what the electronic version of the catalog looked like. Even when printouts could be ordered through the WWW, many of the writers were still not familiar with using a WWW client. By the end of the pilot project, several of the writers were more concerned with the electronic version. They suggested changes to the WWW presentation and asked how to make links within the catalog and to other information in the WWW system. Some of the departments refer to the electronic version as the place where students should look for updated information.

Change in responsibility and administrative routines

When analysing the emails, we saw that the developers received many questions related to non-technological issues, e.g. deadlines, proof-reading after the deadline, requests for new kinds of information in the catalog, requests for new computers, etc. Many of these questions have little to do with the document-processing system itself, and before the pilot project they were handled by the central administration unit. From the interviews we got a strong impression that the writers experienced a change in responsibility: they saw USIT as the department in charge of the process as well as the product.

"After USIT got the responsibility for the catalog, the deadlines are more strict. I feel it is harder to ask for two days' extension now than before."

In the beginning, the dates for training and deadlines were set by USIT in cooperation with the central administration unit. Later, these dates were organised solely by USIT. When necessary, USIT also did some proof-reading, editing and structuring of new information; this is clearly documented in the minutes from several project meetings. This kind of work was mostly done at the beginning of the pilot project. USIT is still involved in these tasks, but the units are now responsible for all updating of their part(s) of the catalog.

Introducing SGML requires effort

There was a heavy workload for everyone involved in introducing SGML at the university. Developing both the technical solutions and the administrative routines was time-consuming. The interviews, emails and minutes from meetings show that the writers had many questions related to the process, and there were obvious reasons for emphasizing support. The project report documented over 1200 hours of overtime for the technical staff on the first edition alone. This decreased to below 700 hours for the second edition, and decreased further with later editions. As mentioned earlier, the authoring process of the catalog was also very time-consuming, and still is.

Technological challenges

In this section we focus on the technological perspective. Writers, managers and system developers all found that the technological solutions themselves had to be improved. This section presents our findings in this respect.

Access to documents

Many of the questions from the writers were related to getting access to their own files. One of the goals of the project was to integrate a document database to handle the users' access to the files, version control, etc. While waiting for such a solution to become available, we used the Unix file system and its user access permissions to control the users' access to the different catalogs and files. This solution caused some technical problems. These were usually easy to correct, but the writers had to contact the support personnel to get it done.

Several users had problems understanding the difference between the local hard disk and the common file server, and navigating this system. Sometimes they exported their documents to their local hard disk instead of to the right directory on the common file server. The printout routines printed the file on the server, so in such cases the writers did not get printouts of their latest updates.

Printout services

In the emails and telephone calls there were many questions related to the printing facilities, and many users reported problems getting printouts of their part(s) of the catalog. Since the work on the second catalog, the users could print their parts of the catalog, with the same layout as the printed catalog, from the common file server to their local printer.

If something went wrong in the printing process, the users either got an error message printed or nothing at all. The support personnel received the error messages automatically through electronic mail. The most common errors occurred either because of incorrect SGML code that did not pass the sgmls parser, or because of text in the tags which the printing system could not handle. (A parser determines whether an SGML document is valid, reports any errors and produces output that other applications can use. The sgmls parser is public domain software.) The latter problem arose, for example, because of a wrong number of digits in a telephone tag or because of empty tags, and it could have been avoided by making the printing routine more robust.
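What "more robust" could mean is sketched below. This is a minimal illustration, not the project's printing routine: it assumes, hypothetically, that the parser output has already been reduced to a list of (element name, text) pairs, that telephone numbers should have 8 digits, and that the typesetter accepts a generic `text` element. Instead of aborting the whole printout, the routine skips empty elements, demotes malformed telephone numbers to plain text, and collects warnings that could be mailed to the support personnel.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical rule: telephone numbers in the catalog have 8 digits.
PHONE_DIGITS = 8

Element = Tuple[str, str]  # (element name, text content) taken from the parsed document


def sanitize(elements: Iterable[Element]) -> Tuple[List[Element], List[str]]:
    """Return elements that are safe to typeset plus warnings, instead of failing the job."""
    safe: List[Element] = []
    warnings: List[str] = []
    for name, text in elements:
        content = text.strip()
        if not content:
            # Empty tags are skipped rather than passed on to the typesetter.
            warnings.append(f"empty <{name}> skipped")
            continue
        if name == "telephone":
            digits = re.sub(r"\D", "", content)
            if len(digits) != PHONE_DIGITS:
                # Keep the content, but as plain text (hypothetical 'text' element).
                warnings.append(f"<telephone> '{content}' has {len(digits)} digits")
                safe.append(("text", content))
                continue
        safe.append((name, content))
    return safe, warnings


if __name__ == "__main__":
    doc = [("course", "INF101"), ("telephone", "22 85 24 10"),
           ("room", "  "), ("telephone", "1234")]
    elements, problems = sanitize(doc)
    print(elements)
    print(problems)
```

Skipping and demoting rather than stopping means one malformed entry no longer blocks a department's entire printout, while the warnings still give the support personnel the feedback they previously got as error mail.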

Appropriate DTD-design

The first version of the DTD was developed through a document analysis led by the system developers, involving the writers, managers and people from the central administrative department. The resulting DTD had a high level of detail for encoding the information. The intention was to make the DTD rich enough to fetch information from databases, to link to other information and to support presenting different views of (some parts of) the information in the catalog. The catalog was thought to be a well-structured document, but as it turned out, it was not. We experienced unwillingness among some of the writers to change their way of structuring and presenting the information. There were situations where the system developers wanted a stricter DTD than some of the writers did; these writers had strong opinions, even at a very detailed level, about their own information, which led to a more flexible DTD. For example, the different departments had many different ways of organising the information about courses. Some units presented the information about when, where and by whom as prose, others as tables. The system developers regarded this information as logically the same, and they wanted to standardise how it should be encoded and presented. They wanted a strict DTD with required elements such as time and place for a course, but such information was not available for all the courses. Instead the DTD had to be more flexible, not requiring specific elements in a course and allowing prose at different places in the structure. A DTD which seems appropriate for the technological solution may not be appropriate for the writers. Based on experience in use, the DTD was reorganised and improved. It became less detailed for the catalog as a whole, but the DTD is still rich, strict and detailed.
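The difference between the strict and the flexible course model can be sketched by treating SGML content models as what they formally are, regular expressions over the sequence of child elements. The element names below (`title`, `time`, `place`, `teacher`, `prose`) are hypothetical and not taken from the project's DTD; the strict model roughly corresponds to a declaration such as `<!ELEMENT course - - (title, time, place, teacher)>`, and the flexible one to the content model `(title, (time | place | teacher | prose)*)`.

```python
import re
from typing import List

# Hypothetical content models for a <course> element, written as regular
# expressions over the space-separated names of its child elements.
STRICT = re.compile(r"title time place teacher")
FLEXIBLE = re.compile(r"title( (time|place|teacher|prose))*")


def conforms(model: re.Pattern, children: List[str]) -> bool:
    """True if the sequence of child element names matches the content model."""
    return model.fullmatch(" ".join(children)) is not None


# A course described in prose, without separate time and place elements:
prose_course = ["title", "prose"]
print(conforms(STRICT, prose_course))    # rejected by the strict model
print(conforms(FLEXIBLE, prose_course))  # accepted by the flexible model
```

The strict model guarantees that every course has a time and a place that later processing can rely on; the flexible model accepts each department's existing practice, but gives later processing no guarantee that those elements are present.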

Discussion

In the previous section, the experiences were presented within the three perspectives. In this section we discuss them. Improving the system based on experiences within one perspective may lead to conflicts that must be considered when improving it based on experiences within the other perspectives. This section states and discusses some of these conflicts in relation to SGML-based document systems.

The DTD as a foundation for work

During DTD modelling, many aspects have to be considered, depending on the functionality, (re)usability and presentation of (components of) the documents. The following aspects, among others, have to be considered (Travis & Waldt, 95): What business problems are going to be solved? What is the purpose of the DTD within the environment? Is it for exchange? Will the documents be linked or represented in databases? Will conversion be necessary, and if so, how and to what? Will some components be reused, and how? What kind of functionality is wanted?

To design an SGML-based document system that is tailored to fulfil organizational document requirements, including functionality as indicated above, the DTD has to be well-defined, rich and strict (Maler & Andaloussi, 95). This can be expensive, especially the process of authoring the document. However, if a DTD is too flexible and general, it may be harder to process the documents effectively (ibid.).

As stated in the sections on empirical work, the writers felt that their freedom of writing was restricted by the editor and the underlying strict DTD. When writers use an SGML editor, they have to be aware of the predefined structure and deal directly with it when authoring. On the other hand, with an SGML editor it is fairly easy to avoid ambiguous markup at input, and the SGML-encoded documents are ready for further use, handling and management without any conversion or other adaptation. A more flexible DTD may offer the writers a more flexible writing process, but further use of the SGML-encoded document may be restricted (Maler & Andaloussi, 95). There is a conflict between the requirements of a strict DTD and flexibility when authoring. As an example, each department presents some general information to the students before the listing of the different courses offered by the department.
This might be information about important dates, student services, services for handicapped students, etc. This should be structured and presented in the same way for all the departments. In the DTD, we specify the required structure and content. To fill in the information, the writers have to use correct markup for the different kinds of information and be aware of the predefined sequence. A solution to the conflict could be to offer writers other tools, relieve them of the coding, and instead absorb the cost of conversion into SGML. An obvious approach would be to use a WYSIWYG editor and appropriate templates (Herwijnen, 93). However, there are problems related to the use of templates as well. First, research has shown that writers use templates (or style sheets) incompletely (Sørgaard et al, 96). Second, on a technical level, building and maintaining a conversion program is a full-time job. Third, some writers find templates difficult to apply: which style should be used, and when? (Sørgaard & Sandahl, 96). An argument for WYSIWYG editors and templates from the writers' point of view is that they find it easier to deal with layout than with logical markup. They like to see immediately how the information will be presented in the printed version.

It is a challenge for system developers to develop an editor that fits both the requirements of logical structure and the requirements of layout. We are aware of existing editors, and are glad to see that the tools are starting to improve with respect to these requirements. We also see that the products are the result of a compromise between DTD support and layout facilities, and that further development is needed here.

In order to design DTDs that fit the work arrangement, there is a need for user involvement in the DTD-modelling process.
However, the writers had strong opinions, even at a very detailed level, on how the DTD should represent their own information. It is a challenge to develop techniques and methods for user involvement that focus on organizational and technological requirements as well as use requirements.

We found that it took a lot of effort to become familiar with the SGML editor and the structure. However, after some training, practice and support, the writers used the editor very well. They are now able to produce SGML-encoded information directly in the SGML-based document system.

Who does the job and who gets the benefit?

A groupware application never provides the same benefit to every group member (Grudin, 94). Costs and benefits depend on preferences, prior experience, roles and assignments. We see a parallel in this to the implementation and use of an SGML-based system: although an application is expected to provide collective benefit, some people must adjust more than others.

Most of the tasks in the pilot project were performed by the system development group and the writers. An important experience is that several writers feel that life is more difficult in a structured environment, and that their workload has increased. As a motivation for work, it is important to add something that gives the writers some benefits (Bogren, 95). In our case, this might for instance be making it easier to reuse parts of the information in other settings, or building functions to facilitate the booking of lecture rooms. When lectures are planned, there is always a risk of double-booking lecture rooms. The writers at faculty level are responsible for going through all the courses from the underlying departments to make sure that no lecture rooms are double-booked. When the lecture room, day and time are encoded for each course, it is easy to extract this information from the whole catalog and sort it by lecture room. This kind of overview makes it easier for the writers at faculty level to go through the courses, and may be a motivation for using SGML. In addition, it is important to avoid technological problems with basic functions. In the pilot project the writers had problems with printouts and access to files; when using computers in other settings, they are not used to having many problems with these functions. The developers provided extensive user support and training throughout the pilot project, which reduced the time left for development.
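The lecture-room overview described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes that room, day and time have already been extracted from the encoded catalog into hypothetical `Lecture` records, with times given in minutes after midnight, and the course codes and room names are made up. Sorting by room, day and start time places potential conflicts next to each other, so each lecture only needs to be compared with those that start before it ends.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Lecture:
    # Hypothetical record extracted from the encoded course information.
    course: str
    room: str
    day: str
    start: int  # minutes after midnight
    end: int    # minutes after midnight, exclusive


def find_double_bookings(lectures: Iterable[Lecture]) -> List[Tuple[str, str]]:
    """Return pairs of courses booked in the same room on the same day at overlapping times."""
    ordered = sorted(lectures, key=lambda l: (l.room, l.day, l.start))
    clashes: List[Tuple[str, str]] = []
    for _, group in groupby(ordered, key=lambda l: (l.room, l.day)):
        slot = list(group)
        for i, a in enumerate(slot):
            for b in slot[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start time, so no later lecture overlaps a
                clashes.append((a.course, b.course))
    return clashes


if __name__ == "__main__":
    catalog = [
        Lecture("INF101", "Aud 1", "Mon", 9 * 60, 11 * 60),
        Lecture("INF102", "Aud 1", "Mon", 10 * 60, 12 * 60),
        Lecture("INF103", "Aud 1", "Mon", 12 * 60, 14 * 60),
        Lecture("MAT100", "Aud 2", "Mon", 10 * 60, 12 * 60),
    ]
    print(find_double_bookings(catalog))
```

The check only works if every course actually has room, day and time encoded as separate elements, which is one concrete payoff of the stricter DTD discussed earlier.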
It is a challenge for an organization introducing SGML to allocate enough resources both to development and to training and support.

The development of the SGML-based system put a heavy workload on the system developers as well. In a short time, they had to build the technological infrastructure, develop the DTD, do the programming, make the tools work satisfactorily, and so forth. They also did training and considerable support. In addition, responsibility for most of the process and the product was shifted from other departments to the technological unit. Acceptance by management, recognition of the effort put into this work and acknowledgement of the change in responsibility may be among the benefits the system developers need. Building a functioning system and getting positive feedback from the writers and others may be additional benefits.

The collective benefit of the pilot project was a better product. Writers, management and technical staff agree that the catalog as a product has become much better than it was before the pilot project started. The resulting printed catalog is more structured and easier to use than earlier editions. In addition, an electronic hypertext-based version is available through WWW, with search facilities and links to other relevant information. Questions about the information in the catalog arrive from people around the world, which shows that the WWW version is used by people who do not have access to the paper version.

If you are going to produce and maintain "higher-energy" data, you need "higher-energy" writers to do so (Alschuler, 95). We have seen that raising the skill level among several groups in the organization is crucial, and this in turn leads to new groups of specialized workers. It is almost impossible for a writer new to the SGML-based environment to start working with it without any training and user support.

It is hard to accept electronic documents as the valid information source

From a technological point of view, converting the printed catalog into an electronic version was relatively easy with the use of SGML. However, some writers were unwilling to accept electronic documents as a substitute for "the real thing". Paper documents carry an aura of authenticity and legality that is difficult to shake from people's minds (Berry and Goulde, 94). As experienced in the pilot project, the catalog serves as a contract between the students and the administration at the university. The writers are very much aware of their role in giving the students the right information. If the catalog does not contain enough information, or the students do not find the information they need, they ask questions of the writers and others in the administrative units. The catalog represents the writers' work. Hughes and King (93) point out that documents are not only proof of work that has been done, but also informally provide a sense of what is going on in the organization and, to some extent, of the division of labour. Between the different departments, there is a clear distinction of responsibility for informing the students on different subjects. For instance, the central administrative department is responsible for informing the students about general rules and routines at the university, when to register and pay the semester fee, etc. The faculties and the underlying departments provide more specific information related to the different studies. In the printed catalog this information is presented in different chapters. When using (a browser on) WWW, it is easy to bypass information simply by not clicking on the links to it. Some writers fear that information they see as important becomes less visible in WWW than in the paper version.

However, during the pilot project, several writers became familiar with WWW, and several units at the university emphasize the use of WWW as a channel for information to the students.
This leads to a situation where some of the units continuously update the catalog information presented in WWW and invite the students to look for updated information there. Other units still follow their old routines, writing separate lists of changes to the catalog and putting them on bulletin boards. The managers need to decide whether the WWW version of the whole catalog should become the most valid document for the university. If so, it will become necessary to present the information in a way that fits this medium: one that, if necessary, flags the changes made since the paper version was printed, makes all the important information easily visible, and provides an appropriate presentation for different (user) groups.

Concluding remarks and issues for further research

Our findings from the study show that it is cumbersome to design and apply an SGML-based document system that satisfies the writers' needs as well as organizational and technological requirements. The project group experienced conflicts which we see as challenges to be dealt with in similar projects, and which are worth examining more closely in further research.

There is a conflict between a DTD that fulfils the requirements from a technological point of view, such as reusing information, linking it and presenting it on paper and in WWW (stated earlier as reasons for getting the catalog into SGML), and the fact that the writers feel that life is more difficult working in a structured environment. In order to (partly) avoid this conflict, we propose the following: (i) Design DTDs that fit the working environment well. This requires involving the writers, and others who use and handle the information, in the development process. Consequently, (ii) techniques and methods for user involvement that focus on organizational and technological requirements as well as use requirements need to be developed. And (iii) structured editors should be developed that take into account both the need for logical markup and appropriate layout.

Furthermore, the effort required, especially from the writers, to get an SGML-based system applied in an organization is greater than the benefits they gain by using it. There is a conflict between who does the job and who gets the benefit. As Grudin (94) points out, it is a challenge (iv) to design a system from which different groups of users gain benefit. This should be the case both during the development process and when using the final system.

Finally, the wishes for dynamic information (stated earlier) are hard to comply with. The reason seems to be a lack of agreement on the significance of the electronic documents. There is a conflict between applying the printed or the electronic version of the document as the valid information source.
Challenges will be to (v) investigate the possibilities of using electronic documents as the main source, (vi) increase their status and utilize the possibilities of dynamic documents, and (vii) improve the electronic presentation of documents to fit the new medium.

It is a misunderstanding to think that SGML projects are all about tools and technology. SGML projects are about information, people and the relations between them (Thompson, 95). Integration of SGML into an organization must be taken seriously, and a number of issues, not only technological ones, have to be taken into consideration (Bogren, 95). The introduction and use of an SGML-based information system requires the organization and its people to be willing to make changes and to put effort into changing organizational routines, responsibility, training and support, as well as into developing the technological system itself. Findings in the study show that most of the writers now see the benefit of using SGML. They are able to state new requirements, and they have acquired a whole new understanding of the information. Another result is general acceptance of, and curiosity about, using SGML at the university. This knowledge and acceptance has evolved over time, and continuing writer involvement, conceptual and operational training, and user support have been crucial to it.

References

Alschuler, Liora (1995) ABCD ... SGML A users guide to structured information, International Thompson Computer Press.

Berry, Margaret D and Goulde, Michael A (1994) A New View of Documents. Integrated Information Management in the '90s. In Workgroup Computing Report Vol 17, No 8.

Bogren, Lennart (1995) Introducing your authors to SGML: managing change and maximizing results, Proceedings from SGML Europe 1995, Austria, Graphic Communication Association.

Braa, K. (1995) Beyond Formal Quality of Information Systems Design, Ph.D. Thesis, Department of Informatics, University of Oslo.

Braa, Kristin and Sandahl, Tone (1996) Electronic document exchange systems - future challenges for IS design. Paper to be submitted to IRIS'19. April 1996.

Date, C. J. (1986) An introduction to database systems. Addison-Wesley Publishing Company.

DeRose, Steven J and Durand, David G (1994) Making Hypermedia Work. Kluwer Academic Publishers, Boston, Dordrecht, London.

Goldfarb, Charles F (1992) The SGML Handbook. Clarendon Press, Oxford.

Grudin, Jonathan (1994) Eight challenges for developers, Communications of the ACM, Volume 37, Number 1, pp. 93-105.

Herwijnen, Eric Van (1993) Practical SGML. Kluwer Academic Publishers.

Hughes, John and King, Val (1993) Paperwork. In Steve Benford and John Mariani (eds): Requirements and Metaphors of Shared Interaction. Part II: Shared artefacts. COMIC Deliverable 4.1, pp 153-325. Lancaster, England.

Knuth, Donald (1984) The TeXbook. Addison-Wesley, Reading, Massachusetts.

Lamport, Leslie (1986) LaTeX: User's Guide and Reference Manual. Addison-Wesley Publishing Company, Reading, Massachusetts.

Levy, D. M. (1994) Fixed or fluid? Document stability and new media. In proceedings of ECHT'94. ACM (pp 24-31), New York.

Maler, Eve and Andaloussi, Jeanne El (1995). Developing SGML DTDs: From text to model to markup. Prentice Hall.

Reinhardt, Andy (1994) Managing the New Documents in Byte, Vol 19, No. 8, August 1994.

Schuler, Douglas and Namioka, Aki (1993). Participatory Design. Principles and Practices. Lawrence Erlbaum Associates, Publishers. Hillsdale, New Jersey

Smith, Joan M (1992) SGML and related Standards. Document description and processing language. Ellis Horwood Limited.

Sørgaard, Pål, Sandahl, Tone Irene, Ljungberg, Fredrik (1996) Lost opportunities in word processing: problems with paragraph styles. Paper submitted to NOKOBIT'96. Abstract submitted to the 30th HICSS, March 1996.

Thompson, Marcy (1995) Are You Ready for SGML?, Proceedings from SGML Europe 1995, Austria, Graphic Communication Association.

Travis, Brian E. and Waldt, Dale C (1995) The SGML Implementation Guide, Springer-Verlag.