SGML is an excellent technology for implementing modular, reusable information and documentation - but technology and tools alone are not enough! Unless a new methodology is added, a typical SGML project may simply give its users instant access to enormous amounts of completely useless data (i.e. information pollution).
The reason is that the basic units of information that we have been using for centuries - the chapter, the section, and the paragraph - are completely undefined in terms of what function and purpose the information has for the reader! Short and/or highly structured documents may be easy to describe in a meaningful way in a DTD - but typical business policies and procedures manuals are often about as structured as a bowl of spaghetti.
This paper will introduce the Information Mapping® method. The method provides a complete hierarchy of information types or classes that can be used to produce modular, reusable information objects - all with a precisely defined purpose and function for the information users.
The idea of on-line documentation opens a world of exciting possibilities. Some of the potential benefits are:
Easier and faster access to the information
Easier navigation between the information units
Information can be reused in different contexts
Lower distribution and maintenance costs.
As the standard of the software tools available to support on-line documentation has improved significantly during the past 2-3 years, these benefits should now be within the reach of most companies and organizations. Or at least, this is a very widespread conclusion.
Based on our experience, this is not the case. A majority of the companies we have been in contact with have told us that what they have in fact achieved is instant access to piles of useless data. To use a different term, they have been confronted with information pollution.
The problem is not the software. The problem is the information or documentation used as input to the software. If spaghetti information or documentation is used as input, the quality of the software makes very little difference to the results achieved.
Apparently, there is a need for a new method that will make it possible to produce clear, structured, modular, reusable units of information.
It is the primary purpose of this paper to present a field-tested method for the production of object oriented information.
This paper leads directly into Paul Hermans' paper "The IMAP-DTD" about an SGML implementation of this method.
Before the method is presented, the terms "object" and "object orientation" are discussed, as well as the reasons behind the current interest in these concepts.
The term "object oriented" has for some time now been a buzzword in the software industry, but a buzzword with a very positive connotation. As with all buzzwords, it has been misused extensively, especially by various marketing departments. Practically everything has been called object oriented at some point:
Systems Object Model - SOM (IBM)
Distributed Object Model (Lotus)
Object Linking and Embedding - OLE (Microsoft)
The basic idea behind the term was to introduce the notion of self-contained software modules that could be plugged in, just like LEGO blocks, wherever there was a need. Being self-contained, they would work even if you did not understand how they worked. All the user (in this context, another programmer) needed to know was what they did.
The prospects were tremendous: Imagine having one particular software object that knew how to draw a circle, any circle, and how to do it very efficiently. This object could be used by every application that needed the circle drawing feature. Our applications would take much less disk space and be much more reliable, because once this object had been tested thoroughly, there would be no need to test it again in every application.
Expanding application functionality or enhancing a particular feature would also be extremely easy - plug in an extra object or replace a single object and you would be done.
Object orientation can easily be visualized when you think of a graphical user interface, a GUI. A push-button is a clearly defined object with built-in behaviour. Because it is so easy to exemplify in connection with a GUI, the terms GUI and object oriented have almost been used as synonyms in some instances.
Lately, the term "document centric computing" has started to appear more and more often. It means that the focal point is taken away from the various software applications (word processor, spreadsheet, database etc.) and put where it belongs, i.e. on your work.
In document centric computing, the document would act as a container of objects where you can put your text objects, graphics objects, video objects or whatever you need to get the job done and the message through. The applications would be tools that you would take out of the toolbox when you needed them and would put back when you needed a different tool.
In a slightly more specific context than software marketing copy, the term "object orientation" has 4 main characteristics:
Encapsulation
Inheritance
Hierarchy of object types or classes
Polymorphism
Encapsulation means that an object will contain both data and the methods needed to manipulate the data.
Example: Assume we have a "title-object". It would contain the actual words of the title, information about the fonts used and the methods needed to create, delete, display, print and edit the title. Wherever you chose to plug in this object, you would be able to use the built-in methods that came along with the data.
Inheritance means that once you have defined one type of object, you can define an unlimited number of derived objects (sons or daughters), and by default they will inherit all the characteristics of the parent object type.
Example: Starting with the "title" object type, you could easily derive a "subtitle" object type that inherited all the methods of its parent. You could then introduce a small change in the display and the print methods so that a smaller point size was used.
Hierarchy of object types or classes: Perhaps starting with "the Mother of all text-objects", an object type called "word", you could define an entire hierarchy of text objects, all sharing a number of characteristics and being separated by specific, limited differences.
Example: Users of Word for Windows will probably recognize this way of working from the way "styles" are defined in this product.
Polymorphism means that a particular method could have the same name for a lot of different object types. But it would work differently according to the current object in question. As a user, you would not be required to know about these differences.
Example: You would be able to call the print-method for the "title" and "subtitle" objects without worrying about any differences in their characteristics.
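The title and subtitle examples above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class names, the method names display and print_out, the font name and the point sizes are all hypothetical and are not taken from any particular product.

```python
class Title:
    """Encapsulation: the words, the font data and the methods travel together."""

    point_size = 24  # illustrative value only

    def __init__(self, words: str, font: str = "Helvetica") -> None:
        self.words = words
        self.font = font

    def edit(self, new_words: str) -> None:
        # Replace the words of the title.
        self.words = new_words

    def display(self) -> str:
        # Return a textual stand-in for the rendered title.
        return f"[{self.font} {self.point_size}pt] {self.words}"

    def print_out(self) -> str:
        # In this sketch, printing reuses the display rendering.
        return self.display()


class Subtitle(Title):
    """Inheritance: everything comes from Title, except a smaller point size."""

    point_size = 16  # the single, limited difference from the parent


# Polymorphism: the same call works on both types, and the caller
# does not need to know about the difference between them.
objects = [Title("Information Mapping"), Subtitle("An overview")]
rendered = [obj.print_out() for obj in objects]
```

Here the subtitle type changes one attribute instead of rewriting the display and print methods; overriding the methods themselves would be an equally valid way to express the same derivation.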
The technology connected with object orientation - and SGML is an outstanding example - is quite well defined, and applications using parts of this technology are appearing every minute.
We will all benefit from this development, but these applications are after all only our tools. It will still be entirely possible, using a highly sophisticated word processor or desktop publishing system, to produce manuals that nobody will want to use for anything.
Until now very little time has been spent discussing the data we use as input for these systems.
The Information Mapping (IMAP) method is a set of techniques and principles for: analysing the audience and the information, structuring the information according to the results of the analysis, and presenting the information in the most effective way. IMAP was developed at Harvard and Columbia Universities in 1967.
Annually, more than 10,000 users around the world receive training in using the method.
Some of the most important features of the method are:
It is research-based (cognitive sciences, human factors engineering)
It is media-independent (covers text, graphics, audio, video, etc.)
It is comprehensive
It produces highly modular, reusable information blocks or objects
It is fully documented and replicable
It has been field tested for years.
The method introduces a new unit of information, the information block.
Traditionally we have operated with words, sentences, paragraphs, sections, chapters etc. The information block is different from these units because it contains information on one specific aspect of a topic and has a specific function or purpose for the user.
Example: A typical example of a title for an information block is "Procedure for creating a new customer record".
One of the most important research findings behind the IMAP method was that most kinds of communication can be split into a limited number of information categories or types.
If the creation of feelings is the main objective of the communication, this finding is not applicable. In other words, novels, poems, love letters, political statements, and marketing copy should not be mapped.
But manuals, operating procedures, project reports, feasibility studies, business proposals, recommendations, etc. should be mapped.
This is a list of the information types identified by the researchers:
Procedure: "How you order..."
Process: "How a check is processed..."
Structure: "The buttons on your telephone explained..."
Concept: "What is a fax-machine..."
Principle: "Thou shalt not kill..."
Fact: "Maximum cruising speed is 835 knots."
Classification: "There are two types of people, those who want to categorize everybody, and those who won't."
What made this classification of information types much more important was the fact that for each type of information, it was possible to identify a limited number of effective and efficient presentation methods. Some of the presentation methods (or principles) are common to all the information types.
Example: Never allow the line-length to exceed 50-60 characters.
Others are specific to a certain type of information.
In fact, the IMAP method provides you with a complete hierarchy of information object types and classes. All are derived from the basic object type, the information block.
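As a rough illustration of such a hierarchy, the sketch below derives two of the information types listed earlier from a common information block. It is not the IMAP specification: the class and field names are hypothetical, the 60-character limit is simply the upper end of the line-length example given above, and presenting a procedure as numbered steps is an assumed example of a type-specific presentation method.

```python
import textwrap
from dataclasses import dataclass, field

MAX_LINE_LENGTH = 60  # upper end of the 50-60 character guideline


@dataclass
class InformationBlock:
    """Base type: one aspect of a topic, with one purpose for the reader."""

    title: str
    body: str

    def lines(self) -> list[str]:
        # Presentation principle common to all types: limit the line length.
        return textwrap.wrap(self.body, width=MAX_LINE_LENGTH)

    def present(self) -> str:
        return "\n".join([self.title, *self.lines()])


@dataclass
class Procedure(InformationBlock):
    """Adds a type-specific presentation: numbered steps (an assumption)."""

    steps: list[str] = field(default_factory=list)

    def present(self) -> str:
        numbered = [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join([self.title, *self.lines(), *numbered])


@dataclass
class Fact(InformationBlock):
    """Inherits the common presentation unchanged."""


# Hypothetical content, used only to exercise the sketch.
procedure = Procedure(
    title="Procedure for creating a new customer record",
    body="Follow these steps to create the record.",
    steps=["Open the customer form.", "Enter the details.", "Save."],
)
fact = Fact(title="Cruising speed", body="Maximum cruising speed is 835 knots.")
procedure_text = procedure.present()
fact_text = fact.present()
```

Because every type derives from the same base, a rule shared by all types is written once, while each derived type carries only its own, limited differences.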
To summarize: The IMAP method offers you a field-tested hierarchy of information objects.
These objects are self-contained because each object type has a specific purpose for the user (the reader). This, in turn, means that they are highly reusable, expandable and easy to maintain.
These characteristics are important and desirable when we look at paper-based documentation but they are a "must" when considering on-line documentation.
The following list provides a number of typical, user-reported results of implementing the IMAP information object hierarchy:
Translation costs down 79%
Number of errors down 54%
Documentation usage up 38%
Retrieval of relevant information up 55%
Information Mapping is an extremely powerful authoring methodology in connection with both paper-based and online information.
If you combine the Information Mapping method with SGML, it becomes possible to transform the corporate knowledge and information base from one of the most expensive and most under-utilized assets into one of the most important and most effective resources.