Kuopio Technical Report on SGML - Summary
This list is a part of a report
published in Finnish as a technical report of the Department of
Computer Science and Applied Mathematics, University
of Kuopio, Finland. The aim of the report was to
give a brief overview of electronic text and its
processing by computers. The main part of the report is a section
that contains
a short description and typical features of 89 systems.
This English summary contains only that part of
the report, and we do not intend to update this list later.
The following list is definitely not complete. We have collected those programs
that we knew of. The descriptions of the programs are based on
brochures, research articles and bibliographies of SGML products.
Some of the information was received from companies or importers (into
Finland) of the products, or from users of these programs. The source material is
mentioned in the description of every program. We have also tested
as many programs as possible.
When we collected material it was difficult to decide which systems
we should consider in the report. A first obvious
distinction would be between SGML
and non-SGML systems. Because this criterion would
ignore many interesting systems and prototypes we decided
to deal with all kinds of
systems for structured documents, even those that are not
able to input and/or output SGML documents. After this,
our next question was what
to consider a "program for structured text". As users of the WWW will notice,
our list does not contain HTML editors or
browsers. Our criterion has been that the program must be able to
accept structure definitions made by the user. However, there is
an exception to this rule: we have accepted conversion programs
that change documents from one format to another.
The reason for this decision was that many users
want to know whether they can convert their own
documents, made with some text processing
system, into SGML documents, or back into a document accepted by their text
processing system.
We have not subdivided the systems into any groups according to their
types or categories because any kind
of classification is always arbitrary. Many programs could belong to
several categories, which makes classifying them almost impossible. Thus, we
have listed the systems in alphabetical order. The
description of every program
mentions, however, one or more of the types that we have used,
to give an idea
of the use of the system.
We have used the following types. Each type is followed by an explanation
of the meaning that we have given to it.
- text editor
Text editors are usually programs
that are only able to input, update and output an ASCII text. In our list, however,
all common text processing systems
such as WordPerfect or Word are also classified as text editors.
- structure editor
A program that is used to input and output a structured text. The structure
of the text is defined to the program before the user can input the text.
The program will check that the text is valid according to the given
structure definition. The book by van Herwijnen (Practical SGML, 1994)
contains a list of 12 features
that he considers necessary for structure editors. Today's structure
editors have most of these features; unfortunately, every
program has a different set of them.
- desktop publishing software
These programs are meant for producing page layout for paper and
electronic documents.
Common features of this software include generated lists, multiple
columns, graphics, etc. The difference between the text processing
systems that we classify as text editors and desktop publishing
programs is very unclear because many text processing systems
have gradually shifted towards desktop publishing software.
- formatter
A program that generates a formatted version of
a text for printing.
The text is written with a text editor and contains
formatting codes that the formatter can interpret. Editing of the text
is performed separately from formatting. In the future the term formatter will evidently
mean something different; there are already programs that
produce the formatted layout from an SGML document.
SGML documents contain no formatting codes other than SGML tags.
- text search program
A text search program usually preprocesses a text in order to improve
the search process. Text search programs
use different preprocessing methods. In addition to
preprocessing, text search programs contain a query language and
a search engine. The query language is used to express the queries and
the search engine processes the query and produces an answer.
Query languages for structured text usually contain the following
expressions as their basic components:
- search "program"
the character string "program" is searched for anywhere.
- search "program" inside element "description"
the character string "program" is searched for inside a structure
element "description".
- search "program" inside "description" whose "type"-attribute is "editor"
the character string "program" is searched for inside a structure
element "description" that has an attribute whose name is "type" and
whose value is "editor".
Depending on the query language, it is possible to form
more complicated queries from these basic query components. For example,
wild cards or regular expressions may be allowed in the search string,
elements can be required to lie inside other elements, or the queries
can be joined by Boolean operators.
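
The three basic query forms above can be illustrated with a minimal sketch. It is not taken from any of the listed products: it assumes Python's standard xml.etree.ElementTree module, uses a small XML sample as a stand-in for an SGML document, and the function name search and all sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; XML stands in for SGML in this sketch.
SAMPLE = """
<catalog>
  <entry>
    <name>Alpha</name>
    <description type="editor">A structure editor program.</description>
  </entry>
  <entry>
    <name>Beta program</name>
    <description type="parser">Validates tagged text.</description>
  </entry>
  <entry>
    <name>Gamma</name>
    <description type="formatter">A batch program for printing.</description>
  </entry>
</catalog>
"""


def search(root, needle, element=None, attr=None, value=None):
    """Return the elements whose own (direct) text contains needle.

    element: restrict the search to elements with this name (None = any).
    attr, value: additionally require that attribute attr equals value.
    """
    hits = []
    for node in root.iter(element):  # iter(None) visits every element
        if attr is not None and node.get(attr) != value:
            continue
        if needle in (node.text or ""):
            hits.append(node)
    return hits


root = ET.fromstring(SAMPLE)

# search "program"
anywhere = search(root, "program")

# search "program" inside element "description"
in_description = search(root, "program", element="description")

# search "program" inside "description" whose "type"-attribute is "editor"
in_editor_description = search(
    root, "program", element="description", attr="type", value="editor"
)
```

A real search engine would work on a preprocessed index rather than scanning the document for every query; the scan is used here only to keep the sketch short.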
- electronic delivery tools
Electronic delivery tools allow text to be searched.
In addition, formatting information can be added to the text.
The text is then called an
electronic publication.
- database
In relational
databases the data is modelled according to relations and the operations
on the data are modelled by relational algebra or relational calculus.
The query
language is SQL (Structured Query Language), an ISO standard.
In practice there are many variants of
SQL. Database management programs typically
can manage updates by many simultaneous
users, can control the integrity of the database, can manage the
rights of many user groups and can recover data in exceptional situations.
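
As a small illustration of a relation, an SQL query and integrity control, the following sketch uses Python's standard sqlite3 module with an in-memory database. The table name, the rows and the helper insert_program are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE program (name TEXT PRIMARY KEY, type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO program VALUES (?, ?)",
    [
        ("Alpha", "structure editor"),
        ("Beta", "parser"),
        ("Gamma", "structure editor"),
    ],
)
conn.commit()

# Selection and projection expressed in SQL.
editors = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM program WHERE type = ? ORDER BY name",
        ("structure editor",),
    )
]


def insert_program(name, type_):
    """Insert a row; return False if an integrity constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO program VALUES (?, ?)", (name, type_))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the primary key plays the role of integrity control: a second row named "Alpha" is refused by the database itself, not by the application.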
- text database
A database management program for text. Data models and query languages
vary. The text can be structured to some extent: documents can
be subdivided into parts.
- structured text database
This group contains very different kinds of programs whose implementation
can be based on a relational database, a text database or an object-oriented
database. The common feature of these programs is that it is possible to
add text to the database according to any kind of structure definition
and to query the structured text. Many features that
are typical of relational database management systems are missing from these
programs. Basic requirements for structured text
databases are:
1) it must be possible to add any structure definition and documents
that are valid according to it,
2) queries that are typical of text search programs must be possible,
3) different versions of elements and documents must be possible,
4) management of simultaneous updates, and
5) Application Programmer's Interface (API) tools.
Viewing documents is not the responsibility of
these programs. Viewing and editing are usually done with structure editors
that are integrated with the program using the API tools.
- document manager
Tools to facilitate management of whole documents. The queries will
return only whole documents. Search for parts of documents is not possible.
In addition to the documents, some classification information
(author, keywords, creation date, etc.) is stored. The documents
can be retrieved using this information.
- conversion program
Programs or language interfaces
that are used to build converters from one text format
to another. Most of the languages are
data or event driven: actions or character strings in the input
select a conversion rule and a process.
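
The event-driven style can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard xml.sax module with XML input as a stand-in for SGML, and the rule table RULES, the class RuleDrivenConverter and the LaTeX-like output codes are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical rule table: each (event, element name) pair selects the
# string to emit. Events without a rule emit nothing.
RULES = {
    ("start", "title"): "\\section{",
    ("end", "title"): "}\n",
    ("end", "para"): "\n\n",
    ("start", "em"): "\\emph{",
    ("end", "em"): "}",
}


class RuleDrivenConverter(ContentHandler):
    """Collect output by applying one rule per parsing event."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.out = []

    def startElement(self, name, attrs):
        self.out.append(self.rules.get(("start", name), ""))

    def endElement(self, name):
        self.out.append(self.rules.get(("end", name), ""))

    def characters(self, content):
        # Character data is copied to the output unchanged.
        self.out.append(content)


def convert(source, rules=RULES):
    """Convert a tagged document (given as a string) using the rule table."""
    handler = RuleDrivenConverter(rules)
    xml.sax.parseString(source.encode("utf-8"), handler)
    return "".join(handler.out)
```

For example, convert("<doc><title>Intro</title><para>Some <em>structured</em> text.</para></doc>") yields a heading line followed by a paragraph with an emphasised word. Changing the target format only requires a different rule table, which is the main attraction of this design.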
- parser
A program that parses the text according to a grammar. Because
SGML does not prescribe a single grammar but can be used to describe
texts that follow different grammars, SGML parsers are
actually meta parsers. They input a grammar and generate a parser according
to this grammar.
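
The idea of a meta parser, a program that takes a grammar as input and produces a checker for that grammar, can be sketched in a much simplified form. The sketch below is not an SGML parser: it assumes XML input read with Python's standard xml.etree.ElementTree, and it represents each content model as a regular expression over child element names. The names make_validator and REPORT_GRAMMAR are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def make_validator(content_models):
    """Build a validator from a grammar that is supplied as data.

    content_models maps an element name to a regular expression over the
    names of its children, each name followed by one space. This is a
    simplified stand-in for the content models of a DTD.
    """
    compiled = {
        name: re.compile(pattern) for name, pattern in content_models.items()
    }

    def validate(node):
        model = compiled.get(node.tag)
        if model is None:
            return False  # element is not declared in the grammar
        children = "".join(child.tag + " " for child in node)
        if not model.fullmatch(children):
            return False  # children do not match the content model
        return all(validate(child) for child in node)

    return validate


# Hypothetical grammar: a report is one title followed by one or more
# sections; a section is any number of paragraphs.
REPORT_GRAMMAR = {
    "report": r"title (section )+",
    "title": r"",
    "section": r"(para )*",
    "para": r"",
}

validate = make_validator(REPORT_GRAMMAR)
```

Passing a different dictionary to make_validator produces a checker for a different document type, which mirrors the way an SGML parser reads the DTD before it reads the document.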
- Application Programmer's Interface (API tool)
Mostly a set of functions or procedure libraries that
allow a program to interface with other programs.
- DTD tools (structure design tools)
Programs to process (create, edit, show) document type definitions
(DTDs) for structured documents.
The descriptions of the programs contain the following features in addition
to a short description. Only relevant
and available features are mentioned for each program.
- Contact information
The developer, seller or importer (into Finland) of the program.
- References
Articles, manuals, brochures or persons that are used as an information source.
- Price
Prices for different operating systems (with the date of the price in parentheses).
- Operating System
Computer system the program operates on.
- Type
The nature of the program.
- Other programs
Other products in the same product family; those products are usually
also described in this list.
- SGML support
The ability to process DTDs and SGML documents. "DTD in/out" means that the
program can read, write and process SGML DTDs; "SGML in/out" means that
the program can read, write and process SGML-tagged text.
- Search/replace
The ability of the program to search text, replace text, search elements or
replace elements.
- Views
The ability of the user to select which elements and in which
format those elements
will be shown on screen and saved for the
future. If the system does not have this ability a predefined
view has to be used.
- Printing
How the program produces the printed output.
- Description
A short description of other features of the program.
- Note
Some special information about the program.