Section 2.4: Document Management Systems

We have separated document management systems into three categories to highlight the significant differences between each type of product. Many suppliers would argue that their product falls into more than one of these arbitrary divisions, and we have tried to note where that occurs. Even so, readers are asked to study the products in each section to determine which would best suit their needs. This category (Dynamic storage systems) is intended to cover those systems that primarily provide SGML-aware document management for large numbers of documents. The next category (Stand-alone static storage systems) is intended to cover those products that provide fast search and retrieval of large (very large!) databases of structured, and hence SGML-aware, information. The third category (Combined static storage and viewers) is intended to cover those products that support book-reading and browsing of collections of documents, where the ability to browse and skip around the documents being read is of primary importance. No products in this category have been assessed: the task of setting up and `stocking' the systems described below with documents would have required effort and time beyond what was available for the preparation of this report.

Dynamic storage systems (editorial systems)

This category of products provides, primarily, document management facilities. Some of the products are specifically written to manage SGML-structured information; others are existing information management products that have had SGML-awareness added. They are included in this group if they include their own document editing and versioning facilities or can be closely coupled with other editors.


Product:
SGML/Store
Associated Products:
Balise V2.2
Developer:
A.I.S. S.A. (France)
UK Supplier(s):
SGML Systems Engineering
Price:
n/a
Platforms:
SUN Sparc
Description:
SGML/Store is a system designed to store and manipulate SGML documents and document collections in a database format. The product is packaged as a toolkit, with C and C++ APIs and a collection of utilities. It particularly meets the needs of integrators willing to develop SGML-based structured document processing systems with severe volume and performance constraints, thus requiring the use of database technology.

SGML/Store accepts arbitrary DTDs, without need for any specific schema definition or parametrisation step. The same SGML/Store database can accept heterogeneous document collections, which are instances of multiple DTDs. Based on information extracted from the DTD, the loading module performs the decomposition of valid SGML instances into database objects.
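
The decomposition step described above can be illustrated with a minimal sketch. It is not SGML/Store's API: it assumes well-formed XML input parsed with Python's standard xml.etree.ElementTree as a stand-in for a validating SGML parser, and the ElementObject record, its fields, and the decompose function are all hypothetical names chosen for illustration. The root element name stands in for the DTD, so one store can hold instances of several document types.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ElementObject:
    """One stored object per element (hypothetical layout)."""
    obj_id: int
    parent_id: Optional[int]   # None for the document root
    doctype: str               # root element name, standing in for the DTD
    name: str
    attributes: dict = field(default_factory=dict)
    text: str = ""


def decompose(source: str, store: list) -> None:
    """Append one ElementObject per element of `source` to `store`."""
    root = ET.fromstring(source)

    def walk(elem, parent_id):
        obj_id = len(store)
        store.append(ElementObject(obj_id, parent_id, root.tag, elem.tag,
                                   dict(elem.attrib), (elem.text or "").strip()))
        for child in elem:
            walk(child, obj_id)

    walk(root, None)


# Two instances of different document types share one store.
store = []
decompose('<memo id="m1"><to>Staff</to><body>Hello</body></memo>', store)
decompose('<report><title>Q3</title></report>', store)
```

Keeping a parent identifier on every object preserves the hierarchy, so a document or any sub-tree can be reassembled later.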

The SGML/Store technology can be used for building SGML object servers, DSSSL-based instance transformation engines, and/or virtual shared memory systems for SGML-based workgroup editors.


Product:
ActiveServer for Unix
Associated Products:
ActiveSearch for Windows
Developer:
ActiveSystems Inc. (Ontario, Canada)
UK Supplier(s):
-
Price:
$25,000 (including 10 Windows 3.1 clients)
Platforms:
ActiveServer: Most popular UNIX operating systems
ActiveSearch: Windows 3.1 + winsock.dll v1.1
Description:
ActiveServer is an object database designed specifically to manage SGML documents. ActiveServer can directly support a wide range of types of field, and it can also support SGML elements and attributes as objects organised in a hierarchical fashion. This approach is intended to minimise index overhead and document access times. Portions or complete documents can be secured on a user or group basis. Functions are available to provide update and delete privileges. Documents can also be placed `on reserve', preventing other users from changing them. ActiveServer has its own Object Query Language. OQL provides complete read/write control over documents, contexts, elements, and attributes in real time. Commands are available to retrieve, update, insert and control access to these components. ActiveServer supports real-time indexing, and permits valueword, stopword, keyword, value and bit indexing. Stop lists can be modified. ActiveServer supports the storage of both non-SGML and SGML data including text, images, sound and video.

ActiveSearch offers both unstructured and structured retrieval support and provides a wide range of search options. It recognises that documents are not just strings of characters, but include structural components. This approach reduces the amount of data required to specify a search and accelerates the search process because the search can be limited to specific areas of a document. Access to both structured and unstructured information is simplified by using a point-and-click approach to navigate through the document.
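
The benefit of limiting a search to specific areas of a document can be shown with a small sketch. It does not use ActiveSearch or its query language; it assumes XML input handled by Python's standard xml.etree.ElementTree, and the sample document and the search function are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<manual>
  <title>Pump maintenance</title>
  <section><title>Safety</title><para>Isolate the pump before work.</para></section>
  <section><title>Pump seals</title><para>Replace seals yearly.</para></section>
</manual>"""


def search(root, word, within=None):
    """Return (element name, text) pairs whose own text contains `word`.

    If `within` is given, only elements with that name are examined.
    """
    word = word.lower()
    hits = []
    for elem in root.iter(within):   # iter(None) visits every element
        text = (elem.text or "").strip()
        if word in text.lower():
            hits.append((elem.tag, text))
    return hits


root = ET.fromstring(DOC)
everywhere = search(root, "pump")                    # unstructured search
titles_only = search(root, "pump", within="title")   # structured search
```

The structured query examines only title elements, so it both inspects less data and excludes the incidental mention of the word in a paragraph.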

Applications that allow viewing, editing, publishing, and conversion of SGML documents stored in ActiveServer can be launched from ActiveSearch. ActiveSearch supports inline display of TIFF, PCX, and BMP graphics. Other formats, including tables and equations, are supported by launching third-party viewers. A separate language provides control over document layout and format. Hyperlinks, inline and external graphic files, and query screens can also be defined. ActiveSearch utilises `marked sections' to enable creation and searching of multiple versions from a single master document.

ActiveServer and ActiveSearch are new products (announced April 1994).


Product:
DynaBase
Associated Products:
DynaText, DynaTag
Developer:
Electronic Book Technologies, Inc. (USA)
UK Supplier(s):
Price:
Platforms:
Unix, Windows NT (server/client), and Windows (client)
Description:
DynaBase is a native SGML repository designed to support online publishing, document management, revision control, and collaborative authoring of native SGML documents and related non-SGML information such as proprietary word processing files, graphics, and multi-media objects used in the production of DynaText electronic books.

DynaBase stores SGML documents (any DTD) in fully-indexed form and tracks changes across different revisions of a document down to the element level. DynaBase can also incrementally assemble DynaText electronic books that highlight changes between versions to facilitate the peer review/publishing cycle.
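
Element-level change tracking can be sketched as follows. This is a deliberately naive illustration, not DynaBase's method: it assumes XML input parsed with Python's standard xml.etree.ElementTree and identifies each element by a positional path, so an insertion early in a document would shift the paths of later siblings. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_texts(source):
    """Map a positional path such as 'book/chapter[1]/para[2]' to its text."""
    out = {}

    def walk(elem, path):
        out[path] = (elem.text or "").strip()
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    root = ET.fromstring(source)
    walk(root, root.tag)
    return out


def diff_revisions(old, new):
    """Return element paths added, removed, or changed between revisions."""
    a, b = element_texts(old), element_texts(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


rev1 = "<book><chapter><para>Draft text.</para></chapter></book>"
rev2 = ("<book><chapter><para>Final text.</para>"
        "<para>New note.</para></chapter></book>")
changes = diff_revisions(rev1, rev2)
```

A viewer could use the resulting paths to highlight changed or added elements when presenting a new revision for review.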

The DynaBase API enables integration with a wide variety of authoring, conversion, and workflow management tools. It is implemented on top of ObjectStore (an OODBMS from Object Design, Inc.).

DynaBase is a new product (announced April 1994). No deliveries had been made by Sept 1994.


Product:
BASISplus SGML Server
Associated Products:
Developer:
Information Dimensions, Inc. (USA)
UK Supplier(s):
Information Dimensions
Price:
Depends on size of server and number of clients
Platforms:
DEC (VMS), Most Unix
Description:
IDI's SGML server is an open storage manager for storing, retrieving, and updating SGML document components as separate objects. It is designed to integrate with other SGML tools (such as SGML editors, viewers and output devices) to form a complete SGML publishing solution. Adding the SGML server dramatically reduces data redundancy and increases productivity during the document creation and revision process.

The SGML server allows users to search, extract, and update SGML document components as separate objects. The ability to narrow searches to specific SGML objects provides much faster access to key pieces of information and the ability to easily reuse information that already exists. It also provides a highly efficient means of distributing document content and keeping it up to date.

SGML server implements the HyperDoc database model, a new record type that recognises the hierarchical relationships between document components. The ability to capture distinctions and associations between SGML objects allows users to navigate and search documents very quickly and to retrieve individual document components with a high degree of precision. The HyperDoc class hierarchy includes document, component (both text and non-text), attributes, virtual tables of content (vtoc), vtoc entries, and link objects.

Combined static storage and viewers

The primary aim of this category of products is to provide viewing and browsing of collections of documents using embedded links between and within the documents. (Also see section 2.5: Stand-alone browsers and viewers.)


Product:
SoftQuad Explorer
Associated Products:
Author/Editor
Developer:
SoftQuad Inc. (Canada)
UK Supplier(s):
Price:
Explorer: $9,990, Run-time browser: ("aggressively priced")
Platforms:
MS-Windows Q2 1994, Mac and Sun SPARC later
Description:
SoftQuad Explorer is the commercial version of SGML Darc developed initially at the Swedish Royal Institute of Technology. Darc (or Document Archive Controller) is a database system designed to handle vast quantities of documents that are marked up in SGML. The system has support for full-text presentation and navigation of such documents. The system can also be a repository for documents in other formats but without the benefits of the SGML support.

It is most useful to view the capabilities of the system from the point of view of the users, who belong to one of five categories or 'levels':

  1. guest - can perform traditional index-based bibliographical searches and view documents on-line.
  2. reader - may create views of documents, which allow documents to be ordered and accessed in hierarchical structures. Views are somewhat similar to symbolic links under Unix, but are more flexible. The view mechanism is used to export a subset of the document database for document interchange.
  3. author - may add documents to the database. This uses an adaptable filing mechanism based on the SGML DTD which ensures that the document's bibliographic data can be extracted; in short, the document files itself.
  4. editor - is fundamentally the database manager. The editor creates and maintains groups and assigns privileges to new users. Multiple `editors' can exist.
  5. system administrator - is allowed unrestricted access to the system for maintenance purposes.
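
The five levels above can be modelled as an ordered set of privileges. The sketch below is illustrative only: it assumes that each level also holds the rights of the levels beneath it, which the product description does not state, and the action names and the may function are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    GUEST = 1
    READER = 2
    AUTHOR = 3
    EDITOR = 4
    SYSTEM_ADMINISTRATOR = 5


# Lowest level allowed to perform each action (action names are invented).
REQUIRED_LEVEL = {
    "search": Level.GUEST,
    "view_document": Level.GUEST,
    "create_view": Level.READER,
    "add_document": Level.AUTHOR,
    "manage_groups": Level.EDITOR,
    "assign_privileges": Level.EDITOR,
    "maintain_system": Level.SYSTEM_ADMINISTRATOR,
}


def may(level, action):
    """True if `level` may perform `action`, assuming cumulative rights."""
    return level >= REQUIRED_LEVEL[action]


reader_can_add = may(Level.READER, "add_document")
author_can_add = may(Level.AUTHOR, "add_document")
```

If the levels turn out not to be cumulative, the single threshold per action would be replaced by an explicit set of permitted levels.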

Facilities for viewing and browsing take full advantage of the element structure, allowing hypertext linking, direct display of particular elements, and hidden elements that can be viewed by clicking on an icon. Various graphic formats are supported for inline display, or for display in a separate window.


Product:
Olias (On-Line Information Access System)
Associated Products:
Developer:
Hal Computer Systems
UK Supplier(s):
Price:
Developer toolkit: $7,500, browser: around $100.
Platforms:
Sun, HP, and IBM Unix
Description:
Like other SGML viewers, Olias formats electronic books on the fly to the size of a window open on the screen. It combines this dynamic viewing with full-text searching, browsing and an interface to the Internet.

Graphics can be displayed inline or in separate windows. A GIF graphics viewer is included, but by invoking other graphics viewers, Olias can display a wide variety of graphics files. Animation or video viewers can be plugged in as well. One weakness at the moment is the style sheet setup, which is done with an ASCII editor without the benefit of a graphical user interface.

Olias mixes full-text indexing with hierarchical browsing. The full-text indexing is based on technology licensed from Fulcrum. In addition to locating documents through full-text searches, users can navigate libraries in the book-list window. Libraries, indicated by icons, contain bookshelves, which contain documents. Once inside a document, the user may add personal bookmarks and annotations.

The Olias browser was originally conceived as a tool for delivering and retrieving SGML-encoded documents. However, the package can also be used to access the Internet and to view and retrieve WWW documents. Users can create links from their own repository to the Web, and linking the other way is possible if the information provider controls the Web document.


Product:
DocMan
Associated Products:
SINDA, Scheduler
Developer:
STEP (Stürtz Electronic Publishing GmbH) (Germany)
UK Supplier(s):
Price:
n/a
Platforms:
Unix systems
Description:
DocMan is a document management system intended for long-lived documents with short and frequent revision cycles, with complex retrieval requirements, and high run-time security. There is a strict separation between document management (with DocMan) and document editing (another application).

Documents are presented to DocMan in a variety of formats, are checked, if necessary converted, and the SGML documents are parsed, then partitioned into the smallest freely configurable components. Finally, these components are added to the database where they can be archived, retrieved, and accessed. User profiles control the dissemination of data.

SINDA (Structured Input Into Database) provides user-defined analysis and division of SGML documents, storage of documents in a relational database, and access for document retrieval. SINDA analyses and decomposes SGML documents according to user-defined criteria and files the units in a relational database. The units can be found in later searches through key information that, on the one hand, identifies the units themselves and, on the other hand, allows access to other parts of the text.
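
Filing document units in a relational database and retrieving them by key can be sketched as below. This is not SINDA's schema or interface: it assumes XML input parsed with Python's standard xml.etree.ElementTree, uses the standard sqlite3 module as the relational store, and takes a single element name as the user-defined division criterion. The table layout and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_units(conn, doc_id, source, unit_tag):
    """Store each `unit_tag` element of the document as one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS units ("
        "doc_id TEXT, seq INTEGER, unit_key TEXT, content TEXT, "
        "PRIMARY KEY (doc_id, seq))"
    )
    root = ET.fromstring(source)
    for seq, unit in enumerate(root.iter(unit_tag), start=1):
        conn.execute(
            "INSERT INTO units VALUES (?, ?, ?, ?)",
            (doc_id, seq, unit.get("id"), ET.tostring(unit, encoding="unicode")),
        )
    conn.commit()


def find_unit(conn, unit_key):
    """Return (doc_id, seq, content) for a unit, or None if absent."""
    return conn.execute(
        "SELECT doc_id, seq, content FROM units WHERE unit_key = ?", (unit_key,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_units(
    conn,
    "handbook",
    '<doc><sect id="s1"><p>Intro</p></sect><sect id="s2"><p>Usage</p></sect></doc>',
    unit_tag="sect",
)
row = find_unit(conn, "s2")
```

The key identifies the unit itself, while the document identifier and sequence number returned with it give access to the neighbouring units of the same text.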