Entity Management

Abstract of SGML Open Technical Resolution 9401:1994

Paul Grosso
Chief Technical Officer
SGML Open

1994 Aug 9 (Abstract 1994 Sep 2)

Copyright 1994 SGML Open

Permission to reproduce parts or all of this information in any form is granted to SGML Open members provided that this information by itself is not sold for profit and that SGML Open is credited as the author of this information.


Summary

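Two different but related issues pertaining to entity management impede interoperability of SGML documents:

  A. using multiple vendors' applications on a given system, where each application must locate the file that corresponds to an external entity; and
  B. interchanging a set of files among different systems.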

While many important issues are involved and a complete solution is a long-term goal, the SGML Open membership agrees upon the enclosed simple set of conventions to address a useful subset of the complete problem. To address issue A, this resolution defines an entity catalog that maps an entity's external identifier and/or name to a file name. To address issue B, this resolution defines a simple interchange packaging scheme using an interchange catalog to associate a public identifier with each interchanged file.

Committee draft 1: 1994 January 21
Committee draft 2: 1994 February 18
Committee draft 3: 1994 March 8
Committee draft 4: 1994 April 5
Working committee draft 5: 1994 April 25
First Final Draft Technical Resolution: 1994 May 2
Final Technical Resolution: 1994 August 9
Abstract version: 1994 September 2


Introduction

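To use a variety of SGML tools in a variety of computer environments, there are two different but related problems to solve:

  A. how multiple vendors' applications on a given system find the files that correspond to a document's external entities; and
  B. how a set of files is interchanged among different systems.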

The short-term solution for issue A defines an entity catalog that handles the simple cases of mapping an external entity's public identifier and/or entity name to a system-dependent file name. The resulting catalog is application-independent, though probably system-dependent. Though it does not handle all the issues that a complete entity manager addresses, it simplifies the use of multiple products in the great majority of cases.

While various interchange strategies are already defined, including the SGML Document Interchange Format (SDIF) defined in ISO 9069, none is currently widely used or supported by enough readily accessible implementations. This resolution addresses issue B by defining a simple interchange packaging scheme using an interchange catalog to associate a public identifier with each interchanged file.

Issue A: a simple entity catalog format

To address the issue of multiple vendors' applications on a given system, this resolution defines a format for an entity catalog that maps external identifiers and/or entity names to file names. The catalog is application-independent, though probably system-dependent, and it is used by an application's entity manager. The catalog has a standard format. Each application that uses it must provide the user with a mechanism for specifying how and when the catalog is to be accessed.

Each entry in the catalog associates a “storage object identifier” (such as a file name) with information about the external entity that appears in the SGML document. In addition to entries keyed by public identifier, a catalog entry can associate an entity name with a storage object identifier. For example, the following are possible catalog entries:

  PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml"
  PUBLIC "-//ACME DTD Writers//DTD General Report//EN" "report.dtd"
  ENTITY "graph1" "graphics\graph1.cgm"
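
As an illustration only, the entries above could be read into two lookup tables with a sketch like the following. It assumes the simplified one-entry-per-line, double-quoted form shown in the example; the full resolution's syntax may allow more than this. The names `parse_catalog`, `ENTRY_RE`, and `SAMPLE` are hypothetical, as is the choice that the first entry for a given key wins.

```python
import re

# One entry per line in the simplified form shown in the example above:
#   KEYWORD "public identifier or entity name" "storage object identifier"
# (Assumption: the real catalog syntax may be richer than this pattern.)
ENTRY_RE = re.compile(r'\s*(PUBLIC|ENTITY)\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Return (public_map, entity_map) built from the catalog text."""
    public_map, entity_map = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized catalog entry: {line!r}")
        keyword, key, storage = match.groups()
        target = public_map if keyword == "PUBLIC" else entity_map
        # Assumption: if a key appears twice, keep the first entry.
        target.setdefault(key, storage)
    return public_map, entity_map


# Raw string so the backslash in the third entry is kept literally.
SAMPLE = r'''
  PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml"
  PUBLIC "-//ACME DTD Writers//DTD General Report//EN" "report.dtd"
  ENTITY "graph1" "graphics\graph1.cgm"
'''

public_map, entity_map = parse_catalog(SAMPLE)
```

With the sample text, `public_map` holds the two public identifiers and `entity_map` holds the single entity name `graph1`.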

This resolution requires applications to handle only those storage object identifiers that specify file names. To avoid confusion, it is recommended (but not required) that file names be restricted to letters, digits, hyphen, period, and underscore.
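
The recommended character set can be checked mechanically. The sketch below is one possible reading of the recommendation: it assumes "letters" means the ASCII letters A-Z and a-z, and the name `is_portable_file_name` is hypothetical.

```python
import re

# Letters, digits, hyphen, period, and underscore only.
# Assumption: "letters" is taken to mean ASCII letters.
PORTABLE_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_file_name(name):
    """True if every character of name is in the recommended set."""
    return PORTABLE_NAME_RE.fullmatch(name) is not None
```

Under this reading a name such as `iso-lat1.gml` passes, while a name containing a directory separator, such as `graphics\graph1.cgm`, does not. Because the restriction is only a recommendation, a failed check is a portability warning rather than an error.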

This resolution addresses one additional detail about public identifiers. ISO 8879 is inconsistent about the use of hyphens and colons in ISO owner identifiers. This has led to the propagation of both the hyphen and the colon in ISO owner identifiers. In the interests of interoperability, this SGML Open resolution requires that all products accept either form as a valid ISO owner identifier.
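
One way a product might accept both forms is to normalize the owner identifier before comparing public identifiers. The sketch below is an assumption about how that could be done, not a method given by the resolution: it treats the text before the first `//` as the owner identifier and, for owners beginning with `ISO`, rewrites a colon or hyphen that sits between two digits as a hyphen. The name `normalize_public_id` is hypothetical.

```python
import re


def normalize_public_id(public_id):
    """Map both ISO owner identifier spellings to the hyphen form."""
    # The owner identifier is the part before the first "//".
    owner, sep, rest = public_id.partition("//")
    if owner.startswith("ISO"):
        # "ISO 8879:1986" and "ISO 8879-1986" both become "ISO 8879-1986".
        owner = re.sub(r"(?<=\d)[:-](?=\d)", "-", owner)
    return owner + sep + rest


def same_public_id(a, b):
    """Compare two public identifiers, ignoring the hyphen/colon variation."""
    return normalize_public_id(a) == normalize_public_id(b)
```

Identifiers whose owner does not begin with `ISO` are returned unchanged, so the normalization does not affect registered or unregistered owner identifiers.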

Issue B: an interchange packaging scheme

The issue of interchanging a set of files among different systems can be partially addressed by an interchange packaging scheme that includes an interchange catalog associating external identifiers with the various files in the interchange package. This resolution requires that an external entity's declaration specify either a public identifier or the SYSTEM keyword with no system identifier (in which case the entity's name is looked up in the catalog against the entries introduced by the ENTITY keyword).
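
The lookup rule in the preceding paragraph can be sketched as follows. The function name `resolve_entity`, the two dictionaries, and the use of `None` to stand for "SYSTEM with no system identifier" are all assumptions made for illustration. The sketch also assumes that a declaration with a public identifier is matched only against PUBLIC entries, since this abstract does not describe any fallback.

```python
def resolve_entity(name, public_id, public_map, entity_map):
    """Return the storage object identifier for an external entity, or None.

    public_id is None when the declaration uses the SYSTEM keyword
    with no system identifier.
    """
    if public_id is not None:
        # Declared with a public identifier: match PUBLIC entries.
        return public_map.get(public_id)
    # Declared as SYSTEM with no system identifier: match ENTITY entries
    # by the entity's name.
    return entity_map.get(name)


# Hypothetical lookup tables, as if read from an interchange catalog.
public_map = {"-//ACME DTD Writers//DTD General Report//EN": "report.dtd"}
entity_map = {"graph1": "graph1.cgm"}

dtd_file = resolve_entity(
    "report", "-//ACME DTD Writers//DTD General Report//EN",
    public_map, entity_map)
graph_file = resolve_entity("graph1", None, public_map, entity_map)
```

A result of `None` means the catalog has no matching entry, which for an interchange package would indicate an incomplete catalog.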

An interchange package must have exactly one file named either CATALOG or catalog, which is the catalog pointing into the interchange package itself. This catalog file must have a mapping for all files in the interchange package. The first entry in the catalog must map to the file in the interchange package that is the document entity in which parsing begins, if any such entity exists in this interchange package. When the sending and receiving systems have compatible naming schemes, files in the destination location may be given the same names as they had on the sending system. If the receiving system is unknown or incompatible with the sending system, the sender may wish to construct an interchange package with names that are most likely to be valid on the widest variety of systems.
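
The packaging rules above lend themselves to a simple consistency check. The following is a minimal sketch under stated assumptions: the package is represented as a list of file names, the catalog as the ordered list of file names its entries map to, and the catalog file itself is assumed not to need an entry of its own. The name `check_package` and its parameters are hypothetical.

```python
CATALOG_NAMES = ("CATALOG", "catalog")


def check_package(file_names, catalog_targets, document_file=None):
    """Return a list of rule violations (empty if none are found).

    file_names      -- names of all files in the interchange package
    catalog_targets -- file names mapped by the catalog entries, in order
    document_file   -- file holding the document entity, if there is one
    """
    problems = []

    # Rule 1: exactly one file named CATALOG or catalog.
    catalogs = [f for f in file_names if f in CATALOG_NAMES]
    if len(catalogs) != 1:
        problems.append(f"expected exactly one catalog file, found {len(catalogs)}")

    # Rule 2: every file in the package has a mapping in the catalog.
    # Assumption: the catalog file itself is exempt.
    mapped = set(catalog_targets)
    for f in file_names:
        if f not in CATALOG_NAMES and f not in mapped:
            problems.append(f"no catalog entry maps to {f}")

    # Rule 3: the first entry maps to the document entity, if one exists.
    if document_file is not None:
        if not catalog_targets or catalog_targets[0] != document_file:
            problems.append("first catalog entry does not map to the document entity")

    return problems


ok = check_package(
    ["catalog", "report.sgm", "report.dtd"],
    ["report.sgm", "report.dtd"],
    document_file="report.sgm")
```

For the package shown, `ok` is an empty list. Dropping `report.dtd` from the catalog targets, or listing it first, would each produce one reported problem.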