SGML Declarations

Capacity Sets

Capacities are a rough measure of the memory required to store the result of parsing an SGML document type definition (except for the specification of ID and IDREF attributes, where the content of a document instance itself changes the value of the IDCAP and IDREFCAP capacities). The intent is to indicate the approximate magnitude of the system resources required to process a given document. While this is a worthwhile objective in theory, it is difficult to accomplish in practice because each SGML system implements SGML processing differently. If a document exceeds one or more capacities, systems usually just report the error and continue processing until they finish or until system resources are actually exhausted.

Capacities are also used to set a baseline for conformance of SGML systems, since they specify the minimum that every conforming SGML system must be able to parse. Any document that does not exceed the reference capacities (a list of capacity numbers found in the SGML standard) and conforms in all other respects must be processed by a conforming SGML system.

There are several categories of capacity, each one designed to measure the memory requirements of a particular class of SGML object. For example, ENTCAP is a measure of the memory required to represent the fact that an entity was declared. The standard defines the number of points counted for each occurrence of an SGML object. In ENTCAP's case, this value is equal to the value of the NAMELEN quantity, which is defined in the concrete syntax and is the maximum length of an entity name. This value is then multiplied by the number of occurrences of the object being counted to calculate the total for that category of capacity. In the case of ENTCAP, this is the number of entities declared. Continuing the ENTCAP example, assuming that 120 entities are defined in the DTD under consideration and that NAMELEN for the concrete syntax is 12, the value of ENTCAP for this document is 1440 (12 times 120). This total is then compared to the value indicated in the SGML declaration to ensure that the value in the declaration has not been exceeded.
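The ENTCAP arithmetic above can be sketched in a few lines of Python. This is only an illustration of the points-times-occurrences rule described in the text; the function and variable names are hypothetical, and the limit of 35000 is the reference capacity value cited later in this section.

```python
def capacity_points(points_per_object, object_count):
    """Total for one capacity category: points per object times object count."""
    return points_per_object * object_count


NAMELEN = 12             # quantity from the concrete syntax (value used in the text)
ENTITIES_DECLARED = 120  # number of entities declared in the DTD (from the text)
ENTCAP_LIMIT = 35000     # assumed limit: the reference capacity set value

entcap_used = capacity_points(NAMELEN, ENTITIES_DECLARED)

# Compare the total against the limit given in the SGML declaration.
print(entcap_used, entcap_used <= ENTCAP_LIMIT)  # prints: 1440 True
```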

The complete list of capacities and their definitions is found in Figure 5 in section 13.2 of the SGML standard.

Many people (and certainly many implementers) consider capacities a nuisance that adds very little to the standard. They can generate a number of error messages, which can be quite confusing. While these messages can usually be safely ignored (at least until system errors occur indicating that memory is exhausted), you can avoid questions from novice or cautious users if you update the SGML declaration for your application to indicate the appropriate maximums.

An example of changing the CAPACITY SET follows:

CAPACITY SGMLREF TOTALCAP 80000 ELEMCAP 65000 GRPCAP 65000 ATTCAP 65000

In the above example, "CAPACITY SGMLREF" introduces this section of the SGML declaration and reminds the writer that categories which do not have a specification will take the value indicated in the reference capacity set. The specifications that follow then modify only those capacities specified.
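To make the "unspecified categories take the reference value" rule concrete, here is a minimal Python sketch that reads a specification of the form shown above and looks up the effective value of any capacity. It is not how any particular SGML parser works: the function names are hypothetical, capacity names are not validated against the standard, the form that uses a formal public identifier is not handled, and the reference value of 35000 is taken from the discussion of the reference capacity set in this section.

```python
REFERENCE_VALUE = 35000  # value of every capacity in the reference capacity set


def parse_capacity_set(spec):
    """Parse 'CAPACITY SGMLREF NAME VALUE ...' into a dict of overrides."""
    tokens = spec.split()
    if tokens[:2] != ["CAPACITY", "SGMLREF"]:
        raise ValueError("expected the spec to start with 'CAPACITY SGMLREF'")
    pairs = tokens[2:]
    if len(pairs) % 2 != 0:
        raise ValueError("each capacity name needs a numeric value")
    # Even positions are capacity names, odd positions are their values.
    return {name: int(value) for name, value in zip(pairs[0::2], pairs[1::2])}


def effective_capacity(overrides, name):
    """Return the declared value, or the reference value if none was declared."""
    return overrides.get(name, REFERENCE_VALUE)


overrides = parse_capacity_set(
    "CAPACITY SGMLREF TOTALCAP 80000 ELEMCAP 65000 GRPCAP 65000 ATTCAP 65000"
)
print(effective_capacity(overrides, "ELEMCAP"))  # declared in the example
print(effective_capacity(overrides, "ENTCAP"))   # falls back to the reference value
```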

In some cases, rather than specifying a list in the declaration itself, the declaration includes a formal public identifier that references a file containing the capacity specifications.

Reference Capacity Set

There is a default for each of the capacities. Together these defaults form the reference capacity set, which I mentioned above. In the reference capacity set, each capacity is set to 35,000. When a document exceeds this value and the SGML declaration does not specify a higher value, validating systems will issue an error message.
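The check described here can be sketched as follows. This is an illustrative Python fragment, not the behavior of any specific validating system: the function name, the message wording, and the usage figures other than the ENTCAP total of 1440 are hypothetical.

```python
REFERENCE_VALUE = 35000  # each capacity in the reference capacity set


def exceeded_capacities(usage, declared=None):
    """Return one message per capacity whose usage exceeds its limit.

    usage:    capacity name -> points counted for the document
    declared: capacity name -> limit from the SGML declaration (optional)
    """
    declared = declared or {}
    messages = []
    for name, used in sorted(usage.items()):
        # Fall back to the reference value when the declaration is silent.
        limit = declared.get(name, REFERENCE_VALUE)
        if used > limit:
            messages.append(f"{name}: {used} points exceeds limit of {limit}")
    return messages


usage = {"ENTCAP": 1440, "ELEMCAP": 41000}  # ELEMCAP figure is hypothetical

print(exceeded_capacities(usage))                      # ELEMCAP is reported
print(exceeded_capacities(usage, {"ELEMCAP": 65000}))  # nothing is reported
```

Raising ELEMCAP in the declaration, as in the second call, is the same remedy suggested earlier for avoiding confusing error messages.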

As mentioned above, the reference capacity set also defines the size of document that every conforming SGML system must, at a minimum, be able to parse. Since capacities are a measure of the size and complexity of a document, this requirement ensures that conforming SGML systems are capable of processing non-trivial applications.


Wayne L. Wohler, Dept G82/025Z, Publishing Solutions Development, IBM Corporation, PO Box 1900, Boulder, Colorado 80301-9191
Internet: wohler@vnet.ibm.com
IBMMAIL: USIB29WX@IBMMAIL
Phone: 1-303-924-5943