The resolution of public entity names (i.e. how the entity manager finds them in the file system or elsewhere) is implementation dependent. These notes, originally intended solely for my own use, describe how the public-domain parser sgmls actually implements the mapping. Note that the approach used by sgmls does not provide a completely arbitrary mapping from entity name or formal public identifier to file identifier; this is not a problem for most users.
Let us start with the basic question: "How does sgmls locate an external entity when a PUBLIC identifier is used?"
To that, the high-level answer is:
The SGML_PATH value is a set of one or more system identifier patterns, separated by colons or semicolons (OS-dependent, check your documentation). Each pattern can contain literal characters as well as special keywords of the form %X (the sgmls documentation calls these substitution fields).
The low-level answer follows. N.B. in version 1.1 some of the handling of blanks, underscores, etc. has changed, and may not be as described: check the documentation for the version you are using.
Set an environment variable called SGML_PATH to a value. For the moment, let's assume you set it by saying something like this:
    set SGML_PATH %S:%N.%X:%N.%V:%N.%C

The default path (at least on Unix systems) is noted below.
The SGML_PATH environment variable governs the search SGMLS performs for public entities. Specifically, given declarations like

    <!ENTITY % ISOLat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//Local" >
    <!ENTITY % ISOLat2 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN//Local" >
    <!ENTITY % p2idmss PUBLIC "-//TEI//ENTITIES Marked sects for TEI P2 IDs//EN" 'p2idlist.entities' >

SGMLS will ask the file system for a series of files, in an order determined by the SGML_PATH value. The first one found by the file system is the one used by SGMLS. For ISOLat1:
%S (no search, no system id given)
%N.%V searches for ISOLat1.LOCAL (the device-specific form is called LOCAL)
For ISOLat2, similarly, substituting ISOLat2 for ISOLat1. For the other one:
%S searches for p2idlist.entities (system id)
%N.%X searches for p2idmss.ppe (no local version, only public)
%N.%V (no search, no local version specified)
%N.%C searches for p2idmss.ENTITIES (the public text class is
This is as far as I understand things at present; certainly sgmls has succeeded in picking up isolat1.local, isolat1.entities, isolat1.ppe, p2idmss.entities, and p2idlist.entities, under appropriate conditions.
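The search order described above can be sketched in Python. This is only an illustration of the behaviour as these notes describe it, not the sgmls code: the names `expand_pattern` and `candidates` are invented for the example, the value of each substitution field is supplied by hand rather than derived from the declaration, and no case folding or character substitution is attempted. A pattern that uses a field with no value is simply skipped, which corresponds to the "no search" entries in the lists above.

```python
def expand_pattern(pattern, fields):
    """Expand %X-style fields in one pattern.

    Returns the expanded string, or None if the pattern uses a field
    that has no value for this entity (the "no search" case).
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            value = fields.get(pattern[i + 1])
            if value is None:
                return None  # field missing: this pattern is not tried
            out.append(value)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def candidates(path, fields, sep=":"):
    """List the file names to try, in the order given by the path."""
    names = []
    for pattern in path.split(sep):
        name = expand_pattern(pattern, fields)
        if name is not None:
            names.append(name)
    return names


SGML_PATH = "%S:%N.%X:%N.%V:%N.%C"

# Field values for the p2idmss entity, taken from the list above.
# "V" is None because the public identifier has no version part.
P2IDMSS = {
    "S": "p2idlist.entities",
    "N": "p2idmss",
    "X": "ppe",
    "V": None,
    "C": "ENTITIES",
}

for name in candidates(SGML_PATH, P2IDMSS):
    print(name)
```

The first name in the resulting list that the file system can actually open would be the one used. The separator is a parameter because, as noted earlier, it may be a colon or a semicolon depending on the operating system.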
In a system with tree-structured directories, of course, more of the public identifier can be used to find stuff. The default path search for ISOLat1 would be, as I understand it:
/usr/local/lib/sgml/%O/%C/%T -- this gives one or the other of the following, I'm not sure about the case mapping:
%N.%X gives ISOLat1.vpe, then ISOLat1.ppe
%N.%D appears to give nothing (I don't think there is a data content notation given here)
For the record, the overall structure of formal public identifiers is:
    pubid   ::= owner '//' class ('-//')? desc '//' lang ('//' version)?
    owner   ::= 'ISO' data | '+//' data | '-//' data
    class   ::= CHARSET | ENTITIES | DTD | DOCUMENT | etc.
    desc    ::= data
    lang    ::= /* code from ISO 639 */ | 'ESC' n/n n/n
    version ::= data
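As a rough illustration of the grammar above, the following Python sketch splits a formal public identifier into its components. The function name `parse_fpi` and the dictionary keys are invented for the example, and the code assumes that the class and the description are separated by a single space, as in the declarations quoted earlier. It also assumes the usual reading of the owner prefixes ('+//' for registered owners, '-//' for unregistered ones). It does not check the class keyword or the language code.

```python
def parse_fpi(pubid):
    """Split a formal public identifier into its parts (a sketch only)."""
    parts = pubid.split("//")

    # Owner: 'ISO' data | '+//' data | '-//' data
    if parts[0] in ("+", "-"):
        kind = "registered" if parts[0] == "+" else "unregistered"
        parts = parts[1:]
    elif parts[0].startswith("ISO"):
        kind = "ISO"
    else:
        raise ValueError("unrecognised owner identifier: %r" % pubid)

    if len(parts) < 3:
        raise ValueError("too few '//'-separated parts: %r" % pubid)

    owner, text, rest = parts[0], parts[1], parts[2:]

    # Text part: class, optional '-//' (unavailable text), description
    text_class, _, desc = text.partition(" ")
    unavailable = desc == "-"
    if unavailable:
        if len(rest) < 2:
            raise ValueError("missing description or language: %r" % pubid)
        desc, rest = rest[0], rest[1:]

    if len(rest) > 2:
        raise ValueError("too many '//'-separated parts: %r" % pubid)

    return {
        "owner_kind": kind,
        "owner": owner,
        "class": text_class,
        "unavailable": unavailable,
        "desc": desc,
        "lang": rest[0],
        "version": rest[1] if len(rest) > 1 else None,
    }


iso = parse_fpi("ISO 8879-1986//ENTITIES Added Latin 1//EN//Local")
tei = parse_fpi("-//TEI//ENTITIES Marked sects for TEI P2 IDs//EN")
print(iso["owner"], iso["class"], iso["version"])
print(tei["owner"], tei["class"], tei["version"])
```

For the ISOLat1 identifier this yields the version Local, and for the TEI identifier it yields no version at all, which matches the %N.%V entries in the search lists earlier in these notes.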
The components of the public identifier are picked up by different `substitution fields':
%U can be used to ensure that a search pattern only succeeds for ISO owners, registered owners, or unregistered owners; they expand to the empty string in the appropriate case, and to null (failure) otherwise.
Various types of case folding and character substitution or character deletion are performed, which should be described in the documentation for the version of sgmls you are using (they are set at compile time, and differ in the Unix, VMS, and DOS versions, to suit the operating systems).
Still other substitution fields pick up other parts of the entity declaration:
the name tiff is the notation name.
the string bar.doc)