The resolution of public entity names (i.e. how the entity manager finds them in the file system or elsewhere) is implementation-dependent. These notes, originally intended solely for my own use, describe how the public-domain parser sgmls actually implements the mapping. Note that the approach used by sgmls does not provide a completely arbitrary mapping between entity name or formal public identifier and the file identifier; this is not a problem for most users.
Let us start with the basic question: "How does sgmls locate an external entity when a PUBLIC identifier is used?"
To that, the high-level answer is:
The SGML_PATH value is a set of one or more system identifier patterns, separated by colons or semicolons (OS-dependent; check your documentation). Each pattern can have literal characters as well as special keywords of the form %X (the sgmls documentation calls these substitution fields). E.g.
%S;N.%C;/sgml/pub/%N.%C
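To make the shape of such a value concrete, here is a minimal Python sketch that splits a path value into its patterns and lists the substitution fields each pattern uses. The function names are my own, not part of sgmls, and the separator is passed in explicitly because it is OS-dependent.

```python
import re

def split_sgml_path(value, separator=":"):
    """Split an SGML_PATH-style value into its individual patterns.

    The separator (colon or semicolon) is OS-dependent, so the caller
    supplies it. Empty patterns are dropped.
    """
    return [pattern for pattern in value.split(separator) if pattern]

def substitution_fields(pattern):
    """Return the letters of the %X-style fields in a pattern, in order."""
    return re.findall(r"%(.)", pattern)

# The example value from the text, which uses semicolons as separators.
patterns = split_sgml_path("%S;N.%C;/sgml/pub/%N.%C", separator=";")
for pattern in patterns:
    print(pattern, substitution_fields(pattern))
```

Everything in a pattern that is not a `%`-field is literal text, so a directory prefix such as `/sgml/pub/` passes through unchanged.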
The low-level answer follows. N.B. in version 1.1 some of the handling of blanks, underscores, etc. has changed, and may not be as described: check the documentation for the version you are using.
Set an environment variable called SGML_PATH to a value. For the moment, let's assume you set it by saying something like this:
set SGML_PATH %S:%N.%X:%N.%V:%N.%C
The default path (at least on Unix systems) is noted below.
The SGML_PATH environment variable governs the search SGMLS performs for public entities. Specifically, given declarations like these:
<!ENTITY % ISOLat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//Local" >
<!ENTITY % ISOLat2 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN//Local" >
<!ENTITY % p2idmss PUBLIC "-//TEI//ENTITIES Marked sects for TEI P2 IDs//EN" 'p2idlist.entities' >
SGMLS will ask the file system for a series of files, in an order determined by the SGML_PATH value. The first one found by the file system is the one used by SGMLS.
For ISOLat1:
%S (no search, no system id given)
%N.%X searches for
%N.%V searches for ISOLat1.LOCAL (the device-specific form is called LOCAL)
%N.%C searches for ISOLat1.ENTITIES (the public text class is ENTITIES)
For ISOLat2, similarly, substituting ISOLat2 for ISOLat1. For the other one:
%S searches for p2idlist.entities (system id)
%N.%X searches for p2idmss.ppe (no local version, only public)
%N.%V (no search, no local version specified)
%N.%C searches for p2idmss.ENTITIES (the public text class is ENTITIES)
This is as far as I understand things at present; certainly sgmls has succeeded in picking up isolat1.local, isolat1.entities, isolat1.ppe, p2idmss.entities, and p2idlist.entities, under appropriate conditions.
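The search just described can be sketched in Python as follows. This is a model of the behaviour as I understand it, not the sgmls source: the names `expand`, `candidates`, and `resolve` are hypothetical, a field whose value is unavailable is represented by `None`, and I assume a pattern containing such a field is skipped entirely. The field values are the ones given above for p2idmss.

```python
import os
import re

FIELD = re.compile(r"%(.)")

def expand(pattern, fields):
    """Expand one pattern; return None if any field it uses is unavailable."""
    failed = False

    def substitute(match):
        nonlocal failed
        letter = match.group(1)
        if letter == "%":          # assumed: %% stands for a literal percent sign
            return "%"
        value = fields.get(letter)
        if value is None:          # field unavailable: the whole pattern fails
            failed = True
            return ""
        return value

    result = FIELD.sub(substitute, pattern)
    return None if failed else result

def candidates(path, fields, separator=":"):
    """File names to try, in path order, skipping patterns that fail."""
    expanded = (expand(pattern, fields) for pattern in path.split(separator))
    return [name for name in expanded if name is not None]

def resolve(path, fields, exists=os.path.exists, separator=":"):
    """Return the first candidate that exists, or None if none does."""
    for name in candidates(path, fields, separator):
        if exists(name):
            return name
    return None

# Field values for the p2idmss declaration, as listed in the text.
p2idmss = {"S": "p2idlist.entities", "N": "p2idmss",
           "X": "ppe", "V": None, "C": "ENTITIES"}
names = candidates("%S:%N.%X:%N.%V:%N.%C", p2idmss)
print(names)
```

The `exists` parameter is only there so the lookup can be tried without touching the file system, e.g. `resolve(path, p2idmss, exists=lambda name: name == "p2idmss.ENTITIES")`.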
In a system with tree-structured directories, of course, more of the public identifier can be used to find stuff. The default path search for the ISOLat1 would be, as I understand it:
/usr/local/lib/sgml/%O/%C/%T:%N.%X:%N.%D
or, pattern by pattern:
/usr/local/lib/sgml/%O/%C/%T gives one or the other of the following (I'm not sure about the case mapping):
/usr/local/lib/sgml/iso_8879-1986/entities/added_latin_1
/usr/local/lib/sgml/ISO_8879-1986/entities/Added_Latin_1
%N.%X gives ISOLat1.vpe, then ISOLat1.ppe
%N.%D appears to give nothing (I don't think there is a data content notation given here)
For the record, the overall structure of formal public identifiers is:
pubid   ::= owner '//' class ('-//')? desc '//' lang ('//' version)?
owner   ::= 'ISO' data | '+//' data | '-//' data
class   ::= CHARSET | ENTITIES | DTD | DOCUMENT | etc.
desc    ::= data
lang    ::= /* code from ISO 639 */ | 'ESC' n/n n/n
version ::= data
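As an illustration of this structure, the following Python sketch splits a formal public identifier into the components named in the grammar. It is a simplification under stated assumptions: it takes the class to be the first blank-delimited word after the owner (as in the example declarations), it ignores the optional '-//' marker before the description, and it does no validation of the class or language. `PublicId` and `parse_public_id` are names I made up for the example.

```python
from typing import NamedTuple, Optional

class PublicId(NamedTuple):
    owner: str
    public_class: str
    description: str
    language: str
    version: Optional[str]

def parse_public_id(pubid: str) -> PublicId:
    """Split a formal public identifier on '//' into its components."""
    parts = pubid.split("//")
    if parts[0] in ("+", "-") and len(parts) > 1:
        # Registered ('+//') or unregistered ('-//') owner: the prefix
        # and the owner name were split apart, so rejoin them.
        owner, rest = parts[0] + "//" + parts[1], parts[2:]
    else:
        # ISO owner: no prefix.
        owner, rest = parts[0], parts[1:]
    if len(rest) < 2:
        raise ValueError("not a formal public identifier: %r" % pubid)
    # Assumed: class is the first word, the description is the remainder.
    public_class, _, description = rest[0].partition(" ")
    language = rest[1]
    version = rest[2] if len(rest) > 2 else None
    return PublicId(owner, public_class, description, language, version)

iso = parse_public_id("ISO 8879-1986//ENTITIES Added Latin 1//EN//Local")
tei = parse_public_id("-//TEI//ENTITIES Marked sects for TEI P2 IDs//EN")
print(iso)
print(tei)
```

For the TEI identifier the version comes back as `None`, which corresponds to the "no local version specified" case in the search example above.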
The components of the public identifier are picked up by different `substitution fields':
%P
%O
(%I, %R, and %U can be used to ensure that a search pattern only succeeds for ISO owners, registered owners, or unregistered owners; they expand to the empty string in the appropriate case, and to null (failure) otherwise)
%C
%T
%L
%E
%V
Various types of case folding and character substitution or character deletion are performed, which should be described in the documentation for the version of sgmls you are using (they are set at compile time, and differ in the Unix, VMS, and DOS versions, to suit the operating systems).
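To show what such folding and substitution might look like, here is a small Python sketch that reproduces the two candidate directory names given earlier for ISOLat1. The rules in it (blanks become underscores, the class is always lower-cased, the other components are lower-cased only when `fold_case` is set) are read off those two candidates and are my assumption, not the documented behaviour of any sgmls build; `munge` and `default_dir` are hypothetical names.

```python
def munge(text, fold_case):
    """Replace blanks with underscores and optionally fold to lower case."""
    text = text.replace(" ", "_")
    return text.lower() if fold_case else text

def default_dir(owner, public_class, description, fold_case):
    """Expand /usr/local/lib/sgml/%O/%C/%T under the assumed rules."""
    return "/".join([
        "/usr/local/lib/sgml",
        munge(owner, fold_case),
        public_class.lower(),      # lower case in both candidates above
        munge(description, fold_case),
    ])

folded = default_dir("ISO 8879-1986", "ENTITIES", "Added Latin 1", fold_case=True)
unfolded = default_dir("ISO 8879-1986", "ENTITIES", "Added Latin 1", fold_case=False)
print(folded)
print(unfolded)
```

Which of the two forms a given installation actually uses is exactly the compile-time question mentioned above, so check the documentation for your version.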
Still other substitution fields pick up other parts of the entity declaration:
%D
the name tiff is the notation name.
%N
%P
%S
the string bar.doc
%X
%Y
%A