XCatalog proposal draft 0.1


From      owner-xml-dev@ic.ac.uk Fri Jul 10 11:31:19 1998
Date:     Fri, 10 Jul 1998 12:20:11 -0400
From:     John Cowan <cowan@locke.ccil.org>
To:       XML Dev <xml-dev@ic.ac.uk>
Subject:  XCatalog proposal draft 0.1


This is a proposal for XCatalogs, a system based on SGML/Open
catalogs (Socats) for translating public identifiers to
system identifiers in XML.

1.  Introduction

XCatalogs are Web resources (anything from local files on up)
which contain mappings from public identifiers to system identifiers,
plus references to other XCatalogs.  They come in two syntaxes:
one which is a subset of Socat syntax, and one which is an
XML document instance.

2.  Example

Here's an example XCatalog in both syntaxes, for those who learn
best from examples:

-- catalog for "-//John Cowan" public IDs --
BASE "http://www.ccil.org/~cowan/"
PUBLIC	"-//John Cowan//ConScript Unicode Registry//EN"
	"csur/"
PUBLIC	"-//John Cowan//Essentialist Explanations//(EN,X-BRITHENIG)"
	"essential.html"
PUBLIC	"-//John Cowan//Lojban"
	"http://xiron.pc.helsinki.fi/lojban/"
DELEGATE "-//John Cowan//LOC Diacritics"
	"elsie/xcatalog.soc"
CATALOG	"http://www.w3.org/xcatalog/mastercat.soc"


<XCatalog>catalog for "-//John Cowan" public IDs
    <Base HRef="http://www.ccil.org/~cowan/"/>
    <Map PublicID="-//John Cowan//ConScript Unicode Registry//EN"
	HRef="csur/"/>
    <Map PublicID="-//John Cowan//Essentialist Explanations//(EN,X-BRITHENIG)"
	HRef="essential.html"/>
    <Map PublicID="-//John Cowan//Lojban"
	HRef="http://xiron.pc.helsinki.fi/lojban/"/>
    <Delegate PublicID="-//John Cowan//LOC Diacritics"
	HRef="elsie/xcatalog.xml"/>
    <Extend HRef="http://www.w3.org/xcatalog/mastercat.xml"/>
</XCatalog>


3.  Socat syntax

The BNF for the Socat syntax is:

	Document ::= Comment? WS? (Entry (WS Entry)*)? WS? Comment?

	Entry ::= Map | Delegate | Extend | Base | Other

	Map ::= "PUBLIC" WS PubidLiteral WS SystemLiteral

	Delegate ::= "DELEGATE" WS PubidLiteral WS SystemLiteral

	Extend ::= "CATALOG" WS SystemLiteral

	Base ::= "BASE" WS SystemLiteral

	Other ::= Name (WS Name)? (WS SystemLiteral)*

	WS ::= S (Comment S)*

	Comment ::= "--" ((Char - "-") | ("-" (Char - "-")))* "--"

where Char, Name, PubidLiteral, SystemLiteral, and S are as in XML 1.0.
A comment may therefore contain single hyphens, but not "--".
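
As an illustration only, here is a minimal Python sketch of a reader for
the Socat syntax.  The names tokenize_socat and parse_socat are
hypothetical, not part of this proposal.  The sketch is looser than the
BNF: it does not check the PubidLiteral character set, it does not insist
on whitespace around comments, and it assumes that the four keywords are
never used as the second Name of an Other entry.

```python
import re

# One token is whitespace, a comment, a quoted literal, or a bare name.
TOKEN = re.compile(r'''
    \s+                              # whitespace (skipped)
  | --(?:[^-]|-[^-])*--              # comment without "--" inside (skipped)
  | "(?P<dq>[^"]*)"                  # double-quoted literal
  | '(?P<sq>[^']*)'                  # single-quoted literal
  | (?P<name>[A-Za-z_:][\w.:-]*)     # bare name or keyword (ASCII approximation)
''', re.VERBOSE)

# Number of quoted literals that follow each keyword of this proposal.
ARITY = {"PUBLIC": 2, "DELEGATE": 2, "CATALOG": 1, "BASE": 1}


def tokenize_socat(text):
    """Yield ("name", s) and ("literal", s) pairs; drop whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = m.end()
        if m.group("name") is not None:
            yield ("name", m.group("name"))
        elif m.group("dq") is not None:
            yield ("literal", m.group("dq"))
        elif m.group("sq") is not None:
            yield ("literal", m.group("sq"))


def parse_socat(text):
    """Return [(keyword, (literal, ...)), ...] for the four known entry types.

    Tokens belonging to Other entries are skipped.
    """
    tokens = list(tokenize_socat(text))
    entries = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        i += 1
        if kind != "name" or value not in ARITY:
            continue                  # part of an Other entry: ignored
        n = ARITY[value]
        args = tokens[i:i + n]
        if len(args) != n or any(k != "literal" for k, _ in args):
            raise ValueError(f"{value} must be followed by {n} quoted literal(s)")
        entries.append((value, tuple(v for _, v in args)))
        i += n
    return entries
```

Applied to the Socat example in section 2, parse_socat would return one
tuple per BASE, PUBLIC, DELEGATE, and CATALOG entry, in document order.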


4.  DTD

The DTD for the XML instance syntax is (where an XCatalog element
is the root):

	<!ELEMENT XCatalog ANY>
	<!ATTLIST XCatalog
		Version CDATA #FIXED "1.0">

	<!ELEMENT Map EMPTY>
	<!ATTLIST Map
		PublicID CDATA #REQUIRED
		HRef CDATA #REQUIRED>

	<!ELEMENT Delegate EMPTY>
	<!ATTLIST Delegate
		PublicID CDATA #REQUIRED
		HRef CDATA #REQUIRED>

	<!ELEMENT Extend EMPTY>
	<!ATTLIST Extend
		HRef CDATA #REQUIRED>

	<!ELEMENT Base EMPTY>
	<!ATTLIST Base
		HRef CDATA #REQUIRED>

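In the XML instance syntax, any #PCDATA content is considered comment,
and any other elements that may be present are beyond the scope of
this specification. For uniformity below, the Map, Delegate, Extend,
and Base elements are referred to as "entries".  An entry's
"public-identifier attribute" is its PublicID attribute (the PubidLiteral
in the Socat syntax), and its "system-identifier attribute" is its HRef
attribute (the SystemLiteral in the Socat syntax).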
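
For illustration, a minimal Python sketch that reads the XML instance
syntax with the standard xml.etree.ElementTree module.  The function name
parse_xcatalog_xml is hypothetical.  The sketch assumes that entries are
direct children of the XCatalog root, does not validate against the DTD,
and simply skips #PCDATA and any other elements.

```python
import xml.etree.ElementTree as ET

ENTRY_TAGS = {"Map", "Delegate", "Extend", "Base"}


def parse_xcatalog_xml(text):
    """Return [(tag, (attribute values...)), ...] in document order.

    Map and Delegate yield (PublicID, HRef); Extend and Base yield (HRef,).
    """
    root = ET.fromstring(text)
    if root.tag != "XCatalog":
        raise ValueError(f"root element is {root.tag!r}, expected 'XCatalog'")
    entries = []
    for child in root:
        if child.tag not in ENTRY_TAGS:
            continue                  # other elements are out of scope
        href = child.get("HRef")      # attribute names are case-sensitive
        if href is None:
            raise ValueError(f"<{child.tag}> lacks the required HRef attribute")
        if child.tag in ("Map", "Delegate"):
            pubid = child.get("PublicID")
            if pubid is None:
                raise ValueError(f"<{child.tag}> lacks the required PublicID attribute")
            entries.append((child.tag, (pubid, href)))
        else:
            entries.append((child.tag, (href,)))
    return entries
```

Because XML names are case-sensitive, this sketch rejects an entry whose
attribute is spelled "Href" rather than "HRef".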


5.  Entry semantics

Map entries (which use the keyword "PUBLIC" in the Socat syntax
for backward compatibility) mean that a public identifier which
exactly matches the public-identifier attribute should be translated
into the entry's system-identifier attribute.

Delegate entries are used to delegate groups of public identifiers
to other catalogs.  Any public identifier that begins with the entry's
public-identifier attribute (an exact prefix match) is to be looked up
in the XCatalog specified by the system-identifier attribute.

Extend entries (which use the keyword "CATALOG" in the Socat syntax
for backward compatibility) allow additional catalogs to be
specified as extensions to this catalog.  The
system-identifier attribute specifies an XCatalog.

Base entries are used in the same way as BASE elements in HTML:
to specify the base URL for any relative URLs in system-identifier
attributes.
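
The matching rules and the effect of a base URL can be sketched in Python
as follows.  The function names are hypothetical, and the sketch assumes
that public identifiers are compared as-is, with no whitespace or case
normalization, since this draft does not call for any.

```python
from urllib.parse import urljoin


def map_matches(entry_pubid, pubid):
    """A Map entry applies only when the public identifiers are identical."""
    return pubid == entry_pubid


def delegate_matches(entry_pubid, pubid):
    """A Delegate entry applies when its identifier is a prefix of pubid."""
    return pubid.startswith(entry_pubid)


def resolve_href(base, href):
    """Resolve a possibly relative system identifier against the base URL.

    With no base (None or ""), the identifier is returned unchanged.
    """
    return urljoin(base, href) if base else href


print(resolve_href("http://www.ccil.org/~cowan/", "csur/"))
# prints http://www.ccil.org/~cowan/csur/
```

An absolute system identifier, such as the one in the Lojban entry of the
example, passes through resolve_href unchanged.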


7.  Search algorithm

The process of searching catalogs in order to translate a public
identifier into a URI is as follows.  A queue of XCatalog
URIs is maintained, which is initialized with a system-dependent
list of URIs.  A current base URL is also maintained, initially
null.

A catalog URI is dequeued and the specified XCatalog is fetched.
The current base URL is set to the base of the catalog by
removing the least significant part of the URI (everything after
the final "/").
The XCatalog is then searched from beginning to end looking for a matching
Map or Delegate entry.

A matching Map entry (exact match) causes the
process to terminate, returning the system-identifier attribute
(modified if necessary by the current base URL).

A matching Delegate entry (prefix match) causes the current queue
to be cleared and the system-identifier attribute (modified if
necessary by the current base URL) entered as the
only entry; the rest of the current XCatalog is ignored.

As Extend entries (keyword "CATALOG" in the Socat syntax) are seen,
their system-identifier attributes are appended to the catalog URI
queue.  As Base entries are seen, the current base URL is reset to the
system-identifier attribute.  Comments and Others are ignored.

When an XCatalog has been completely scanned, the next XCatalog
URI is dequeued and fetched and the current base URL reset, and
the process repeated.  When no further XCatalog URIs remain in the
queue, the process fails: the public identifier cannot
be translated.
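
As a rough sketch of the algorithm above, in Python: the function
resolve_public_id and its fetch parameter are hypothetical.  fetch(uri)
is assumed to return the already-parsed entries of one XCatalog as
(kind, args) pairs, where kind is "Map", "Delegate", "Extend", or "Base".
Two further assumptions that the text does not settle: a relative Base or
Extend URL is resolved against the current base URL, and no guard against
catalogs that extend each other in a cycle is included.

```python
from collections import deque
from urllib.parse import urljoin


def resolve_public_id(pubid, initial_catalogs, fetch):
    """Return the URI for pubid, or None when no catalog translates it."""
    queue = deque(initial_catalogs)
    while queue:
        catalog_uri = queue.popleft()
        entries = fetch(catalog_uri)
        # urljoin resolves relative references against the catalog URI
        # minus its last path segment, so the URI itself serves as base.
        base = catalog_uri
        for kind, args in entries:
            if kind == "Base":
                base = urljoin(base, args[0])
            elif kind == "Extend":
                queue.append(urljoin(base, args[0]))
            elif kind == "Map" and args[0] == pubid:
                return urljoin(base, args[1])
            elif kind == "Delegate" and pubid.startswith(args[0]):
                # Delegation discards every pending catalog and skips
                # the rest of the current one.
                queue.clear()
                queue.append(urljoin(base, args[1]))
                break
    return None
```

A caller would supply the system-dependent list of starting catalogs and
a fetch function that retrieves and parses each one; a dictionary lookup
is enough for trying the sketch out.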

8.  Open questions

Should compliance require support for both syntaxes?
I think the Socat syntax is essential for backward compatibility
with existing tools, and it is more compact (important for
huge catalogs full of Delegate entries), but the XML instance syntax
is more in the spirit of XML (and XSchema, etc.).

If both syntaxes are supported, should Delegate and Extend entries
be allowed to refer from one syntax to another, or should Socat
catalogs refer only to other Socat catalogs and ditto for XML
instance catalogs?

[Note: See more on the SGML Open (OASIS) 'CATALOG' and identifiers in the dedicated database entry: Catalogs, Formal Public Identifiers, Formal System Identifiers -rcc]



John Cowan	http://www.ccil.org/~cowan
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
