[This local archive copy mirrored from the canonical site: http://w3c1.inria.fr/International/FSIs.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

FORMAL SYSTEM IDENTIFIERS

An extension to SGML and HyTime is currently being prepared that will allow a generalized model of file identification to be adopted within SGML system identifiers. Systems that identify their source files using this extension will be said to be using Formal System Identifiers (FSIs). FSIs can be used in conjunction with Formal Public Identifiers (FPIs) in SGML catalogs.

FSIs can identify more than one source of information, and more than one file from each source. For example, the following FSI shows how three language variants of a file could be obtained from three alternative sources:

"<URL SOIbase='http://www.myco.com/'>/en/doc1.html /fr/doc1.html /de/doc1.html
 <tar SOIbase='/usr1/doc1.tar'>doc1-en.html doc1-fr.html doc1-de.html
 <pkzip SOIbase='\pub\doc1.zip'>\en\doc1.html \fr\doc1.html \de\doc1.html"
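To make the structure of such an identifier concrete, the following is a minimal Python sketch that splits an FSI of the shape shown above into its sources, their attributes and their file lists. It is not an implementation of the FSI specification: the names parse_fsi and Source are hypothetical, attribute names are folded to lower case as an assumption, and the simplified pattern does not handle a ">" character inside an attribute value.

```python
import re
from dataclasses import dataclass

# One "<name attr='value' ...>" tag followed by the text up to the next tag.
# Simplification: attribute values are assumed not to contain ">".
_SEGMENT = re.compile(r"<(?P<name>[A-Za-z][\w.-]*)(?P<attrs>[^>]*)>(?P<body>[^<]*)")

# name='value', name="value" or name=token
_ATTR = re.compile(r"""([\w.-]+)\s*=\s*(?:'([^']*)'|"([^"]*)"|([^\s>]+))""")


@dataclass
class Source:
    name: str    # tag name as written, e.g. "URL", "tar", "pkzip"
    attrs: dict  # attribute names folded to lower case (an assumption)
    files: list  # whitespace-separated identifiers following the tag


def parse_fsi(fsi: str) -> list:
    """Split an FSI string into a list of Source records (illustrative only)."""
    sources = []
    for segment in _SEGMENT.finditer(fsi):
        attrs = {}
        for attr in _ATTR.finditer(segment.group("attrs")):
            # Exactly one of the three value groups matches; pick that one.
            value = next(g for g in attr.groups()[1:] if g is not None)
            attrs[attr.group(1).lower()] = value
        sources.append(Source(segment.group("name"), attrs, segment.group("body").split()))
    return sources


if __name__ == "__main__":
    example = (
        r"<URL SOIbase='http://www.myco.com/'>/en/doc1.html /fr/doc1.html /de/doc1.html"
        "\n"
        r" <tar SOIbase='/usr1/doc1.tar'>doc1-en.html doc1-fr.html doc1-de.html"
        "\n"
        r" <pkzip SOIbase='\pub\doc1.zip'>\en\doc1.html \fr\doc1.html \de\doc1.html"
    )
    for source in parse_fsi(example):
        print(source.name, source.attrs.get("soibase"), source.files)
```

Applied to the example above, the sketch yields three sources (URL, tar and pkzip), each carrying its SOIbase value and three file identifiers.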

The FSI specification includes a number of useful standardized attributes for identifying information such as the way in which record boundaries have been encoded, what transformation has been used between octets and character sets (e.g. bctf=utf-8) and how the data in the file has been compressed, encrypted or sealed.
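As an illustration of how a receiving system might act on the bctf attribute, the following Python sketch turns the octets of a file into characters. The function name decode_entity is hypothetical, the fallback to utf-8 when no bctf attribute is present is an assumption made for this sketch, and it is assumed that the bctf value happens to be a name known to Python's codec registry; values defined by the FSI specification may not map one-to-one onto Python codec names.

```python
import codecs


def decode_entity(octets: bytes, attrs: dict) -> str:
    """Convert octets to characters according to a bctf attribute (sketch)."""
    # Assumption: default to utf-8 when the FSI carries no bctf attribute.
    bctf = attrs.get("bctf", "utf-8")
    # codecs.lookup raises LookupError if Python has no codec of that name.
    codec = codecs.lookup(bctf)
    return octets.decode(codec.name)


if __name__ == "__main__":
    octets = "France Télécom".encode("utf-8")
    print(decode_entity(octets, {"bctf": "utf-8"}))
```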

It is interesting to speculate how the basic syntax for the definition of FSIs containing URLs could be extended to provide both character set independent identifiers for files and a query whose form was independent of the character set restrictions of URLs.

For example, adding a CDATA title attribute to the URL element would allow FSIs of the following form to be defined:

"<URL SOIbase='http://www.telecom.fr/' title='France Télécom'>
doc1.html"

If queries are recorded in a different character set from the domain name and path name, the query could either be moved into an attribute of the FSI, e.g.:

"<URL SOIbase='http://www.telecom.fr/' query='France Télécom'>
doc1.html"

or its encoding could be identified in a separate attribute, e.g.:

"<URL SOIbase='http://www.telecom.fr/'
q-base='http://www.myco.org/search.cgi'
q-form=UTF-8>
doc1.html?France%20Télécom"
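The reason a query needs its encoding identified can be shown with a short Python sketch using the standard urllib.parse module. The helper names encode_query and decode_query are hypothetical, and q-form is treated here simply as the name of a character encoding, which is an assumption about how the speculative attribute above might be interpreted.

```python
from urllib.parse import quote, unquote


def encode_query(text: str, q_form: str) -> str:
    """Percent-escape a query string using the encoding named by q-form."""
    return quote(text, safe="", encoding=q_form)


def decode_query(escaped: str, q_form: str) -> str:
    """Reverse encode_query, given the same q-form value."""
    return unquote(escaped, encoding=q_form)


if __name__ == "__main__":
    query = "France Télécom"
    as_utf8 = encode_query(query, "UTF-8")      # France%20T%C3%A9l%C3%A9com
    as_latin1 = encode_query(query, "latin-1")  # France%20T%E9l%E9com
    print(as_utf8)
    print(as_latin1)
    # The original text is recovered only when the decoder is told
    # which encoding the escapes were produced with.
    print(decode_query(as_latin1, "latin-1"))
    print(decode_query(as_latin1, "UTF-8"))     # the accented letters are lost
```

The same text yields two different escaped forms, and decoding with the wrong assumption damages the accented characters, which is the ambiguity a separate encoding attribute would remove.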


Martin Bryan,
The SGML Centre, Churchdown, Glos. GL3 2PU, UK
Phone/Fax: +44 1452 714029
WWW home page: http://www.u-net.com/~sgml/