DrLove : Document Resource Locations (on valid elements)

Rick Jelliffe
ricko@gate.sinica.edu.au
Computing Center,
Academia Sinica,
Taipei, Taiwan
1999-03-25

Overview

This is a discussion draft for Document Resource Locations for the consideration of the W3C I18N IG and other interested W3C groups. It comes out of previous work known as XCS (Extensible Character Set).

This paper, Document Resource Locations, proposes a mechanism to link XML and XHTML documents to arbitrary resources. The benefits of being able to attach arbitrary resources to a document are well known. The Macintosh was largely based on this idea: Macintosh files have a resource fork as well as a data fork. However, for efficiency on the WWW, bundling data and resources together is not practical; indeed, linking can be regarded as the opposite of bundling.

My proposal has three parts:

  1. A processing instruction to link a document to a Document Resource Location Dictionary (DRLD);
  2. DRLD, which uses RDF and XLink to associate with resources;
  3. Specific examples of DRLD entries, which address several "problem" areas.

This proposal does not provide any mechanisms useful in the following areas:

  1. Schemas;
  2. Out-of-line markup;
  3. Stylesheets;
  4. Element structure, Information set, or Document object model;
  5. User Agent behaviour.

To my knowledge, this proposal addresses a need which has not been addressed by any other W3C initiative. The mechanism may be useful to many working groups, in particular by giving them a richer toolset for handling issues which are important but peripheral to their goals: issues which, left unhandled, may cause needless frustration or may complicate planned specifications with extraneous detail.

1. Document Resource Links: Associating DRLDs with XML Documents

I closely follow James Clark's example in http://www.w3.org/TR/PR-xml-stylesheet. A Document Resource Link (DRL) processing instruction is defined, which links to an external document. The target is xml-DrLove. The following attributes are available:

href CDATA #REQUIRED 
version CDATA "1.0"
title CDATA #IMPLIED 
date CDATA #IMPLIED 

The href attribute links to a DRLD document. The version attribute is housekeeping for future versions, and is not required initially. The title attribute provides a simple label.

The date attribute can be used for basic and unreliable 'dirtiness' checking: if the last write date of a document is later than the date in this attribute, the document is considered dirty. This can be used to key a different icon or some other processing. Each time an editing application reloads the DRLD document, it can reset the date. ISO 8601 date notation is used: yyyy-mm-dd or yyyymmdd.
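The dirtiness check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names parse_drl_date and is_dirty are invented here, and the caller is assumed to supply the last write date by whatever means it has available.

```python
from datetime import date, datetime

def parse_drl_date(text):
    """Parse the two ISO 8601 forms named above: yyyy-mm-dd or yyyymmdd."""
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next notation
    raise ValueError(f"not an ISO 8601 calendar date: {text!r}")

def is_dirty(last_write, drl_date_text):
    """True if the last write date is later than the date recorded in the DRL."""
    return last_write > parse_drl_date(drl_date_text)
```

Because only calendar dates are compared, two writes on the same day cannot be told apart, which is one reason the check is unreliable.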

This processing instruction should be placed at the start of the document, before any DOCTYPE declaration, but after stylesheet PIs if possible and, of course, after the XML declaration. For example:

<?xml-DrLove date="1999-03-25" title="Academia Sinica extra characters"
  href="http://www.ascc.net/xml/drld/drld1.xml" ?>
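As a rough sketch of how a tool might find such links, the following Python reads every xml-DrLove processing instruction in the prolog and splits its data into pseudo-attributes. The function name find_drls and the regular expression are assumptions made for illustration; this proposal does not define a parsing algorithm.

```python
import re
from xml.dom import minidom

# name="value" or name='value' pairs inside the PI data (illustrative pattern).
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def find_drls(xml_text):
    """Return one dict of pseudo-attributes per xml-DrLove PI, in document order."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:  # prolog nodes and the root element
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-DrLove"):
            attrs = {"version": "1.0"}  # default taken from the attribute list
            for name, dq, sq in PSEUDO_ATTR.findall(node.data):
                attrs[name] = dq or sq
            if "href" not in attrs:
                raise ValueError("xml-DrLove requires an href pseudo-attribute")
            links.append(attrs)
    return links
```

The returned list keeps the textual order of the processing instructions, which matters when a document carries more than one DRL.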

A document may have multiple DRLs. As a general policy, if there is a conflict between the DRLDs targeted, the DRL which is textually first in the XML document has precedence.
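The precedence rule can be illustrated as a first-wins merge. In this sketch a "conflict" is assumed to mean that two DRLDs describe the same document string; the proposal does not define conflict more precisely, and merge_drlds is a hypothetical helper.

```python
def merge_drlds(dictionaries):
    """Merge DRLD lookups so that the entry from the earliest DRL wins.

    `dictionaries` is a list of dicts in the textual order of their DRLs,
    each mapping a document string to its resources.
    """
    merged = {}
    for entries in dictionaries:
        for key, resources in entries.items():
            merged.setdefault(key, resources)  # keep the first one seen
    return merged
```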

2. Document Resource Location Dictionary

A DRLD is an XML document which is a list of RDF assertions. A DRLD must conform to the following architecture:

<!ELEMENT drld:DRLD ( rdf:RDF)+ >
<!ATTLIST drld:DRLD  date CDATA #IMPLIED 
	title CDATA #IMPLIED >
<!ELEMENT rdf:RDF ( rdf:Description )+ >
<!ELEMENT rdf:Description ( drld:String, ( rdf:Bag | rdf:Set | li )+) >
<!ATTLIST rdf:Description about CDATA #REQUIRED >
<!ELEMENT rdf:Bag ( match, li+ ) >
<!ELEMENT rdf:Set ( match, li+ ) >
<!ELEMENT li ANY >
<!ATTLIST li parseType CDATA #FIXED "literal" > 
<!ELEMENT drld:String ( #PCDATA )>

Other element type names may be used instead of rdf:Description and li, as allowed by the RDF specification. Other attributes, notably namespace attributes, are allowed.

Each RDF element asserts some equivalence between some data or property which some tool may find in a document and some other WWW resource. (Note that an alternative representation could successfully be made using XLink extended links. I do not favour one or the other.)

3. Predefined DRLD Vocabulary

3.1 Extensible Character Set (XCS)

The first example allows the association of some glyphs with a private-use area (PUA) Unicode character. A system which was aware of both DRL and XCS would read this and recognise which glyphs to use when it encounters the string "&#x6066;A". Note that A does not have to be a standard character: it could be a non-standard character such as a mathematical operator or East Asian external character (gaiji).

<drld:DRLD title="Academia Sinica extra characters">
	<rdf:RDF>
		<rdf:Description about="urn:...:xcs" BagID="c1">
			<drld:String>&#x6066;A</drld:String>
		</rdf:Description>
		<rdf:Description about="#c1">
			<rdf:Set>
				<xcs:Glyph css:font-family="ASfont"
					href="http://www.ascc.net/fonts/ASfont1.ttf"
					css:font-weight="bold">&#x1345;</xcs:Glyph>
				<xcs:Glyph css:font-family="ASfont"
					href="http://www.ascc.net/fonts/ASfont1.ttf"
					css:font-weight="italic">&#x1346;</xcs:Glyph>
				<html:img src="myc1.gif" />
			</rdf:Set>
		</rdf:Description>
	</rdf:RDF>
</drld:DRLD>

In this example, first we assert that the string "&#x6066;A" has a drld:String relation to the resource "urn:...:xcs". This is just housekeeping, to set the object string. The second rdf:Description then asserts that particular glyphs belong to that string. Because an rdf:Set is used, there is some notion of preference for the first glyph over the second.
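To make the two-step reading concrete, here is a minimal Python sketch that pairs each drld:String with the xcs:Glyph alternatives attached to it. Several things are assumptions: the namespace URIs for drld, xcs and css are placeholders (this draft does not assign any), the sample is a shortened copy of the example, and glyphs_by_string is a hypothetical helper rather than part of the proposal.

```python
import xml.etree.ElementTree as ET

# Only the RDF namespace URI is real; the others are placeholders.
NS = {
    "drld": "urn:example:drld",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xcs": "urn:example:xcs",
    "css": "urn:example:css",
}

SAMPLE = """\
<drld:DRLD xmlns:drld="urn:example:drld"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xcs="urn:example:xcs" xmlns:css="urn:example:css"
    title="Academia Sinica extra characters">
  <rdf:RDF>
    <rdf:Description about="urn:...:xcs" BagID="c1">
      <drld:String>&#x6066;A</drld:String>
    </rdf:Description>
    <rdf:Description about="#c1">
      <rdf:Set>
        <xcs:Glyph css:font-family="ASfont" css:font-weight="bold"
            href="http://www.ascc.net/fonts/ASfont1.ttf">&#x1345;</xcs:Glyph>
      </rdf:Set>
    </rdf:Description>
  </rdf:RDF>
</drld:DRLD>
"""

def glyphs_by_string(drld_text):
    """Map each drld:String to the xcs:Glyph alternatives attached to its BagID."""
    root = ET.fromstring(drld_text)
    descriptions = root.findall("rdf:RDF/rdf:Description", NS)
    # First pass: "#BagID" -> object string (the housekeeping assertion).
    strings = {}
    for d in descriptions:
        s = d.find("drld:String", NS)
        if s is not None and d.get("BagID"):
            strings["#" + d.get("BagID")] = s.text
    # Second pass: descriptions about "#id" carry the glyph alternatives.
    result = {}
    for d in descriptions:
        target = strings.get(d.get("about"))
        if target is None:
            continue
        for g in d.iterfind(".//xcs:Glyph", NS):
            result.setdefault(target, []).append({
                "href": g.get("href"),
                "family": g.get("{%s}font-family" % NS["css"]),
                "char": g.text,
            })
    return result
```

Glyphs are collected in document order, so a consumer that honours the preference described above can simply try them from first to last.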

The xcs:Glyph element type is a simple example, which should be provided as a base part of DRL. But it is important to note that vendors or groups can define their own vocabularies. This allows competition, extensibility, and peaceful co-existence.

3.2 Hyphenation and other properties

To add hyphenation or any other string or character properties, the following fragment can be inserted after the xcs:Glyph elements or in a separate rdf:Description with the same about attribute:

 	<xcs:Hyphen>&#x6066;&shy;A</xcs:Hyphen>

The xcs:Hyphen element type has a scheme attribute, which may be used if some vendor-specific hyphenation notation is used. Otherwise, the appropriate characters from Unicode (soft hyphen, soft newline, etc.) should be used to signal the particular kind of hyphenation. The element type also has a priority attribute, which gives the desirability of line-breaking as a space-separated list, one number [0-9] for each hyphenation candidate, with 0 meaning lowest priority and 9 indicating a highly desirable break.
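One way a formatter might read these two pieces of information together is sketched below in Python. It assumes that the default notation is in use, that only the soft hyphen (U+00AD) marks candidates, and that priorities are listed in the same order as the candidates; break_candidates is a hypothetical name.

```python
SOFT_HYPHEN = "\u00ad"

def break_candidates(hyphenated, priority=""):
    """Return (offset, priority) pairs for each soft hyphen in `hyphenated`.

    Offsets index into the string with the soft hyphens removed. `priority`
    is the space-separated list of digits 0-9, one per candidate.
    """
    offsets, plain_len = [], 0
    for ch in hyphenated:
        if ch == SOFT_HYPHEN:
            offsets.append(plain_len)  # a break is allowed before this offset
        else:
            plain_len += 1
    ranks = [int(p) for p in priority.split()]
    if ranks and len(ranks) != len(offsets):
        raise ValueError("priority needs one number per hyphenation candidate")
    if any(not 0 <= r <= 9 for r in ranks):
        raise ValueError("priorities must be in the range 0-9")
    if not ranks:
        ranks = [None] * len(offsets)  # no priorities supplied
    return list(zip(offsets, ranks))
```

A line-breaker could then sort the candidates by priority and choose the most desirable break that fits the available width.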

In the future, a lang attribute could be allowed, to indicate in which language this particular hyphenation is appropriate. A subst attribute gives the appropriate string which can be inserted at the particular point in order to perform hyphenation: for example "&#x6066;-&#x1D;A". This is overkill at the moment.

4. Comments

It can be seen from these examples, I hope, how all the various Unicode properties could be attached to strings, including PUA strings. This is also a way to specify document-specific collation sequences.

One particular user scenario I have in mind is this:

In other words, the DRLD can be used as a system to manage the movement of a character from private non-standard use to standardization.

The syntax for the URN and its proper usage is not clear and will need further work.


Copyright (C) 1999 - Permission to republish in any format granted, providing attribution is kept.