[Mirrored from document home: http://www.cni.org/pub/CIMI/www/chiolink.html.]
This document gives an overview of linking strategies for project CHIO. It aims to address long-term needs for data representation while remaining implementable with current technology: specifically, SGML browsers running on top of the World Wide Web's data access protocols and standard browsers.
The following proposal is based on HyTime linking protocols, my earlier proposals for the use of Formal Public Identifiers as locators and for authority control, and on Richard Light's analysis of linking for the NMAA. HyTime provides the fundamental methods for cross-linking documents, while Formal Public Identifiers provide a standard format and associated organizational framework for the assignment of unique names to intellectual resources.
Richard Light's naming proposal makes an extremely important distinction explicit with the notions of "Handles" and "Pointers." Handles are access points into a document, and reflect a significant mention of an entity of interest in the content of the document, like the name of an artist, a mention of a show, a bibliographic reference, or a mention of a particular work. Handles are possible access points for retrieving information, and, in some cases, may also represent possible sources for links, in that a handle represents an item about which further queries may well elicit information.
A pointer is a value that actually represents a query: "find me all handles (in some relevant domain) that match this one." Linking by retrieval in this way is frequently called "implicit" or "intensional" linking. Implicit links will be very important in the CHIO project, since uniform authority files are one of the likeliest ways for the diverse institutions involved with CIMI to facilitate information sharing. Implicit linking will also create a very powerful environment for users and museums. Once there is a service to request information from an institution based on a subject-matter identifier, that institution's data can potentially be linked to by anyone else using the same conventions.
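As a rough illustration of implicit linking, the following Python sketch treats each handle as an authority-file name recorded against a location, and a pointer as a query over those handles. The `Handle` class, the `resolve_pointer` function, the domain names, and the sample data are all hypothetical; nothing here is taken from Richard Light's proposal or from any CHIO implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    domain: str    # hypothetical domain name, e.g. "artist" or "exhibition"
    name: str      # authority-file form of the name
    document: str  # document in which the mention occurs
    location: str  # ID of the element containing the mention


def resolve_pointer(handles, domain, name):
    """Return every handle in `domain` whose authority name matches `name`."""
    key = name.casefold()
    return [h for h in handles
            if h.domain == domain and h.name.casefold() == key]


# Invented sample data: two mentions of the same artist, one of an exhibition.
HANDLES = [
    Handle("artist", "Example, Artist A.", "catalog-1", "p12"),
    Handle("artist", "Example, Artist A.", "wall-text-7", "p3"),
    Handle("exhibition", "Example Exhibition", "catalog-1", "p1"),
]

# A pointer is just the query; the link targets are whatever it retrieves.
matches = resolve_pointer(HANDLES, "artist", "example, artist a.")
```

The point of the sketch is that no target is stored in the pointer itself: any institution that publishes handles using the same authority names becomes reachable without the source document changing.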
The other kind of link (not specifically distinguished in the handle/pointer analysis) is the explicit link. This is the familiar hypertext point-to-point connection that moves a user from one place in a text to another place in another text. It also depends on having unique names, but the names required to create explicit links are unique names for documents and locations within documents, rather than for the subject matter of the documents themselves. Explicit linking within pure SGML is extremely limited, allowing only point-to-point references between parts of a single document.
HyTime provides a set of notational conventions that can be applied within an SGML document to represent many explicit and implicit linking methods. While the use of HyTime requires certain structural relationships among the tags in a document that it will interpret, it does not prescribe element names, and can be integrated into any DTD. HyTime defines a set of link behaviors and structures that can be integrated into any application.
This easy integration of HyTime functionality with arbitrary DTDs is a significant advantage for CHIO, since the need for multiple DTDs is inherent to the diversity of types of information and providers involved. HyTime's method of integrating with application DTDs is to use attributes of application elements to indicate what HyTime semantics apply to those elements.
One of the most important aspects of CHIO linking is that CHIO will be most meaningful if it uses location-independent links based on Formal Public Identifiers rather than location-dependent linking as provided by the Web's URLs. Location-independent linking is not just an implementation issue, as the name might imply, but also a data quality issue. The information encoded in a URL has a relatively short lifetime compared to the timespans museums deal in. While location-independent links can be converted to URLs for web publication, it would be nice if we could demonstrate working location-independent links. At the end of this document, I will discuss the state of standards in this area, and how such a demonstration might be possible with current technology.
One good piece of recent news is that ISO is almost certain to approve a new HyTime construct (its inclusion was a condition of the US vote to approve the proposed changes to HyTime). The new HyTime construct, nlink, will enable full compatibility between HTML links, TEI links, and HyTime links. This means that there is no reason not to use TEI's simple and elegant format for external links, which are already supported by systems vendors and can easily be made HyTime conformant.
HyTime makes another useful distinction between "contextual" and "independent" links. A contextual link occurs in a document and links the context where it occurs to some other information. An independent link may be located near the linked data, or it may not. It may not even occur in the same document as either of the anchors that it links.
Independent links impose a heavier burden on a viewing application than contextual links do. A contextual link only needs to be interpreted at the time it is displayed, while independent links need to be saved until one of their endpoints is displayed, at which point the link must be made available to the user. Independent links have their advantages as well, since even a read-only document, such as a CD-ROM or a remote site, can have links added to it, with the independent links distributed separately from the document.
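The bookkeeping this implies for a viewer can be sketched in Python as follows. This is only an illustration of the idea of holding independent links until one of their endpoints is displayed; the `LinkIndex` class, its method names, and the (document, element ID) anchor format are assumptions made for the example, not part of HyTime or the TEI.

```python
from collections import defaultdict


class LinkIndex:
    """Holds independent links until one of their anchors is displayed."""

    def __init__(self):
        # Maps an anchor to the links that mention it.
        self._by_anchor = defaultdict(list)

    def add_link(self, link_id, anchors):
        """Register a link; `anchors` are (document, element_id) pairs."""
        anchors = tuple(anchors)
        for anchor in anchors:
            self._by_anchor[anchor].append((link_id, anchors))

    def links_for(self, anchor):
        """Return (link_id, other_anchors) pairs for an anchor being shown."""
        return [(link_id, tuple(a for a in anchors if a != anchor))
                for link_id, anchors in self._by_anchor.get(anchor, [])]


# Links distributed separately from a read-only document are loaded first...
index = LinkIndex()
index.add_link("L1", [("cdrom-doc", "fig3"), ("notes", "n17")])

# ...and consulted each time the viewer displays an element.
available = index.links_for(("cdrom-doc", "fig3"))
```

A contextual link needs none of this: it is found in the text being displayed, so the viewer can interpret it on the spot.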
The TEI provides a number of elements for the construction of hypertext structures, all of which can be used in HyTime systems as well (with the addition of appropriate attributes). The <ptr> and <xptr> tags correspond to HyTime "location forms." They are pure pointers that identify a location in a document. The <xptr> tag uses the TEI extended pointer mechanism to indicate an entire chain of locations (if necessary). The ptr tag uses SGML IDREFs to indicate locations in the current document. The reference tags <ref> and <xref> represent contextual links: <xref> via the extended pointer mechanisms, and <ref> using IDREFs. The ref tags are similar in application to the <A> tag in HTML, but with very different location semantics.
Since the TEI is not part of the HyTime standard, TEI extended pointers are treated as external "notations" by HyTime, and as such, must be named with a Formal Public Identifier that identifies and describes the notation. The TEI has not yet defined these identifiers, but CHIO is free to define its own for use within the museum community, if the TEI has not done so by the time implementation starts.
The TEI <link> element is an independent link and, except for the names of its attributes, is a legitimate HyTime independent link, as long as the TEI ptr and xptr tags are defined as HyTime Notation Locations. As an example of what is involved in adding HyTime compliance to the TEI tags, the following example shows declarations for xptr and link elements that would enable HyTime processing of the link, as well as TEI processing. Note that within a document, links would look the same as they do using TEI coding; only the DTD needs to be modified to enable HyTime processing. TEI attributes and element declarations have been omitted. The example shows only what attributes have to be added to turn the TEI tags into HyTime tags as well.
    <!-- declare a notation to refer to TEI extended locator syntax -->
    <!notation TEIloc PUBLIC "-//TEI//NOTATION TEI extended pointer syntax//EN">

    <!attlist link
        id        ID      #IMPLIED
        HyTime    NAME    #FIXED "ilink"
            -- define this element to HyTime as an independent link --
        HyNames   CDATA   #FIXED "linkends targets"
            -- let HyTime know where TEI records anchors --
        -- following is the TEI attribute a HyTime engine would need to know about --
        targets   IDREFS  #REQUIRED
    >

    <!attlist xptr
        id        ID      #IMPLIED
        HyTime    NAME    #FIXED "notloc"
            -- HyTime notation location --
        notation  NAME    #FIXED TEIloc
            -- tell HyTime that TEI defines the notation used --
        HyNames   CDATA   #FIXED "qdomain doc"
            -- let HyTime know where TEI puts reference to an external entity --
        -- the doc attribute is needed by HyTime --
        doc       ENTITY  #REQUIRED
    >

The xptr element does not need to declare any of its attributes for determining the location designated, since the HyTime engine, once it knows that the element is a TEI pointer, will pass all the information about the element to a TEI processor, which will be responsible for resolving the pointer and returning the result to HyTime.
The best way to make the TEI <ref> tag HyTime conformant will be to use the new <nlink> architectural form. The details of how this new feature will work are not yet fixed, but the updates to HyTime are scheduled to be voted on (and presumably accepted) by mid-July '95, in time for CHIO to use the new features.
It is also possible to achieve the same effect by using some fairly obscure features of HyTime in combination; the details of how to do this are explained in Making Hypermedia Work, Section 10.3.2. The method described there is correct, if not attractive, but is regarded as controversial by some HyTime users.
The question has been raised as to whether accommodating multiple DTDs and data formats will cause any problems for CHIO linking. The short answer is no, it won't. Neither the TEI nor the HyTime linking standards make any assumption about the DTD of an SGML destination document, although HyTime requires a declaration of whether the SGML declaration of destination documents might be different. Non-SGML data like graphics or other data formats can be handled using SGML's notation facility.
Even the Web's basic protocols support type information on retrieved
data, so that browsers can, for example, properly process any returned
information. Linking to finding aids, databases, and other data types is
either a special case of the multiple DTDs, or of multiple data formats,
and so should not introduce any problems in principle. It may well be
important to users that resource names give them some expectation of the
type of data to be accessed, just as file extensions do for web
cognoscenti currently. This can best be handled by adapting Richard's
suggestion of a type field in the Formal Public Identifiers used as
handles to give information about the document or data type of non-handle
Identifiers as well. For instance, an FPI like
"ISBN for NMAA::Strubel//CATALOG::Folk art in the Suburbs"
would indicate not only a data object, but would identify its genre (and
thus, DTD) as well.
The best way to do this would be to use the URN facilities that are under discussion and development by the IETF. Unfortunately, at this date there is no stable agreement on either the syntactic form URNs will have, or the semantics of how they will be resolved to URLs by browsers. In my opinion this area is not going to stabilize much over the next 4-6 months.
Since a standards-based approach is out, in this case, what we want is the simplest practical solution we can make to get the effect of location independent pointers on top of the WWW protocols. We can in fact do this without too much pain, on a limited basis. What we can do is allow each site to create a mapping table from FPIs to URLs. To access this table we can use HTTP's redirection feature to send back the correct URL for the location independent one. We can even redirect the request to another resolver, though there is probably a practical limit to this imposed by browsers (I'm not sure about such limits).
Since the mapping may be algorithmic in parts (e.g. all NMAA requests
might be sent to a server at the Smithsonian, if we know that there are
no local copies available), we will want a program or simple script to
execute when such a mapping is performed. To process an FPI, we will
simply need to attach the prefix /cgi-bin/resolve-FPI? to it. This
will invoke a local script, with the FPI (which follows the "?") as
input. The script must then examine the FPI and return a result with the
headers and result code properly set to redirect the client to the new
location.
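A minimal Python sketch of such a script is given below, under several assumptions: the FPIs, the URLs, the table contents, and the forwarding rule are all invented for illustration, and the FPI is assumed to arrive URL-encoded in the CGI `QUERY_STRING` variable. It is meant only to show the shape of the technique (table lookup, optional forwarding to another resolver, HTTP redirect), not a definitive resolver.

```python
import os
import sys
from urllib.parse import quote, unquote

# Hypothetical site-local mapping table from FPIs to URLs.
FPI_TABLE = {
    "-//CHIO//DOCUMENT Example Catalog//EN":
        "http://www.example.org/chio/catalog.sgml",
}

# Hypothetical algorithmic rule: FPIs with this owner prefix are handed on
# to another site's resolver rather than looked up locally.
FORWARD_RULES = [
    ("-//NMAA//", "http://resolver.example.org/cgi-bin/resolve-FPI?"),
]


def resolve(fpi):
    """Return the URL for an FPI, or None if this site cannot resolve it."""
    if fpi in FPI_TABLE:
        return FPI_TABLE[fpi]
    for prefix, resolver in FORWARD_RULES:
        if fpi.startswith(prefix):
            # Re-encode the FPI so it can travel in the next query string.
            return resolver + quote(fpi, safe="")
    return None


def cgi_response(query_string):
    """Build the CGI response (headers and body) for a resolve-FPI request."""
    url = resolve(unquote(query_string))
    if url is None:
        return ("Status: 404 Not Found\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Unknown FPI\r\n")
    # The Status and Location headers ask the server to send a redirect.
    return f"Status: 302 Found\r\nLocation: {url}\r\n\r\n"


def main():
    """Entry point for a CGI deployment: read the query, write the response."""
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))

# In a CGI deployment, the script would end with a call to main().
```

With this sketch, a request for `/cgi-bin/resolve-FPI?-%2F%2FCHIO%2F%2FDOCUMENT%20Example%20Catalog%2F%2FEN` would be answered with a redirect to the catalog URL in the table, while any FPI beginning with the forwarded prefix would be redirected to the second resolver with the FPI re-encoded in its query string.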
This method is not likely to be robust in the long term, and adds a certain amount of overhead. However, for the purposes of a demonstration project, it has a lot of promise. The creation of a simple script could improve the quality of the CHIO demonstration appreciably.
The basic conclusions for Project CHIO and linking are relatively simple:
ptr tags should be used only with link, and the ref tags should be used to implement all inline links.