A Linking Strategy for CHIO

David G. Durand
Dynamic Diagrams
June 12, 1995

[Mirrored from document home: http://www.cni.org/pub/CIMI/www/chiolink.html.]

Introduction

This document gives an overview of linking strategies for Project CHIO. It aims to address long-term needs for data representation while remaining implementable with current technology: specifically, SGML browsers running on top of the World Wide Web's data access protocols and standard browsers.

The following proposal is based on HyTime linking protocols, my earlier proposals for the use of Formal Public Identifiers as locators and for authority control, and on Richard Light's analysis of linking for the NMAA. HyTime provides the fundamental methods for cross-linking documents, while Formal Public Identifiers provide a standard format and associated organizational framework for the assignment of unique names to intellectual resources.

Basic Distinctions

Richard Light's naming proposal makes an extremely important distinction explicit with the notions of "Handles" and "Pointers." Handles are access points into a document, and reflect a significant mention of an entity of interest in the content of the document, like the name of an artist, a mention of a show, a bibliographic reference, or a mention of a particular work. Handles are possible access points for retrieving information, and, in some cases, may also represent possible sources for links, in that a handle represents an item about which further queries may well elicit information.

A pointer is a value that actually represents a query: "find me all handles (in some relevant domain) that match this one." Linking by retrieval in this way is frequently called "implicit" or "intensional" linking. Implicit links will be very important in the CHIO project, since uniform authority files are one of the likeliest ways for the diverse institutions involved with CIMI to facilitate information sharing. Implicit linking will also create a very powerful environment for users and museums. Once there is a service to request information from an institution based on a subject-matter identifier, that institution's data can potentially be linked to by anyone else using the same conventions.
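The handle/pointer distinction can be illustrated with a small sketch. Everything in it is hypothetical: the `Handle` record, the authority identifiers, the document names, and the `resolve_pointer` function are assumptions made for illustration, and are not part of Richard Light's proposal or of any CHIO specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """A significant mention of an entity inside a document (hypothetical record)."""
    authority_id: str   # identifier from a shared authority file (assumed format)
    document: str       # name of the document containing the mention
    location: str       # SGML ID of the element where the mention occurs


def resolve_pointer(pointer_id, handles, domain=None):
    """Treat a pointer as a query: return every handle matching its identifier.

    `domain`, if given, restricts the search to a set of document names.
    """
    return [
        h for h in handles
        if h.authority_id == pointer_id
        and (domain is None or h.document in domain)
    ]


# Illustrative data only; identifiers and document names are invented.
HANDLES = [
    Handle("artist:example-a", "catalog-1", "p12"),
    Handle("artist:example-a", "wall-text-7", "p3"),
    Handle("artist:example-b", "catalog-1", "p40"),
]

matches = resolve_pointer("artist:example-a", HANDLES)
```

The point of the sketch is that no link between the two documents mentioning the same artist is ever stored. The link exists only because both handles use the same authority identifier, which is why shared authority files matter so much for implicit linking.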

The other kind of link (not specifically distinguished in the handle/pointer analysis) is the explicit link. This is the familiar hypertext point-to-point connection that moves a user from one place in a text to another place in another text. It also depends on having unique names, but the names required to create explicit links are unique names for documents and locations within documents, rather than for the subject matter of the documents themselves. Explicit linking within pure SGML is extremely limited, allowing only point-to-point references between parts of a single document.

HyTime Linking

HyTime provides a set of notational conventions that can be applied within an SGML document to represent many explicit and implicit linking methods. While the use of HyTime requires certain structural relationships among the tags in a document that it will interpret, it does not prescribe element names, and can be integrated into any DTD. HyTime defines a set of link behaviors and structures that can be integrated into any application.

This easy integration of HyTime functionality with arbitrary DTDs is a significant advantage for CHIO, since the need for multiple DTDs is inherent to the diversity of types of information and providers involved. HyTime's method of integrating with application DTDs is to use attributes of application elements to indicate what HyTime semantics apply to those elements.

One of the most important aspects of CHIO linking is that CHIO will be most meaningful if it uses location-independent links based on Formal Public Identifiers rather than the location-dependent linking provided by the Web's URLs. Location-independent linking is not just an implementation issue, as the name might imply, but also a data quality issue. The information encoded in a URL has a relatively short lifetime compared to the timespans museums deal in. While location-independent links can be converted to URLs for web publication, it would be valuable to demonstrate working location-independent links. At the end of this document, I will discuss the state of standards in this area, and how such a demonstration might be possible with current technology.

One good piece of recent news is that ISO is almost certain to approve a new HyTime construct (its inclusion was a condition of the US vote to approve the proposed changes to HyTime). The new HyTime construct, nlink, will enable full compatibility between HTML links, TEI links, and HyTime links. This means that there is no reason not to use TEI's simple and elegant format for external links, which are already supported by systems vendors and can easily be made HyTime conformant.

Contextual and Independent Links

HyTime makes another useful distinction between "contextual" and "independent" links. A contextual link occurs in a document and links the context where it occurs to some other information. An independent link may be located near the linked data, or it may not. It may not even occur in the same document as either of the anchors that it links.

Independent links impose a heavier burden on a viewing application than contextual links do. A contextual link only needs to be interpreted at the time it is displayed, while independent links need to be saved until one of their endpoints is displayed, at which point the link must be made available to the user. Independent links have their advantages as well, since even a read-only document, such as a CD-ROM or a remote site, can have links added to it, with the independent links distributed separately from the document.
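The extra bookkeeping that independent links demand of a viewer can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any HyTime engine: the class name, the representation of an anchor as a (document, element ID) pair, and the sample names are all hypothetical.

```python
from collections import defaultdict


class IndependentLinkIndex:
    """Holds independent links until one of their anchors is displayed."""

    def __init__(self):
        # Maps an anchor (document name, element ID) to the links touching it.
        self._by_anchor = defaultdict(list)

    def add_link(self, link_id, anchors):
        """Register a link; `anchors` is a list of (document, element_id) pairs."""
        link = (link_id, tuple(anchors))
        for anchor in anchors:
            self._by_anchor[anchor].append(link)

    def links_for(self, anchor):
        """Return (link_id, other_anchors) pairs for a displayed anchor."""
        return [
            (link_id, [a for a in anchors if a != anchor])
            for link_id, anchors in self._by_anchor.get(anchor, [])
        ]


# A link stored apart from both documents, one of which is read-only.
index = IndependentLinkIndex()
index.add_link("L1", [("cdrom-catalog", "fig3"), ("local-notes", "n1")])

# When the viewer displays fig3 of the CD-ROM catalog, it asks the index
# which links should be offered to the user at that point.
available = index.links_for(("cdrom-catalog", "fig3"))
```

Nothing in the CD-ROM document is modified here. The link lives entirely in the index, which is what allows links to be distributed separately from the documents they connect.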

TEI Linking

The TEI provides a number of elements for the construction of hypertext structures, all of which can be used in HyTime systems as well (with the addition of appropriate attributes). The <ptr> and <xptr> tags correspond to HyTime "location forms." They are pure pointers that identify a location in a document. The <xptr> tag uses the TEI extended pointer mechanism to indicate an entire chain of locations (if necessary). The <ptr> tag uses SGML IDREFs to indicate locations in the current document.

The reference tags <ref> and <xref> represent contextual links: <xref> via the extended pointer mechanisms, and <ref> using IDREFS. The ref tags are similar in application to the <A> tag in HTML, but with very different location semantics.

Since the TEI is not part of the HyTime standard, TEI extended pointers are treated as external "notations" by HyTime, and as such, must be named with a Formal Public Identifier that identifies and describes the notation. The TEI has not yet defined these identifiers, but CHIO is free to define its own for use within the museum community, if the TEI has not done so by the time implementation starts.

The TEI <link> element is an independent link, and except for the names of its attributes is a legitimate HyTime independent link, as long as the TEI ptr and xptr tags are defined as HyTime Notation Locations. As an example of what is involved in adding HyTime compliance to the TEI tags, the following example shows declarations for xptr and link elements that would enable HyTime processing of the link, as well as TEI processing. Note that within a document, links would look the same as they do using TEI coding; only the DTD needs to be modified to enable HyTime processing. TEI attributes and element declarations have been omitted. The example shows only what attributes have to be added to turn the TEI tags into HyTime tags as well.




<!-- declare a notation to refer to TEI extended locator syntax -->
<!notation TEIloc PUBLIC "-//TEI//NOTATION TEI extended pointer syntax//EN">

<!attlist   link
            id          ID          #IMPLIED
            HyTime      NAME        #FIXED "ilink"
                        -- define this element to HyTime as an
                           independent link --
            HyNames     CDATA       #FIXED "linkends targets"
                        -- let HyTime know where TEI records anchors --
            -- following is the TEI attribute a HyTime engine would
               need to know about --
            targets     IDREFS      #REQUIRED
>

<!attlist   xptr
            id          ID          #IMPLIED
            HyTime      NAME        #FIXED "notloc"
                        -- HyTime notation location --
            notation    NAME        #FIXED TEIloc
                        -- tell HyTime that TEI defines the notation used --
            HyNames     CDATA       #FIXED "qdomain doc"
                        -- let HyTime know where TEI puts reference to
                           an external entity --
            -- the doc attribute is needed by HyTime --
            doc         ENTITY      #REQUIRED
>

The xptr element does not need to declare any of its attributes for determining the location designated, since the HyTime engine, once it knows that the element is a TEI pointer, will pass all the information about the element to a TEI processor, which will be responsible for resolving the pointer and returning the result to HyTime.

The best way to make the TEI <ref> tag HyTime conformant will be to use the new <nlink> architectural form. The details of how this new feature will work are not yet fixed, but the updates to HyTime are scheduled to be voted on (and presumably accepted) by mid-July '95, in time for CHIO to use the new features.

It is also possible to achieve the same effect by using some fairly obscure features of HyTime in combination; the details of how to do this are explained in Making Hypermedia Work, Section 10.3.2. The method described there is correct, if not attractive, but is regarded as controversial by some HyTime users.

Accommodating multiple document types

The question has been raised as to whether accommodating multiple DTDs and data formats will cause any problems for CHIO linking. The short answer is no, it won't. Neither the TEI nor the HyTime linking standards make any assumption about the DTD of an SGML destination document, although HyTime requires a declaration of whether the SGML declaration of destination documents might be different. Non-SGML data like graphics or other data formats can be handled using SGML's notation facility.

Even the Web's basic protocols support type information on retrieved data, so that browsers can, for example, properly process any returned information. Linking to finding aids, databases, and other data types is a special case either of multiple DTDs or of multiple data formats, and so should not introduce any problems in principle. It may well be important to users that resource names give them some expectation of the type of data to be accessed, just as file extensions do for web cognoscenti currently. This can best be handled by adapting Richard Light's suggestion of a type field in the Formal Public Identifiers used as handles, so that it gives information about the document or data type of non-handle identifiers as well. For instance, an FPI like "ISBN for NMAA::Strubel//CATALOG::Folk art in the Suburbs" would indicate not only a data object, but would identify its genre (and thus, DTD) as well.
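As a rough sketch of how software might read the type field out of such an identifier, the following splits the example FPI into parts. The field layout is inferred only from the single example above (owner, then "//", then a type, then "::", then a title); it is an assumption, not a definition taken from Richard Light's proposal. The function name and the table of DTD file names are likewise hypothetical.

```python
def describe_fpi(fpi):
    """Split an FPI of the assumed form OWNER//TYPE::TITLE into its parts."""
    owner, sep, text = fpi.partition("//")
    if not sep:
        raise ValueError("no '//' separator in identifier: " + fpi)
    data_type, sep, title = text.partition("::")
    if not sep:
        # No type field present; treat the whole text as the title.
        data_type, title = None, text
    return {
        "owner": owner.strip(),
        "type": data_type.strip() if data_type is not None else None,
        "title": title.strip(),
    }


# Hypothetical mapping from a type field to the DTD a browser should load.
DTD_FOR_TYPE = {"CATALOG": "catalog.dtd"}

info = describe_fpi("ISBN for NMAA::Strubel//CATALOG::Folk art in the Suburbs")
dtd = DTD_FOR_TYPE.get(info["type"])
```

With a convention of this kind, a user or a browser can tell from the name alone that the resource is a catalog, before anything has been retrieved.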

Implementing location independent linking on the Web

The best way to do this would be to use the URN facilities that are under discussion and development by the IETF. Unfortunately, at this date there is no stable agreement on either the syntactic form URNs will have, or the semantics of how they will be resolved to URLs by browsers. In my opinion this area is not going to stabilize much over the next 4-6 months.

Since a standards-based approach is out in this case, what we want is the simplest practical way to get the effect of location-independent pointers on top of the WWW protocols. We can in fact do this without too much pain, on a limited basis, by allowing each site to create a mapping table from FPIs to URLs. To access this table we can use HTTP's redirection feature to send back the correct URL for the location-independent name. We can even redirect the request to another resolver, though there is probably a practical limit to this imposed by browsers (I'm not sure about such limits).
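The chaining of resolvers can be sketched by simulating what a browser does as it follows redirections from one site's table to another. This is an illustration under stated assumptions: the site names, the table layout, the URL (which uses the reserved ".example" domain), and the limit of five hops are all hypothetical, and real browsers apply limits of their own.

```python
MAX_HOPS = 5  # assumed cap; real browsers impose their own redirect limits

FPI = "ISBN for NMAA::Strubel//CATALOG::Folk art in the Suburbs"

# Each site keeps its own table. An entry is either ("url", location),
# meaning the document is found there, or ("resolver", other_site),
# meaning the request should be redirected to that site's resolver.
TABLES = {
    "local": {FPI: ("resolver", "nmaa")},
    "nmaa": {FPI: ("url", "http://nmaa.example/catalogs/folk-art.sgm")},
}


def follow(fpi, start_site, tables, max_hops=MAX_HOPS):
    """Follow redirects between site resolvers until a document URL is found."""
    site = start_site
    for _ in range(max_hops):
        kind, value = tables[site][fpi]
        if kind == "url":
            return value
        site = value  # redirected to another resolver
    raise RuntimeError("too many redirections for " + fpi)


location = follow(FPI, "local", TABLES)
```

The hop limit matters because two tables that point at each other would otherwise redirect a request forever. An FPI missing from a table raises a `KeyError` in this sketch; a real resolver would need to return an error response instead.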

Since the mapping may be algorithmic in parts (e.g. all NMAA requests might be sent to a server at the Smithsonian, if we know that there are no local copies available), we will want a program or simple script to execute when such a mapping is performed. To process an FPI, we will simply need to attach a prefix /cgi-bin/resolve-FPI?. This will invoke a local script, with the FPI (which follows the "?") as input. The script must then examine the FPI and return a result with the headers and result code properly set to redirect the client to the new location.
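A resolver script of this kind might look roughly like the following. It is a sketch under stated assumptions rather than a tested CHIO component: the table contents, the rule for NMAA identifiers, and both URLs (which use the reserved ".example" domain) are hypothetical. It relies only on the CGI conventions that the text after the "?" arrives in the `QUERY_STRING` environment variable and that the script's `Status` and `Location` headers produce the redirection.

```python
import os
import sys
from urllib.parse import quote, unquote

# Hypothetical local table from FPIs to URLs.
LOCAL_TABLE = {
    "ISBN for NMAA::Strubel//CATALOG::Folk art in the Suburbs":
        "http://museum.example/catalogs/folk-art.sgm",
}

# Assumed algorithmic rule: NMAA identifiers with no local copy are
# passed on to a resolver at another site.
REMOTE_RESOLVER = "http://smithsonian.example/cgi-bin/resolve-FPI?"


def resolve(fpi):
    """Return the URL to redirect to, or None if the FPI is unknown."""
    if fpi in LOCAL_TABLE:
        return LOCAL_TABLE[fpi]
    if "NMAA" in fpi.partition("//")[0]:
        return REMOTE_RESOLVER + quote(fpi, safe="")
    return None


def cgi_response(query_string):
    """Build the CGI response (headers, blank line, body) for a request."""
    fpi = unquote(query_string)
    target = resolve(fpi)
    if target is None:
        return ("Status: 404 Not Found\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Unknown FPI\r\n")
    return "Status: 302 Found\r\nLocation: " + target + "\r\n\r\n"


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

Installed as /cgi-bin/resolve-FPI, the script would answer a request for /cgi-bin/resolve-FPI? followed by a URL-encoded FPI with a redirection to the mapped location. Spaces and other reserved characters in the FPI need to be percent-encoded in the link, which is why the sketch decodes the query string before the lookup.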

This method is not likely to be robust in the long term, and adds a certain amount of overhead. However, for the purposes of a demonstration project, it has a lot of promise. The creation of a simple script could improve the quality of the CHIO demonstration appreciably.

Conclusions

The basic conclusions for Project CHIO and linking are relatively simple:

Linking via the TEI ptr, ref, xptr, xref, and link tags is a good idea, but for compatibility with HyTime, the ptr tags should be used only with link, and the ref tags used to implement all inline links.
Linking across document types is not a problem. The use of external linking methods is possible and compatible for both HyTime and the TEI. Neither includes assumptions about the document type that will be found as the result of following such a pointer.
Proper implementation of location-independent addressing will not be feasible in CHIO, as the standards and implementations just do not exist. However, it may be possible to demonstrate and experiment with strategies for using FPIs to access data by the use of server-side scripting.