Why Your Document Management System Should Care About Hyperlinks
Or, why you should care about a document management system if you care
about hyperlinks
of Texcel Research, Inc.
at SGML/XML '97
on December 9, 1997
Why links
- A link is a construct that represents a relationship between two or more
things
- Historical use is straightforward association of two data items, e.g.,
a cross-reference
- Historical venue for links is hypermedia systems
- Emerging uses of links:
  - to locate distributed objects
  - to specify dependencies
  - to associate metadata with data
Some common link requirements
- Links to arbitrary data and to points within that data
- Links within and across documents
- Links with multiple endpoints
- Links that carry with them some set of semantics, such as a type and behavior
- Links into data that cannot be modified
- Control over the direction of traversal of a link
- Control over what types of objects a link can point to
- Notification when a link end is invalidated or modified
- Version history of a link
- Context-sensitive links
Representing links
- SGML ID/IDREF
- HTML
- HyTime
- XML Link
Well-accepted properties of links
- Specification of the link itself usually via some combination of elements
and/or attributes (link recognition)
- Specification of how to find the endpoints (addressing)
- What the link is for (role)
- What to do when the link is activated (behavior)
- Allowed types of things the link can point to
- Allowed direction of traversal between link endpoints
- Various other metadata about the link itself: a descriptor, who created
it and when, system-specific instructions, etc.
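The properties listed above can be pictured as a small data model. The following is only a sketch under stated assumptions: the class and field names (Link, Endpoint, Direction, allowed_types, and so on) are hypothetical and do not come from any linking standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"      # traverse from the first endpoint to the others only
    BIDIRECTIONAL = "both"   # traverse from any endpoint to any other


@dataclass
class Endpoint:
    address: str             # how to find the endpoint (addressing)
    object_type: str         # type of thing addressed, e.g. "section"


@dataclass
class Link:
    role: str                # what the link is for
    behavior: str            # what to do when the link is activated
    endpoints: list[Endpoint]
    allowed_types: frozenset[str] = frozenset()   # empty means "anything"
    direction: Direction = Direction.BIDIRECTIONAL
    metadata: dict[str, str] = field(default_factory=dict)

    def type_violations(self) -> list[Endpoint]:
        """Return endpoints whose type this link is not allowed to point to."""
        if not self.allowed_types:
            return []
        return [e for e in self.endpoints
                if e.object_type not in self.allowed_types]

    def can_traverse(self, source: int, target: int) -> bool:
        """Check whether traversal between two endpoint positions is allowed."""
        if source == target:
            return False
        if self.direction is Direction.BIDIRECTIONAL:
            return True
        return source == 0   # FORWARD: only from the first endpoint
```

A link with more than two entries in endpoints covers the multiple-endpoint case, and the metadata dictionary is a placeholder for the descriptor, creator, and date.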
Link lifecycle
- Typical users only consume links: hypermedia systems present and traverse
existing links
- Somebody has to create and manage links: authoring and document management
systems must do interesting things with links
- Linking within constantly modified data presents some hard problems:
  - Addressing that works when linked data is modified
  - Ongoing validation that links are still valid
  - Automated synthesis of links
Why ID/IDREF doesn't cut it
- Limited to one document
- Limited to addressing an element
- Not enough standard semantics
- Can't maintain links independently of the data
- Impossible if read-only data doesn't already have IDs
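The single-document limit can be illustrated with a rough sketch. It assumes hypothetical element names (doc, section, xref) and treats attributes named id and idref as stand-ins for declared ID and IDREF types, because Python's ElementTree does not read a DTD.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <section id="intro"><title>Introduction</title></section>
  <para>See <xref idref="intro"/> and <xref idref="appendix-b"/>.</para>
</doc>"""


def resolve_idrefs(root):
    """Map each idref value to the element carrying that id, or None."""
    # The lookup table is built from this one document only.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return {ref.get("idref"): by_id.get(ref.get("idref"))
            for ref in root.iter("xref")}


resolved = resolve_idrefs(ET.fromstring(DOC))
```

Here resolved["intro"] is the section element, while resolved["appendix-b"] is None: a target that lives in another document, or one that was never given an ID, simply cannot be reached.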
Fundamentals of HTML links
- The A tag is a link
- It has an href attribute whose value is a URL
- That's it
This is an HTML link: <a href="http://www.texcel.no/texcel.htm">
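Because an HTML link is nothing more than an A element with an href, recognizing one takes very little code. A minimal sketch using Python's standard html.parser module follows; the class name AnchorCollector and the surrounding paragraph markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every A element encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = AnchorCollector()
collector.feed('<p>Visit <a href="http://www.texcel.no/texcel.htm">Texcel</a>.</p>')
```

After the call to feed, collector.hrefs holds the single URL. Nothing else about the link (type, direction, additional endpoints) is available to collect.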
HTML link shortcomings
- No links to spans of text or spans of content
- No links into arbitrary data types
- No links with multiple endpoints
- No links independent of data
- No control over the types of endpoints of a link
- No control over the direction of traversal of a link
Fundamentals of HyTime links
- Hypermedia/Time-based Structuring Language, defined in ISO/IEC 10744
- In the standard, HyTime defines architectural forms: a set of meta
element classes and attributes with standard semantics
- When you write a DTD, you make an element a link by applying an architectural
form via a HyTime attribute
- End result is that instances of the element in a document are links
- A link relates two or more link ends. Each link end is a locator to a
piece of data known as an anchor.
- A link end addresses the anchor via various mechanisms such as nameloc,
treeloc, queryloc, and dataloc.
- A contextual link gets one of its link ends from the link element's position
in the document.
- An independent link resides independently of any of its link ends.
This is a HyTime link: <clink hytime="clink" linkend="TexcelLogo">
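The difference between addressing by name and addressing by tree position can be sketched as follows. This is not a HyTime implementation: the two functions only imitate the spirit of nameloc and treeloc, the element names are hypothetical, and the path convention (first number selects the root, each later number selects the Nth child, counting from 1) is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<manual>
  <front><logo id="TexcelLogo"/></front>
  <body><chapter><para>First</para><para>Second</para></chapter></body>
</manual>"""


def locate_by_name(root, name):
    """Name-based addressing: find the element whose id equals name."""
    for el in root.iter():
        if el.get("id") == name:
            return el
    return None


def locate_by_tree_path(root, path):
    """Tree-position addressing: path[0] selects the root (must be 1),
    each later number selects the Nth child (1-based) of the current node."""
    if not path or path[0] != 1:
        return None
    node = root
    for position in path[1:]:
        children = list(node)
        if not 1 <= position <= len(children):
            return None
        node = children[position - 1]
    return node


root = ET.fromstring(DOC)
logo = locate_by_name(root, "TexcelLogo")
second_para = locate_by_tree_path(root, [1, 2, 1, 2])
```

The position-based locator reaches the second paragraph even though it carries no ID, which is what makes links into read-only data possible. The price is that the address breaks if the tree is restructured.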
Shortcomings of HyTime
- According to some people, it is not possible for HyTime to have shortcomings
because it can do anything
- This is its shortcoming
Fundamentals of XML links
- Any element becomes an XML Link when it has an attribute named xml-link
- The href attribute is the locator and is a URL
- Additionally standardizes the fragment id and query portions of a URL
as either ID referencing or TEI Extended Pointers (XPointer)
- Other attributes specify information about the link such as its role and
behavior
- Simple links are like HyTime contextual links; extended links are like
HyTime independent links
This is an XML link: <simple xml-link="simple" href="file:///C|/texcel/im/lib/texcel.gif">
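Link recognition by attribute can be sketched in a few lines. The example below assumes a hypothetical href with a fragment identifier and a role attribute; it checks for the xml-link attribute named above and splits the locator into its resource and fragment parts.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

DOC = """<doc>
  <simple xml-link="simple" href="http://www.texcel.no/texcel.htm#logo"
          role="illustration">logo</simple>
  <para>Not a link.</para>
</doc>"""


def find_xml_links(root):
    """Return (tag, link kind, resource URL, fragment, role) per link element."""
    links = []
    for el in root.iter():
        kind = el.get("xml-link")
        if kind is None:
            continue                      # no xml-link attribute: not a link
        url, fragment = urldefrag(el.get("href", ""))
        links.append((el.tag, kind, url, fragment, el.get("role")))
    return links


links = find_xml_links(ET.fromstring(DOC))
```

The element name plays no part in recognition: any element carrying the attribute is treated as a link, and the para element is skipped.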
Shortcomings of XML Links
- No links into arbitrary data types
- No control over the types of endpoints of a link
- No control over the direction of traversal of a link
Links inside an SGML document management system
- SGML document management systems typically have unique object identification
for every SGML element
- These repository identifiers (RIDs) make complex addressing unnecessary:
link resolution is simple
- Within its own domain, a system can provide efficient link storage and
manipulation
- When data goes out of this domain, links can be exported to a standard
form
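A rough sketch of the export step, assuming a hypothetical repository index that maps each RID to a document name and an optional element ID. Inside the repository a link is just a pair of RIDs; a URL is computed only when the data leaves the system.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class StoredLink:
    source_rid: int   # repository identifier of the linking element
    target_rid: int   # repository identifier of the link target


# Hypothetical repository index: RID -> (document name, element ID or None)
LOCATIONS = {
    101: ("manual.sgm", "install"),
    202: ("guide.sgm", "setup"),
    303: ("guide.sgm", None),
}


def export_href(link, locations):
    """Turn an internal RID link into a URL usable outside the repository."""
    document, element_id = locations[link.target_rid]
    return document if element_id is None else f"{document}#{element_id}"


def export_html(link, locations, text):
    """Render the link as an HTML anchor, one possible standard form."""
    href = export_href(link, locations)
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


anchor = export_html(StoredLink(101, 202), LOCATIONS, "Setup")
```

The same StoredLink could be rendered as a HyTime or XML link by a different export function, since the internal form does not commit to any one syntax.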
Link creation
- Present candidates for link targets, e.g., via tree views, query results,
views of content
- Generate an address to a link target
- Automatically generate ID values
- Ensure links are only to allowed types
- Associate link type and other information with the link
- Update an independent link map
- Automatically create links
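Two of the creation steps above, automatic ID generation and the allowed-type check, might look like the following sketch. The function names, the "gen" prefix, and the idref and type attribute names are all assumptions chosen for illustration.

```python
import itertools
import xml.etree.ElementTree as ET


def ensure_id(element, taken, counter, prefix="gen"):
    """Return the element's id, generating a unique one if it has none."""
    existing = element.get("id")
    if existing is not None:
        return existing
    while True:
        candidate = f"{prefix}{next(counter)}"
        if candidate not in taken:        # skip values already in use
            break
    element.set("id", candidate)
    taken.add(candidate)
    return candidate


def create_link(source, target, link_type, allowed_targets, taken, counter):
    """Point source at target, refusing targets of a disallowed type."""
    if target.tag not in allowed_targets:
        raise ValueError(f"{link_type} links may not point to <{target.tag}>")
    target_id = ensure_id(target, taken, counter)
    source.set("idref", target_id)
    source.set("type", link_type)
    return target_id


root = ET.fromstring('<doc><figure id="gen1"/><figure/><note/><xref/></doc>')
taken = {el.get("id") for el in root.iter() if el.get("id") is not None}
counter = itertools.count(1)
first_figure, second_figure, note, xref = list(root)
new_id = create_link(xref, second_figure, "see-figure", {"figure"}, taken, counter)
```

The generated value skips "gen1" because it is already taken, and an attempt to link the xref to the note element raises ValueError.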
Link maintenance
- Integrate with authoring systems to prevent deletion of link targets
- Notify when link target contents are modified
- Notify when an address locates a different target
- Potentially recalculate addresses
- Retrieve and update link metadata
- Maintain an independent link map
- Maintain context of link applicability
- Trace link lifecycle
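One way to support the notification and deletion items above is to record a digest of the target content when a link is created and compare it later. This is a sketch of that idea only; the digest approach and all names here are assumptions, not a description of any particular product.

```python
import hashlib
from dataclasses import dataclass


def digest(content):
    """Fingerprint target content so later changes can be detected."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class TrackedLink:
    link_id: str
    target_id: str
    target_digest: str   # digest of the target content when the link was made


def check_links(links, current_content):
    """Classify each link as 'ok', 'modified', or 'missing'.

    current_content maps target ids to their present text.
    """
    report = {}
    for link in links:
        content = current_content.get(link.target_id)
        if content is None:
            report[link.link_id] = "missing"
        elif digest(content) != link.target_digest:
            report[link.link_id] = "modified"
        else:
            report[link.link_id] = "ok"
    return report


def can_delete(target_id, links):
    """An authoring tool could refuse deletion while links still point here."""
    return all(link.target_id != target_id for link in links)
```

A "modified" result would trigger a notification to the link owner, and a "missing" result means the link end has been invalidated.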
Link delivery
- Real-time link traversal
- Determine and export a web of linked data
- Export links in a form optimized for the delivery system
- Drive conversion and delivery processes
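Determining the web of linked data amounts to a graph traversal over the link map. A minimal breadth-first sketch follows, assuming a hypothetical link map that records, for each object, the objects it links to.

```python
from collections import deque


def linked_web(start, link_map):
    """Return every object reachable from start by following links,
    in breadth-first order. link_map maps an object to its link targets."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in link_map.get(current, ()):
            if target not in seen:        # guards against link cycles
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical link map
LINK_MAP = {
    "manual": ["install", "logo.gif"],
    "install": ["setup", "manual"],
    "setup": [],
    "orphan": ["manual"],
}

web = linked_web("manual", LINK_MAP)
```

Everything in web must be exported together for the delivered links to resolve. The object named orphan links to the manual but is not reachable from it, so it is left out.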
There's more to links than viewing
- Tangible benefits of planning for links across the entire document lifecycle
- Exploit the capabilities of your SGML document management system to support
linking
- Quality gains are certain to follow
Copyright © Texcel N.V. All rights reserved.