XML Extended link

Subject: Re: XML questions

Date: 17 Jun 1997 23:52:58 GMT

From: cmsmcq@uic.edu (C M Sperberg-McQueen)

Newsgroup: comp.text.sgml


    --------------------------------------------------------------

Lars Marius Garshol (larsga@ifi.uio.no) wrote:
: I've been reading the XML linkage specification and started wondering
: about the external links described in it. As I understood the spec, 
: (which is not far, admittedly) these links will not have to reside in
: XML files that the links go to/from.

Simple links do reside in the document from which they start.  Extended
links don't -- or at least, needn't.

: This leads me to wonder
: - where will they then be expected to be "declared"?

The XML Extended link element asserts the existence of a link
connecting two or more locations (perhaps one or more locations, but
there's a dispute in hypertext theology over whether single-ended
links are imaginable or not, and if imaginable whether or not they
should be countenanced).  Unlike simple links, extended links need not
be located at any of the locations connected by the link.

They will, however, be located somewhere.  And if by "where will they
be declared?" you mean "where will the link be asserted?", then the
answer is "in an XML document containing Extended links".

If you mean "how will the browser know that a given phrase should 
function as one end of some link(s)?" then read on.

: - how will they be linked to the actual XML files? (Ie: how will the
:   browser know that this linking element corresponds to that link)

The browser can know in the same way it can know *now* that a given <a
name='foo'> element in an HTML document is the target of a link coded
<a href="#foo"> -- i.e.  by (1) displaying a document (call it D), (2)
keeping track of the elements in D, using the data structure of its
choice, (3) reading some documents (X, Y, Z, maybe also D) which
contain link elements asserting the existence of two- or n-way links,
and (4) noticing when one of the link elements in X, Y, Z points at an
element in D.  A nice browser might then indicate the presence of an
incoming or outgoing link with an icon, or a special color / font /
underscore treatment.
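
For concreteness, steps (2) through (4) can be sketched in Python.  This
is only an illustration, not anything the XML-link draft prescribes; the
names Locator, ExtendedLink and links_touching are hypothetical, and a
locator is simplified here to a (document, element id) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """One end of a link: an element in some document (simplified)."""
    document: str
    element_id: str


@dataclass(frozen=True)
class ExtendedLink:
    """A link asserted in source_document, connecting two or more ends."""
    source_document: str
    ends: tuple  # tuple of Locator


def links_touching(displayed_doc, element_ids, link_documents):
    """Map each element of the displayed document to the links that end there.

    element_ids    -- ids of the elements the browser is tracking in D (step 2)
    link_documents -- {document name: [ExtendedLink, ...]} already read (step 3)
    """
    hits = {}
    for links in link_documents.values():
        for link in links:
            for end in link.ends:
                # Step 4: notice a link end that points into D.
                if end.document == displayed_doc and end.element_id in element_ids:
                    bucket = hits.setdefault(end.element_id, [])
                    if link not in bucket:
                        bucket.append(link)
    return hits
```

A browser could then draw its icon or special color next to every
element that appears as a key in the returned mapping.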

How does the browser find X, Y, and Z?  Two ways are fairly obvious;
there are probably others.  (1) The user says "Load document D, and
also load documents X, Y, and Z, because X contains the links
constructed by my good friend Eliot Kimber, showing the correspondence
of each paragraph in document D to one or more chapters in the
prophecies of Nostradamus, and Y contains Steve DeRose's elegant
refutation of Kimber, and Z contains Kimber's response to DeRose, and
I don't want to miss a single link in the chain of argument" (or words
to similar effect; XML-link will not actually require documents
containing external links to have been written by Steve DeRose and
Eliot Kimber).  (2) The document D can contain an element (this is
what XML-link calls the Document element) pointing at other documents
which contain relevant links, which the browser may read to find out
if document D has any outgoing links the browser needs to know about
(and perhaps to find out about incoming links, too).  

An XML browser may or may not be required by the spec to read the
external documents named by the Document element and scan through them
for links with ends in document D -- similarly, if the Document
element points at a document which itself has a Document element, the
browser may or may not be required to (a) follow the Document links
recursively up to some maximum depth, (b) notify the user and ask for
advice, (c) follow just one (or just two) recursive links, (d) shell
out and start a game of Rogue.  Some people whose judgment I trust
assure me there is no consensus on this issue and that it will therefore
be left to the implementation to decide what to do.
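
Option (a), following Document links recursively up to some maximum
depth, might look like the following Python sketch.  It is one possible
implementation choice, not a requirement of the draft; the function name
and the document_refs mapping (document name to the documents named by
its Document elements) are assumptions made for the example.

```python
def collect_link_documents(start, document_refs, max_depth=1):
    """Return the documents reachable from start via Document elements.

    Walks breadth-first, at most max_depth levels deep, and never visits
    a document twice, so circular references cannot loop forever.
    """
    seen = {start}
    frontier = [start]
    found = []
    for _ in range(max_depth):
        next_frontier = []
        for doc in frontier:
            for ref in document_refs.get(doc, []):
                if ref not in seen:
                    seen.add(ref)
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found
```

With max_depth=0 the browser reads nothing beyond the document it was
asked to load; larger values trade more fetching for more links found.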

:  Can anyone give an example of how this is supposed to work?

OK.  Imagine that we have before us the following electronic documents:
  
  - C (the Constitution of the United States)
  - J (the Judiciary Act of 1789)
  - M (the Supreme Court decision in the case of Marbury vs. Madison,
       written by Chief Justice John Marshall; quotations of C and J
       in M are connected to C and J via simple links)
  - L (a document containing a set of links which connect the places
      in M where Marshall refers to but does not quote C and J, and
      various places where C and J are (a) consistent with each other,
      or (b) in conflict with each other, together with some modern
      legal commentary)
     
M has an element of the form 

  <seealso xml-link="group">
    <xlinks xml-link="document" href="L.xml"/>
  </seealso>
 
which identifies L as a document containing relevant links.

C and J have the same thing, but also mention M, since M contains 
relevant links, too:

  <seealso xml-link="group">
    <xlinks xml-link="document" href="L.xml"/>
    <xlinks xml-link="document" href="M.xml"/>
  </seealso>
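
How a browser might pull the document references out of such a group
can be sketched with Python's standard xml.etree.ElementTree module.
The wrapping <doc> element and the function name are invented for the
example; the xml-link attribute values are the ones used above.

```python
import xml.etree.ElementTree as ET

# A stand-in for document C or J; only the <seealso> group matters here.
SAMPLE = """<doc>
  <seealso xml-link="group">
    <xlinks xml-link="document" href="L.xml"/>
    <xlinks xml-link="document" href="M.xml"/>
  </seealso>
</doc>"""


def referenced_link_documents(xml_text):
    """Return the href of every document element inside a group element."""
    root = ET.fromstring(xml_text)
    hrefs = []
    for element in root.iter():
        if element.get("xml-link") != "group":
            continue
        for child in element:
            href = child.get("href")
            if child.get("xml-link") == "document" and href:
                hrefs.append(href)
    return hrefs


print(referenced_link_documents(SAMPLE))
```

Note that the sketch keys on the xml-link attribute rather than on the
element names seealso and xlinks, which are arbitrary.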
   
Scenario 1:
  1 user loads L, reads the commentary; from time to time clicks on 
    links to jump to C and J.
  
Just like HTML, more or less.

Scenario 2:
  1 user loads document C
  2 browser detects the <seealso> element in C and asks the user
    "do you want to see the links to C from L and M?  do you want to
    see the links *from* C that are defined in L and M?"
  3 user says "Yes, show me stuff from M but not stuff from L"
  4 browser reads document M and notices seventeen quotation links
    connecting quotations in M to the original passages in C
  5 browser displays a 'quoted-by' icon next to the passages in C
    quoted by M
  6 user reads, and clicks on a quoted-by icon
  7 browser loads M and shows the user where M quoted the passage in C
    that the user was just reading
  etc.
  
Scenario 3:
   
  1 user loads document J
  2 browser detects the <seealso> element in J and automatically (in the
    background) scans both L and M for links involving document J.  Since
    it is actually holding M in memory, it also notices links involving M
    and L themselves (a sort of pre-emptive caching), though the only links
    involving L are in L itself.
  3 browser displays appropriate icons (quoted-by, paraphrased-by,
    consistent-with, in-contradiction-with, referred-to, or just a 
    generic all-purpose 'something-points-at-this' icon) for each 
    link to or from J, whether there is an explicit link
    element in J or the link is asserted by a link element in L or M
  4 user reads, and clicks on a link icon 
  5 browser follows the link
  etc.

Scenario 4:
  
  1 user loads document M
  2 browser detects the <seealso> element in M and does nothing, because
    (a) the user has set the default to "Do what I tell you and nothing
    more, don't try to get clever and don't try to load external links", 
    or because (b) the implementor of the browser has decided to ignore 
    extended links and the user has no choice
  3 user reads document
  
Depending on what the conformance clause of the spec says in the end,
some of these may not be correct implementations, but for now they are
at least all imaginable.
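
The difference between Scenarios 2, 3 and 4 comes down to a policy
choice, which might be modelled as in the Python sketch below.  The
LinkPolicy names and the ask_user callback are hypothetical; nothing
of the kind is defined by the draft.

```python
from enum import Enum


class LinkPolicy(Enum):
    ASK = "ask"      # Scenario 2: ask the user about each document
    AUTO = "auto"    # Scenario 3: scan everything in the background
    NEVER = "never"  # Scenario 4: ignore external links entirely


def documents_to_scan(referenced, policy, ask_user=None):
    """Decide which of the referenced link documents the browser reads."""
    if policy is LinkPolicy.NEVER:
        return []
    if policy is LinkPolicy.AUTO:
        return list(referenced)
    if ask_user is None:
        raise ValueError("LinkPolicy.ASK needs an ask_user callback")
    return [doc for doc in referenced if ask_user(doc)]
```

In Scenario 2 the user's answer "show me stuff from M but not stuff
from L" corresponds to a callback that accepts M.xml and rejects L.xml.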

: When reading the spec, I felt that it assumed the reader knew of some
: existing standards already, especially HyTime and TEI. Is this correct?
: Do you think it would help my understanding if I tried to understand
: HyTime (which is supposed to be difficult, no?) and/or TEI before
: approaching XML linkage again?

I don't *think* knowledge of either HyTime or TEI is required to 
understand XML-link.  I think what the current draft spec is really
assuming is not knowledge of HyTime and TEI but knowledge of how
hypertext systems more advanced than the World Wide Web have been
constructed, and why.  Some examples in the spec would probably help
a lot.  But then, a spec is not necessarily a tutorial ...

--
-C. M. Sperberg-McQueen
 University of Illinois at Chicago
 ACH/ACL/ALLC Text Encoding Initiative
 cmsmcq@uic.edu, tei@uic.edu
 +1 (312) 413-0317, fax +1 (312) 996-6834