URIs / URLs [Dan Connolly]

Date: Fri, 06 Apr 2001 08:54:05 -0500
From: Dan Connolly <connolly@w3.org>
To: Pierre-Antoine CHAMPIN <champin@bat710.univ-lyon1.fr>
Cc: www-rdf-interest@w3.org, www-rdf-logic@w3.org
Subject: Re: URIs / URLs

Pierre-Antoine CHAMPIN wrote:
[...]
> HTML and PDF version available at
>   http://www710.univ-lyon1.fr/~champin/urls/

This document promulgates a number of myths about
Web Architecture...

  "However, other W3C recommendations use URLs to identify
  namespaces [6,8,11]. Those URLs do not locate the
  corresponding namespaces (which are abstract things and
  hence not locatable with the HTTP protocol), "

I don't see any justification for the claim
that namespaces are disjoint from HTTP resources.
One of the primary motivations for the
XML namespaces recommendation is to make the
Web self-describing: each document carries the
identifiers of the vocabularies/namespaces it's written in;
the system works best when you can use such
an identifier to GET a specification/description
of the vocabulary.

  "For example, the URI of the type property of RDF is
  http://www.w3.org/1999/02/22-rdf-syntax-ns#type. As a
  matter of fact, the property itself is not located by that
  URL: its description is. "

Again, I see no justification for the claim that this
identifier doesn't identify a property.
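
As an illustrative sketch only (not from the original paper), the
identifier can be split with Python's standard urllib.parse.urldefrag:
the part before "#" is what an HTTP client would GET, and the fragment
names the term within that document. The helper name split_identifier
is an assumption made for this example.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def split_identifier(uri):
    """Return (document URI an HTTP client would request, fragment naming the term)."""
    document, fragment = urldefrag(uri)
    return document, fragment

document, fragment = split_identifier(RDF_TYPE)
print(document)  # the namespace document
print(fragment)  # the term identified within it
```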

  "URLs are transient 

  That means they may become invalid after a certain period
  of time [9]. "

That's a fact of life in a distributed system. URNs may
become invalid after a period of time too. It's true
of all URIs. URIs mean what we all agree that they mean.
Agreement is facilitated by a lookup service like HTTP.
In practice, URIs are quite reliable:
6% linkrot according to http://www.useit.com/alertbox/980614.html
and I think the numbers get better when you measure
per-request rather than per-link,
since popular pages are maintained
more actively than average.

Unless urn: URIs provide value
that http: URIs do not, they won't be deployed.
I think the fact that

 (a) urn:'s have been standardized
 (IETF Proposed Standard, that is) since 1995
 (b) support for them is available in popular browsers
 and has been for several generations
and yet
 (c) still their use is negligible

speaks for itself. They don't provide any value.
Naming is a social contract, and the http: contract
works and the urn: contract doesn't.

  "In the immediate interpretation, a URL identifies the
  resource retrieved through it."

To be precise: it identifies the resource accessed
through it. In the general case, you can't retrieve
a resource, but only a representation of one.
Other text that makes this error includes:

  "... the retrieved resource ..."
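
A minimal sketch of that distinction, under assumptions of my own: the
Resource class, the example.org URI, and the placeholder bodies are all
hypothetical. One URI identifies one resource; a request only ever
yields one of its representations, chosen here by media type.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """One identified thing, with any number of representations."""
    uri: str
    representations: dict = field(default_factory=dict)  # media type -> body

    def get(self, accept):
        """Return (media type, body) for the first acceptable type, else None."""
        for media_type in accept:
            if media_type in self.representations:
                return media_type, self.representations[media_type]
        return None

# Hypothetical resource available as both HTML and PDF.
paper = Resource(
    "http://example.org/paper",
    {"text/html": b"<html>...</html>", "application/pdf": b"%PDF-..."},
)

# Same URI, same resource, two different representations.
html = paper.get(["text/html"])
pdf = paper.get(["application/pdf"])
```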

Another falsehood:

  "Contrarily to URLs, URNs (Uniform Resource Names) are
  designed to persistently identify a given resource."

URIs in general are designed to persistently identify
a given resource. Especially HTTP URIs.

I recommend a series of articles by the designer
of URIs, HTTP, and HTML to clarify a number of
these myths:

  World Wide Web Design Issues
  http://www.w3.org/DesignIssues/

esp

  The Web Model: Information hiding and URI syntax (Jan 98) 
  http://www.w3.org/DesignIssues/Model

  The Myth of Names and Addresses 
  http://www.w3.org/DesignIssues/NameMyth

  Persistent Domains- an idea for persistence of URIs(2000/10)
  http://www.w3.org/DesignIssues/PersistentDomains

  (Hmm... this one is an interesting idea, but I think freenet:
   might be easier to deploy.)

and regarding the intent of the design of namespaces, see:

  cf Web Architecture: Extensible Languages
  W3C Note 10 Feb 1998 
  http://www.w3.org/TR/NOTE-webarch-extlang

Dan Connolly, W3C http://www.w3.org/People/Connolly/

---------------------------------------------------------------------

Date: Fri, 6 Apr 2001 13:31:07 +0100 
From: Lee Jonas <lee.jonas@cakehouse.co.uk>
To: 'Pierre-Antoine CHAMPIN' <champin@bat710.univ-lyon1.fr>,
    www-rdf-interest@w3.org, www-rdf-logic@w3.org
Subject: RE: URIs / URLs

Some excellent points in here.  IMHO, there is not enough appreciation of
these kinds of lower-level issues and their impact discussed in the RDF
interest group.

My thoughts:

Mailto
======
TimBL had some interesting views on mailto: in one of his musings (I can't
remember which one) that treated a mailto: URL as a locator, where the "default"
action browsers tend to adopt is to launch a mail client to send an email to
that address.  However, the action could quite easily be something else,
such as 'finger' that person to get info on them, or retrieve their home
address from your favourite address book app.  

The point is that "retrieval" is not an inherent aspect of URLs.  URLs merely
identify by location.  Software agents do something with that identifier
(which just happens to be retrieval of the identified resource, mostly).

Resource Versions
=================

In terms of resources changing, two examples you cited sprang out at me:
1) Two people with the name Champin at the same university at different
points in time.
2) Different versions of a W3C Working Draft.

It seems that 1) is clear cut: two different resources identified by the
same URL because they occupy the same location at different points in time.
This is a problem in general in terms of the transient nature of the
Internet.

However, 2) is not so straightforward.  The W3C use URLs that identify
different versions of a document as different resources (by incorporating
the publish date).  Yet the latest version is also identified by a URL that
does not contain any distinguishing date information.  The resource
retrieved by this URL changes to always retrieve the "latest version".

I would suggest that a new URN scheme could directly represent the notion of
different resource versions.  After all, versioning is a fundamental aspect
of tracking the evolution of just about anything.  This "version" attribute
would not form part of the main identifier, and would be optional (like a
port specifier in a URL).  The default "version" if not specified in a URN
would be the "latest".  Then the mapping from URN to URL (if any) can
reflect versioning directly.
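
A rough sketch of what this might look like, in Python.  Everything here
is an assumption for illustration: the ";version=" suffix syntax, the
urn:example: names, the example.org URLs and the mapping table are all
made up, since no such scheme has been specified.

```python
from typing import NamedTuple, Optional

class VersionedName(NamedTuple):
    name: str
    version: str

def parse_versioned_urn(urn: str) -> VersionedName:
    """Split a URN into its main identifier and an optional version."""
    # Hypothetical syntax: "<urn>;version=<label>".  The suffix is optional,
    # much like a port specifier in a URL.
    name, sep, version = urn.partition(";version=")
    if not name.lower().startswith("urn:"):
        raise ValueError("not a URN: %r" % urn)
    return VersionedName(name, version if sep and version else "latest")

# Hypothetical URN-to-URL mappings, keyed by (name, version).
MAPPINGS = {
    ("urn:example:spec", "2001-04-06"): "http://example.org/TR/2001/WD-spec-20010406",
    ("urn:example:spec", "latest"): "http://example.org/TR/spec",
}

def resolve(urn: str) -> Optional[str]:
    """Map a (possibly versioned) URN to a URL, or None if unknown."""
    return MAPPINGS.get(tuple(parse_versioned_urn(urn)))
```

With this table, resolve("urn:example:spec") falls back to the "latest"
entry, while resolve("urn:example:spec;version=2001-04-06") picks the
dated one.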

Perhaps WebDAV efforts have already addressed versioning issues, I don't
know.  It would be worth investigating!

N2L & L2N service
=================

From what I understand, you propose a N2L service of replacing "urn:xxx:"
with "http:", and a L2N service of providing additional metadata with
resources to convey any L2N mappings.

I would suggest something altogether different.

It strikes me that mapping some abstract string to an address is exactly
analogous to DNS.  I would propose a distributed service similar to DNS+HTTP
and reverse-DNS for N2L & L2N lookups:

1) Firstly, make the services processes (i.e. daemons).
The extra constraints of making the URNs in documents conform to the http
protocol for N2L mappings disappear (hooray!).  You also don't have to
specify L2N mappings within documents, avoiding unnecessary clutter.

2) Secondly, make use of DNS / reverse-DNS.
Using standard internet domain names is an excellent way of segmenting
the namespace for these URNs (as your document points out), and it also
provides an excellent hierarchical mechanism for locating the relevant,
highly distributed N2L & L2N daemons (i.e. just do a DNS lookup on a URN /
URL to identify the server with the relevant N2L / L2N daemons for that
domain).  Anyone wanting N2L & L2N capabilities for mapping URNs within
their own domain space can simply run these services alongside DNS.

3) Specify & implement standard N2L & L2N protocols.
Once the ip address of the appropriate daemon is found, query it using some
standard protocol that is yet to be specified for retrieving the N2L / L2N
mappings.  As a short cut, an N2L mapping could just perform the specified
action on the resource referred to (if it exists).
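
The three steps above could be sketched roughly as follows.  This is only
an in-memory illustration under my own assumptions: the "urn:x-name:"
form with an embedded domain name, the dictionaries standing in for DNS
and for the daemons, and the example.org names are all hypothetical, as
the actual protocol is yet to be specified.

```python
from urllib.parse import urlsplit

# Stand-in for DNS: domain name -> address of that domain's N2L/L2N daemon.
DAEMON_ADDRESSES = {"example.org": "192.0.2.10"}

# Stand-in for each daemon's mapping table: daemon address -> {URN: URL}.
N2L_TABLES = {
    "192.0.2.10": {
        "urn:x-name:example.org:reports/annual": "http://example.org/docs/annual.html",
    },
}

def domain_of_urn(urn):
    """Extract the domain from a URN of the assumed form urn:x-name:<domain>:<local>."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError("unsupported URN: %r" % urn)
    return parts[2]

def n2l(urn):
    """Name to location: find the domain's daemon, then ask it for the URL."""
    address = DAEMON_ADDRESSES.get(domain_of_urn(urn))
    return N2L_TABLES.get(address, {}).get(urn)

def l2n(url):
    """Location to name: find the host's daemon, then ask it for matching URNs."""
    address = DAEMON_ADDRESSES.get(urlsplit(url).hostname)
    table = N2L_TABLES.get(address, {})
    return [urn for urn, location in table.items() if location == url]
```

A real deployment would replace the two dictionaries with an actual DNS
query and a network request to the daemon; the lookup order stays the
same.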

The analogies are:
1) a specific urn protocol analogous to the 'http:' protocol, but based on
pure identity not location.  It should also incorporate the concept of
resource versions.
2) an N2L daemon analogous to an http daemon that responds to requests for
actions on resources identified by the urns (this could just be an extra
level of indirection, looking up a URL then 'hopping' to a http daemon on
the same machine / LAN to satisfy the request).
3) an L2N daemon analogous to a reverse-DNS daemon, only at the resource
level (not the IP address / domain name level).

Regards

Lee

-----Original Message-----

From: Pierre-Antoine CHAMPIN [mailto:champin@bat710.univ-lyon1.fr]
Sent: 05 April 2001 12:22
To: www-rdf-interest@w3.org; www-rdf-logic@w3.org
Subject: URIs / URLs

Hello RDF folks,

There is a recurring debate on both RDF lists about URIs, what they
mean, and how some problems with RDF come from problems with them.

Actually, we think there is a problem with URIs, and especially with
URLs used as URIs.

Here is an attempt to clarify those problems and give some pieces of
solution.

  Pierre-Antoine Champin

HTML and PDF version available at 
  http://www710.univ-lyon1.fr/~champin/urls/

Prepared by Robin Cover for The XML Cover Pages archive.