URIs / URLs [Dan Connolly]
Date: Fri, 06 Apr 2001 08:54:05 -0500
From: Dan Connolly <connolly@w3.org>
To: Pierre-Antoine CHAMPIN <champin@bat710.univ-lyon1.fr>
Cc: www-rdf-interest@w3.org, www-rdf-logic@w3.org
Subject: Re: URIs / URLs
Pierre-Antoine CHAMPIN wrote:
[...]
> HTML and PDF version available at
> http://www710.univ-lyon1.fr/~champin/urls/

This document promulgates a number of myths about Web Architecture...

"However, other W3C recommendations use URLs to identify namespaces [6,8,11]. Those URLs do not locate the corresponding namespaces (which are abstract things and hence not locatable with the HTTP protocol),"

I don't see any justification for the claim that namespaces are disjoint from HTTP resources. One of the primary motivations for the XML namespaces recommendation is to make the Web self-describing: each document carries the identifiers of the vocabularies/namespaces it's written in; the system works best when you can use such an identifier to GET a specification/description of the vocabulary.

"For example, the URI of the type property of RDF is http://www.w3.org/1999/02/22-rdf-syntax-ns#type. As a matter of fact, the property itself is not located by that URL: its description is."

Again, I see no justification for the claim that this identifier doesn't identify a property.

"URLs are transient That means they may become invalid after a certain period of time [9]."

That's a fact of life in a distributed system. URNs may become invalid after a period of time too. It's true of all URIs. URIs mean what we all agree that they mean. Agreement is facilitated by a lookup service like HTTP.

In practice, URIs are quite reliable: 6% linkrot according to http://www.useit.com/alertbox/980614.html and I think the numbers get better when you measure per-request rather than per-link, since popular pages are maintained more actively than average.

Unless urn: URIs provide value that http: URIs do not, they won't be deployed. I think the fact that
(a) urn:'s have been standardized (IETF Proposed Standard, that is) since 1995
(b) support for them is available in popular browsers and has been for several generations
and yet
(c) still their use is negligible
speaks for itself.
They don't provide any value. Naming is a social contract, and the http: contract works and the urn: contract doesn't.

"In the immediate interpretation, a URL identifies the resource retrieved through it."

To be precise: it identifies the resource accessed thru it. In the general case, you can't retrieve a resource, but only a representation of one. Other text that makes this error includes: "... the retrieved resource ..."

Another falsehood:

"Contrarily to URLs, URNs (Uniform Resource Names) are designed to persistently identify a given resource."

URIs in general are designed to persistently identify a given resource. Especially HTTP URIs.

I recommend a series of articles by the designer of URIs, HTTP, and HTML to clarify a number of these myths:

World Wide Web Design Issues
http://www.w3.org/DesignIssues/

esp

The Web Model: Information hiding and URI syntax (Jan 98)
http://www.w3.org/DesignIssues/Model

The Myth of Names and Addresses
http://www.w3.org/DesignIssues/NameMyth

Persistent Domains- an idea for persistence of URIs (2000/10)
http://www.w3.org/DesignIssues/PersistentDomains
(Hmm... this one is an interesting idea, but I think freenet: might be easier to deploy.)

and regarding the intent of the design of namespaces, see:

cf Web Architecture: Extensible Languages
W3C Note 10 Feb 1998
http://www.w3.org/TR/NOTE-webarch-extlang

Dan Connolly, W3C http://www.w3.org/People/Connolly/

---------------------------------------------------------------------

Date: Fri, 6 Apr 2001 13:31:07 +0100
From: Lee Jonas <lee.jonas@cakehouse.co.uk>
To: 'Pierre-Antoine CHAMPIN' <champin@bat710.univ-lyon1.fr>, www-rdf-interest@w3.org, www-rdf-logic@w3.org
Subject: RE: URIs / URLs

Some excellent points in here. IMHO, there is not enough appreciation of these kinds of lower-level issues and their impact discussed in the RDF interest group.
My thoughts:

Mailto
======

TimBL had some interesting views on mailto: in one of his musings (I can't remember which one) that treated a mailto: as a locator, where the "default" action browsers tend to adopt is to launch a mail client to send an email to that address. However, the action could quite easily be something else, such as 'finger' that person to get info on them, or retrieve their home address from your favourite address book app.

The point is that "retrieval" is not an intrinsic aspect of URLs. URLs merely identify by location. Software agents do something with that identifier (which just happens to be retrieval of the identified resource, mostly).

Resource Versions
=================

In terms of resources changing, two examples you cited sprang out at me:

1) Two people with the name Champin at the same university at different points in time.
2) Different versions of a W3C Working Draft.

It seems that 1) is clear cut: two different resources identified by the same URL because they occupy the same location at different points in time. This is a problem in general in terms of the transient nature of the Internet.

However, 2) is not so straightforward. The W3C use URLs that identify different versions of a document as different resources (by incorporating the publish date). Yet the latest version is also identified by a URL that does not contain any distinguishing date information. The resource retrieved by this URL changes to always retrieve the "latest version".

I would suggest that a new URN scheme could directly represent the notion of different resource versions. After all, versioning is a fundamental aspect of tracking the evolution of just about anything. This "version" attribute would not form part of the main identifier, and would be optional (like a port specifier in a URL). The default "version" if not specified in a URN would be the "latest". Then the mapping from URN to URL (if any) can reflect versioning directly.
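A minimal sketch of the optional "version" idea, assuming a hypothetical syntax in which the version follows a trailing ";v=" marker. The suggestion above does not fix any syntax, so the marker, the function name, and the "urn:example:" names are all placeholders for illustration only:

```python
from typing import NamedTuple


class VersionedName(NamedTuple):
    name: str     # main identifier, without any version suffix
    version: str  # explicit version, or "latest" when none was given


def parse_versioned_urn(urn: str, marker: str = ";v=") -> VersionedName:
    """Split a URN into its main identifier and an optional version.

    The ';v=' marker is a hypothetical syntax chosen for this sketch;
    it is not taken from any URN specification.
    """
    if not urn.lower().startswith("urn:"):
        raise ValueError(f"not a URN: {urn!r}")
    # rpartition returns ('', '', urn) when the marker is absent.
    name, sep, version = urn.rpartition(marker)
    if not sep:
        # No version given: default to "latest", much as a URL
        # without a port specifier falls back to a default port.
        return VersionedName(urn, "latest")
    if not version:
        raise ValueError(f"empty version in {urn!r}")
    return VersionedName(name, version)


dated = parse_versioned_urn("urn:example:rdf-spec;v=1999-02-22")
current = parse_versioned_urn("urn:example:rdf-spec")
```

Both calls yield the same main identifier, so the version stays outside the name itself; a URN-to-URL mapping could then key on the pair (name, version).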
Perhaps WebDAV efforts have already addressed versioning issues, I don't know. It would be worth investigating!

N2L & L2N service
=================

From what I understand, you propose an N2L service of replacing "urn:xxx:" with "http:", and an L2N service of providing additional metadata with resources to convey any L2N mappings.

I would suggest something altogether different. It strikes me that mapping some abstract string to an address is exactly analogous to DNS. I would propose a distributed service similar to DNS+HTTP and reverse-DNS for N2L & L2N lookups:

1) Firstly, make the services processes (i.e. daemons). The extra constraints of making the URNs in documents conform to the http protocol for N2L mappings disappear (hooray!). You also don't have to specify L2N mappings within documents, avoiding unnecessary clutter.

2) Secondly, make use of DNS / reverse-DNS. Using standard internet domain names is an excellent way of segmenting the namespace for these URNs (as your document points out), and it also provides an excellent hierarchical mechanism for locating the relevant, highly distributed N2L & L2N daemons (i.e. just do a DNS lookup on a URN / URL to identify the server with the relevant N2L / L2N daemons for that domain). Anyone wanting N2L & L2N capabilities for mapping urns within their own domain space simply runs these services alongside DNS.

3) Specify & implement standard N2L & L2N protocols. Once the IP address of the appropriate daemon is found, query it using some standard protocol that is yet to be specified for retrieving the N2L / L2N mappings. As a short cut, an N2L mapping could just perform the specified action on the resource referred to (if it exists).

The analogies are:

1) a specific urn protocol analogous to the 'http:' protocol, but based on pure identity not location. It should also incorporate the concept of resource versions.
2) an N2L daemon analogous to an http daemon that responds to requests for actions on resources identified by the urns (this could just be an extra level of indirection, looking up a URL then 'hopping' to an http daemon on the same machine / LAN to satisfy the request).
3) an L2N daemon analogous to a reverse-DNS daemon, only at the resource level (not the IP address / domain name level).

Regards

Lee

-----Original Message-----
From: Pierre-Antoine CHAMPIN [mailto:champin@bat710.univ-lyon1.fr]
Sent: 05 April 2001 12:22
To: www-rdf-interest@w3.org; www-rdf-logic@w3.org
Subject: URIs / URLs

Hello RDF folks,

There is a recurring debate on both RDF lists about URIs, what they mean, and how some problems with RDF come from problems with them.

Actually, we think there is a problem with URIs, and especially with URLs used as URIs. Here is an attempt to clarify those problems and give some pieces of a solution.

Pierre-Antoine Champin

HTML and PDF version available at
http://www710.univ-lyon1.fr/~champin/urls/
Prepared by Robin Cover for The XML Cover Pages archive.