The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: April 08, 2005
Data Sharing, Mediation, and Synchronization

Provisional draft of a work in progress. Currently this collection of references (mixed with questions and musings) is more of a random scratch pad than a structured document. In this early phase, sample references are being collected for a broad range of topics relating to Web architecture, including notions of resource/object identity, resource identification schemes, persistence, namespaces, naming authorities, registration authorities, name resolution, network protocols, content negotiation, bindings, resource description, metadata, addressing and linking, automated data synchronization, access control, trust management, etc.

In a second phase of development, we intend to provide analysis of the collected materials.

Many of the topics outlined and referenced here are relevant to the OASIS XRI Data Interchange Technical Committee, which has been chartered to "define a generalized, extensible service for sharing, linking, and synchronizing data over the Internet and other data networks using XML documents and XRIs (Extensible Resource Identifiers)." These topics are also the subject of ongoing research in many of the IETF Working Groups and in the deliberations of the W3C Technical Architecture Group (TAG), tasked to refine principles governing the "Architecture of the World Wide Web."


[Under construction]

Identity (Object/Resource/Asset Identity)

Question List

  • If a domain perspective on identity (dictated by application requirements) impacts the structure of an identifier, how is cross-domain data sharing impacted?
  • What are some representative (example) domain-specific views of identity?

Discussion: [TBD]

Resource Identifiers and Locators

Topic List

Discussion: Opaque names vs. names mapped to ontologies.

Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) is "a compact string of characters for identifying an abstract or physical resource. The 'generic URI' syntax consists of a sequence of four main components, for representing hierarchical relationships within the namespace <scheme>://<authority><path>?<query> where 'scheme' is required.

Features: Uniform: Uniformity provides several benefits: it allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ... Resource: A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., 'today's weather report for Los Angeles'), and a collection of other resources. Not all resources are network 'retrievable'; e.g., human beings, corporations, and bound books in a library can also be considered resources... Identifier: An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax. Having identified a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as 'access', 'update', 'replace', or 'find attributes'.."

"A URI can be further classified as a locator, a name, or both. The term 'Uniform Resource Locator' (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network 'location'), rather than identifying the resource by name or by some other attribute(s) of that resource. The term 'Uniform Resource Name' (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable... The URI scheme defines the namespace of the URI, and thus may further restrict the syntax and semantics of identifiers using that scheme..." [from IETF RFC 2396, August 1998]
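The generic syntax quoted above, <scheme>://<authority><path>?<query>, can be illustrated with a short sketch. It relies on Python's standard urllib.parse.urlsplit; the function name uri_components and the example URIs are assumptions made for illustration and do not come from the RFC.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the four generic-syntax components quoted above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # required component
        "authority": parts.netloc,   # may be empty, e.g. for URNs
        "path": parts.path,
        "query": parts.query,
    }


# Hypothetical locator-style URI: all four components are present.
locator = uri_components("http://www.example.org/weather/la?day=today")

# Hypothetical name-style URI: only a scheme and a path, no authority.
name = uri_components("urn:isbn:0451450523")
```

The second call shows that a name-style URI still fits the generic split even though it carries no authority or query.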

Uniform Resource Locator (URL)

IETF RFC 1738 "specifies a Uniform Resource Locator (URL), the syntax and semantics of formalized information for location and access of resources via the Internet... Just as there are many different methods of access to resources, there are several schemes for describing the location of such resources. The generic syntax for URLs provides a framework for new schemes to be established using protocols other than those defined in this document. URLs are used to 'locate' resources, by providing an abstract identification of the resource location. Having located a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as 'access', 'update', 'replace', 'find attributes'. In general, only the 'access' method needs to be specified for any URL scheme..."

Note that "Uniform Resource Identifier (URI): Generic Syntax" (draft-fielding-uri-rfc2396bis-03) updates RFC 1738: "rfc2396bis-03 obsoletes RFC2396, which merged 'Uniform Resource Locators' (RFC 1738) and 'Relative Uniform Resource Locators' (RFC 1808) in order to define a single, generic syntax for all URIs. It excludes those portions of RFC 1738 that defined the specific syntax of individual URI schemes; those portions will be updated as separate documents. The process for registration of new URI schemes is defined separately by RFC 2717."

Uniform Resource Name (URN)

"Uniform Resource Names (URNs) are intended to serve as persistent, location-independent, resource identifiers and are designed to make it easy to map other namespaces (which share the properties of URNs) into URN-space. Therefore, the URN syntax provides a means to encode character data in a form that can be sent in existing protocols, transcribed on most keyboards, etc.

All URNs have the following syntax (phrases enclosed in quotes are REQUIRED): <URN> ::= "urn:" <NID> ":" <NSS>, where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The leading "urn:" sequence is case-insensitive. The Namespace ID determines the syntactic interpretation of the Namespace Specific String... RFC 1630 and RFC 1737 each presents additional considerations for URN encoding, which have implications as far as limiting syntax. On the other hand, the requirement to support existing legacy naming systems has the effect of broadening syntax. Thus, we discuss the acceptable syntax for both the Namespace Identifier and the Namespace Specific String separately..." [from IETF RFC 2141]
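As a minimal sketch of the <URN> ::= "urn:" <NID> ":" <NSS> rule quoted above, the following splits a URN into its Namespace Identifier and Namespace Specific String and treats the leading "urn:" as case-insensitive. The names Urn and parse_urn are hypothetical, and the sketch does not enforce the character restrictions that RFC 2141 places on the NID and NSS.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # Namespace Identifier
    nss: str  # Namespace Specific String


def parse_urn(text: str) -> Urn:
    """Split 'urn:<NID>:<NSS>' into its two parts (structure check only)."""
    prefix, sep, rest = text.partition(":")
    # The leading "urn:" sequence is case-insensitive.
    if sep != ":" or prefix.lower() != "urn":
        raise ValueError(f"not a URN: {text!r}")
    nid, sep, nss = rest.partition(":")
    if sep != ":" or not nid or not nss:
        raise ValueError(f"URN needs both an NID and an NSS: {text!r}")
    # Everything after the second colon belongs to the NSS, colons included.
    return Urn(nid, nss)


book = parse_urn("URN:isbn:0451450523")  # hypothetical example value
```

Because the Namespace ID determines how the NSS is interpreted, the sketch deliberately leaves the NSS as an opaque string.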

Internationalized Resource Identifier (IRI)

The [proposed] Internationalized Resource Identifier (IRI) protocol element is "a complement to the URI (RFC2396). An IRI is a sequence of characters from the Universal Character Set (ISO10646/Unicode). A mapping from IRIs to URIs is defined, which means that IRIs can be used instead of URIs where appropriate to identify resources..."

A URI is defined as "a sequence of characters chosen from a limited subset of the repertoire of US-ASCII characters. The characters in URIs are frequently used for representing words of natural languages. Such usage has many advantages: such URIs are easier to memorize, easier to interpret, easier to transcribe, easier to create, and easier to guess. For most languages other than English, however, the natural script uses characters other than A-Z. For many people, handling Latin characters is as difficult as handling the characters of other scripts is for people who use only the Latin alphabet. Many languages with non-Latin scripts have transcriptions to Latin letters. Such transcriptions are now often used in URIs, but they introduce additional ambiguities... IRIs are designed to be compatible with recent recommendations for new URI schemes. The compatibility is provided by specifying a well defined and deterministic mapping from the IRI character sequence to the functionally equivalent URI character sequence..."
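The deterministic IRI-to-URI mapping described above can be sketched, in simplified form, as percent-encoding the UTF-8 bytes of every non-ASCII character. This is an illustration under stated assumptions rather than the draft's full algorithm: the function name iri_to_uri is hypothetical, and any special handling of domain names or normalization is omitted.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters.

    Simplified sketch: ASCII characters pass through unchanged; every other
    character is replaced by the %HH escapes of its UTF-8 bytes.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


# Hypothetical IRI containing two non-ASCII letters.
uri = iri_to_uri("http://example.org/r\u00e9sum\u00e9")
```

An IRI that is already pure ASCII maps to itself, which is one way to read the statement that IRIs complement rather than replace URIs.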

Persistent Uniform Resource Locator (PURL)

"The now-familiar Uniform Resource Locator (URL) can change at the whim of hardware reconfiguration, file system reorganization, or changes in organizational structure, leaving users stranded in 404 limbo... Document Not Found. The unpredictable mobility of Internet resources is an inconvenience at best. For librarians, it is a serious problem which compromises their service to patrons and imposes an unacceptably large burden on catalog maintenance... The general solution to this problem is the development of Uniform Resource Names, or URNs. The process of defining URNs has been underway in the Internet Engineering Task Force (IETF) for some time. OCLC is an active participant and supporter of this process. The persistence requirement of URN schemes is not a technological issue so much as an outcome of the social structures that evolve to meet a common community need. OCLC's origin is deeply rooted in precisely this shared commitment to providing reliable, long-term access to information. Standardization is necessarily slow and deliberate. Putting all the pieces in place will require consensus in the IETF, developments in the community of Web browser implementors, and deployment of new code by the community of network system managers who administer the Domain Name System (DNS) for the Internet. The concerns and problems of the library community may not be fully appreciated or adequately addressed by these groups in a timely manner. Libraries can and should provide leadership in the solution of these problems..."

"To aid in the development and acceptance of URN technology, OCLC has deployed a naming and resolution service for general Internet resources. The names, which can be thought of as Persistent URLs (PURLs), can be used both in documents and in cataloging systems. PURLs increase the probability of correct resolution and thereby reduce the burden and expense of catalog maintenance... Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP 'redirect'..." [from the Summary document]
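The indirection described above, where a PURL points to a resolution service that returns the current URL as an HTTP redirect, can be sketched with an in-memory lookup table. The class name PurlResolver, its methods, the example paths, and the choice of status code 302 are all assumptions for illustration; they are not taken from OCLC's service.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Toy resolver: maps a stable PURL path to the resource's current URL."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, purl_path: str, target_url: str) -> None:
        """Create or update the association between a PURL and a URL."""
        self._table[purl_path] = target_url

    def resolve(self, purl_path: str) -> Tuple[int, Optional[str]]:
        """Return (status, location) as an HTTP redirect would carry them."""
        target = self._table.get(purl_path)
        if target is None:
            return 404, None
        return 302, target  # 302 is an assumed redirect status


resolver = PurlResolver()
resolver.register("/docs/report", "http://old.example.org/report.html")
# The resource moves; only the resolver's table changes, not the PURL.
resolver.register("/docs/report", "http://new.example.org/reports/1.html")
status, location = resolver.resolve("/docs/report")
```

Documents and catalog records keep citing the same PURL path throughout, which is why the maintenance burden shifts from every citing record to a single table entry.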

The PURL-Based Object Identifier (POI)

The PURL-based Object Identifier (POI) is a "simple specification for resource identifiers based on the PURL system. The use of the POI is closely related to the use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and with the OAI identifier format (oai-identifiers) used within that protocol. The POI has been developed with the following criteria in mind: (1) the use of currently deployed technologies, (2) simplicity of assignment, (3) the ability to assign POIs in a distributed environment without compromising the uniqueness of assigned identifiers, (4) the delivery of a 'resolver' service for POIs that (where possible) builds on the existing investment in OAI repositories. The primary intention of the POI is as a relatively persistent identifier for resources that are described by metadata 'items' in OAI-compliant repositories. Where this is the case, POIs are not explicitly assigned to resources - a POI exists implicitly because an OAI 'item' associated with the resource is made available in an OAI-compliant repository. However, POIs can be explicitly assigned to resources independently from the use of OAI repositories and the OAI-PMH if desired. As such, the POI can be seen as a possible mechanism for implementing cool URIs. A separate document provides some POI resolver guidelines. All POI assigners are strongly encouraged to configure the PURL system to resolve their POIs..." [from the paper 2003/05/03]

Extensible Resource Identifier (XRI)

XRIs function like URIs and have a syntax mirroring URIs: a scheme name (xri) followed by the same four optional components as a generic URI, thus: xri: authority / path ? query # fragment, where the definitions of these components are, for the most part, supersets of the equivalent components in the generic URI syntax. XRIs may be used either as indirect 'names' or direct 'locators' for resources, including other XRIs. The XRI scheme also includes syntax for distinguishing whether an XRI is intended only for identification or also for resolution.

XRI syntax extends generic URI syntax by providing support for persistent and reassignable segments, unlimited delegation of namespaces, global context symbols, cross-references (containment of other URIs or XRIs), self-references (a form of cross-reference indicating that an entire XRI is intended as a unique identifier, not for network resolution), and an internationalized character set. XRIs use Unicode for internationalization following the W3C's draft for Internationalized Resource Identifiers (IRI).


OpenURL

The OpenURL specification "provides a methodology for describing resources that are referenced in a networked environment as well as for describing the context of the reference. To capture this context, the Standard introduces the concept of the ContextObject and provides a framework for the cross-domain description of such contextual references based on the ContextObject concept. This Standard assumes that representations of ContextObjects describing referenced resources will be transported over networks for the purpose of requesting context-sensitive services pertaining to those referenced resources. This Standard introduces a Registry to contain properties that are fundamental to create concrete representations of ContextObjects, and methods to transport them. It also defines and registers the initial content of that Registry, as a means to bootstrap the deployment of the OpenURL Framework.

"The purpose of the [OpenURL] representation and transportation of those packages is to facilitate the delivery of services pertaining to the referenced resource. Conventional URL-based linking is inadequate in this respect, because link resolution: (1) is independent of the identity of the agent that actuates the link; (2) returns at most one resource or service, not a set of resources and services that depend on the context in which the link is provided and followed; (3) fails when linked resources move or become unavailable. The [OpenURL] approach taken by the Committee overcomes these limitations by transporting packages of contextual metadata and identifiers to networked systems, named Resolvers. Using the reference and the contextual information contained in these packages, Resolvers may deliver a number of services that pertain to the referenced resource and are appropriate within the context of use..."

"Part 1 of [the draft Standard] introduces the ContextObject, an information construct containing descriptions of a resource that is referenced on the network and descriptions of resources that form the context of the reference. It also defines a framework consisting of several components named Namespaces (URI, ORI, XRI), Character Encodings, Physical Representations, Constraint Languages, ContextObject Formats, Metadata Formats, Transports, and Community Profiles. When defining an actual instantiation of the general OpenURL Framework, choices for each of these components must be detailed. The Registry is introduced to contain these details... Part 2 lists the initial content of the Registry, and -- wherever necessary -- also provides detailed definitions of the registered content. The initial Registry content is provided to bootstrap the deployment of two concrete instantiations of the OpenURL Framework, one built upon a Key/Encoded-Value representation for ContextObjects, the other upon an XML representation for ContextObjects. The initial Registry also details the OpenURL, a suite of HTTP(S)-based methods to transport representations of ContextObjects..." [March 12, 2003 draft]

Open Archives Initiative (OAI) Identifier

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) "provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework: (1) Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and (2) Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services..."

"The OAI identifier format is intended to provide persistent resource identifiers for items in repositories that implement OAI-PMH. This is just one possible format that may be used for identifiers within OAI-PMH. oai-identifiers are Uniform Resource Names (URNs) in the sense of RFC1737; they are resource identifiers and not resource locators (URLs). Note that here the resource is the metadata (the items) and not the underlying object or 'stuff' that the metadata describes. Correspondence between an oai-identifier and any identifier that the object described by the metadata may have is outside the scope of this specification and of the OAI-PMH..."
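The excerpt above does not spell out the identifier syntax. As a hedged sketch, the following assumes the three-part form oai:<namespace-identifier>:<local-identifier> used by the OAI identifier guidelines and checks structure only. The names OaiIdentifier and parse_oai_identifier are hypothetical.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    namespace: str  # identifies the repository
    local_id: str   # identifies the item within that repository


def parse_oai_identifier(text: str) -> OaiIdentifier:
    """Split 'oai:<namespace-identifier>:<local-identifier>' (assumed form)."""
    scheme, sep, rest = text.partition(":")
    if sep != ":" or scheme != "oai":
        raise ValueError(f"not an oai-identifier: {text!r}")
    namespace, sep, local_id = rest.partition(":")
    if sep != ":" or not namespace or not local_id:
        raise ValueError(f"missing namespace or local identifier: {text!r}")
    return OaiIdentifier(namespace, local_id)


item = parse_oai_identifier("oai:arXiv.org:hep-th/9901001")
```

Consistent with the quoted text, the parsed value names a metadata item in a repository, not the object that the metadata describes.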

Extensible Repository Resource Locator (ERRoL)

"An ERRoL is a 'Cool URL' to metadata, content, and services related to registered Open Archive Initiative (OAI) repositories..."

From the December 31, 2003 announcement, "Jeff Young develops Extensible Repository Resource Locators (ERRoLs) for OAI Identifiers": "OCLC Researcher Jeff Young has developed a scheme for creating 'Cool' (unchanging) URLs for metadata, content, and services related to registered Open Archive Initiative (OAI) repositories. Anyone can create or use a Cool URL to any metadata record or web resource related to supported OAI repositories. Similarly, any OAI repository can use the ERRoL service by registering a unique repository identifier with the OAI Registry at UIUC. An ERRol resolver is a system that uses a repository's unique identifier to create URLs that point to metadata and content stored in the repository. Services that accompany ERRoLs include customizable HTML views of repositories, RSS feeds of OAI content, OAI-PMH version transformations, and OAI record unwrapping (i.e., providing 'Cool' URLs to XML content in OAI repositories without the OAI wrappers). Other services are planned such as record editing, metadata crosswalks, and content resolution as well as integration of user defined services..."

Grid Resource Identifier (GRI)

The Grid-Resource Specification "defines the Grid Resource Identifier for globally naming resources and the Grid Resource Metadata document which may contain metadata information for those resources. The specification does not address the means by which such documents are populated. Furthermore, it is left to the Grid application designers to define the semantics of the metadata information in Grid Resource Metadata documents... A Grid Resource Identifier (GRI) is a URN that globally, uniquely, and everlastingly identifies a resource. A resource cannot have more than one GRI... Since the GRI is unique and everlasting, it can be stored in databases and printed in journals, in the safe knowledge that it could be used, at any time in the future, to locate any services offering that resource. In this, it fulfils a similar role to an LSID..."

Life Science Identifier (LSID)

The OMG Life Sciences Identifiers specification "addresses the need for a standardized naming schema for biological entities in the Life Sciences domains, the need for a service assigning unique identifiers complying with such naming schema, and the need for a resolving service that specifies how to retrieve the entities identified by such naming schema from repositories. The normative parts of the specification are: (1) Platform independent model expressed in the attached XML file created according XMI format rules, v1.0; (2) Platform specific model for Web Services using one of the proposed bindings (SOAP/HTTP, HTTP GET, FTP) for those who are implementing Web Services model; (3) Platform specific model for Java for those who are implementing Java model."

An initial LSID specification was produced by the Interoperable Informatics Infrastructure Consortium (I3C) Life Science Identifiers (LSID) Technical Group: "In its first year of operation, I3C delivered the Life Science Identifier Resolution System (LSID), a highly relevant open-source solution. LSID provides for scalable, secure, and migration-transparent naming of biologically significant data. It introduces a straightforward approach to identifying data resources stored in multiple, distributed data stores in a manner that overcomes the limitations of naming schemes in use today. LSID is based on a mechanism for federated data and authority identification... Early implementations are at the Protein Data Bank and in Cold Spring Harbor Laboratory's (CSHL) Distributed Annotation System (DAS). The National Human Genome Research Institute's International HapMap Project, a genetic variation mapping that will help identify genetic contributions to common diseases, also makes heavy use of LSIDs. LSID simplifies procedures and ensures uniqueness of identifiers..." [I3C FAQ document]

Syntax: "LSID Syntax uses a 5-part format: URN:LSID:Authority:Namespace:Object:[Revision-ID], where URN:LSID is a mandatory prefix, Authority is the Internet domain of the organization that assigns an LSID to a resource, Namespace constrains the scope of the object; Object is an alphanumeric describing the object; Revision-ID is an optional version of the object. Examples: '' and ''.." [Andrew Ellicott, "Welcome to the 2004 Life Science ID Symposium"]
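The quoted format, URN:LSID:Authority:Namespace:Object:[Revision-ID], can be sketched as a colon-separated split with an optional final part. The names Lsid and parse_lsid and the example identifier are hypothetical, and accepting the prefix in any letter case is an assumption rather than something the quoted text states.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # Internet domain of the assigning organization
    namespace: str           # constrains the scope of the object
    object_id: str           # alphanumeric describing the object
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into its parts; the revision may be absent."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    # Assumption: the mandatory URN:LSID prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing URN:LSID prefix: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty authority, namespace, or object: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier, not one of the examples elided in the source.
entry = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
```

Keeping the revision as a separate optional field lets two versions of the same object share an authority, namespace, and object part while remaining distinct identifiers.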

"Established in 2001, I3C is a discovery informatics consortium, which aims to increase the probability of success in pharmaceuticals and bio-tech R&D by eliminating barriers to application interoperability, data integration and knowledge flow. Huge volumes of multi-format, multi-platform data from disparate sources create many productivity barriers; I3C was borne out of the need for informatics solutions that clear these barriers and allow the acceleration of life science discoveries and new products. I3C members (including large pharmaceuticals, IT companies, not-for-profits and academic institutions) resolutely intend for I3C's work to benefit life science participants, help the industry grow and provide broad societal benefits." [I3C web site]

"The Life Sciences Identifier (LSID) is an I3C Uniform Resource Name (URN) specification in progress. The LSID concept introduces a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores in a manner that overcomes the limitations of naming schemes in use today. Almost every public, internal, or department-level data store today has its own way of naming individual data resources, making integration between different data sources a tedious, never-ending chore for informatics developers and researchers. By defining a simple, common way to identify and access biologically significant data, whether that data is stored in files, relational databases, in applications, or in internal or public data sources, LSID provides a naming standard underpinning for wide-area science and interoperability..." [from IBM LSID site]

Digital Object Identifier (DOI)

The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment. "[It] provides a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic commerce, and enabling automated copyright management for all types of media. The system is managed and directed by the International DOI Foundation. Several million DOIs have been assigned by DOI Registration Agencies in the US, Australasia, and Europe. DOIs are names (characters and/or digits) assigned to objects of intellectual property (physical, digital or abstract) such as electronic journal articles, images, learning objects, ebooks, images, any kind of content. They are used to provide current information, including where they (or information about them) can be found on the Internet. Information about a digital object may change over time, including where to find it, but its DOI will not change. Using DOIs as identifiers makes managing intellectual property in a networked environment much easier and more convenient, and allows the construction of automated services and transactions for e-commerce..." [from the home page]

Global Release Identifier (GRid)

"GRid is a system to identify releases of sound recordings for electronic distribution that is capable of integration with identification systems deployed by key stakeholders from across the music industry. GRid consists of: (1) The identifier syntax for the fundamental unit of trade -- the release; (2) A metadata schema providing a minimum set of data needed to uniquely identify the release; and (3) Definitions of, and data elements for, messages that will enable electronic data interchange between trading partners and others in the value chain. To begin allocating GRids, users must apply for an Issuer Code, which uniquely identifies the individual or company that will be identifying their releases with GRids... A credit card payment of GBP 150 is due for each issuer code applied for..." [from the home page]

Resource Description and Metadata

Topic List

  • Resource Description Framework (RDF)
  • Resource Directory Description Language (RDDL)

Discussion: [TBD]

Resource Description Framework (RDF)

"The Resource Description Framework (RDF) integrates a variety of applications from library catalogs and world-wide directories to syndication and aggregation of news, software, and content to personal collections of music, photos, and events using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web." [W3C RDF home page]

Resource Directory Description Language (RDDL)

RDDL Version 2 draft says: "It is the consensus of the TAG that RDDL is a suitable format for use as a "Namespace Document", that is to say as a representation yielded by dereferencing a URI in use as an XML Namespace Name." Previously: "A RDDL document, called a Resource Directory, provides a package of information about some target, including: (1) human-readable descriptive material about the target, (2) a directory of individual resources related to the target, each directory entry containing descriptive material and linked to the resource in question. The targets which RDDL was designed to describe are XML Namespaces [1.0, 1.1]. Examples of 'individual related resources' include schemas, stylesheets, and executable code designed to process markup from some namespace. A Resource Directory is designed to be suitable for service as the body of an entity returned by dereferencing a URI serving as an XML Namespace name..." [RDDL Version 1 spec]

General Resources and Readings: Articles, Papers, News

  • "Architecture of the World Wide Web, First Edition." W3C Working Draft. 9-December-2003 [or later]. Edited by Ian Jacobs (W3C). Developed by W3C Technical Architecture Group (TAG), chartered "to document and build consensus around principles of Web architecture, to interpret and clarify these principles when necessary, to also resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C." Document abstract: "The World Wide Web is a network-spanning information space of resources interconnected by links. This information space is the basis of, and is shared by, a number of information systems. Within each of these systems, agents (people and software) retrieve, create, display, analyze, and reason about resources. Web architecture includes the definition of the information space in terms of identification and representation of its contents, and of the protocols that support the interaction of agents in an information system making use of the space. Web architecture is influenced by social requirements and software engineering principles, leading to design choices that constrain the behavior of systems using the Web in order to achieve desired properties of the shared information space: efficiency, scalability, and the potential for indefinite growth across languages, cultures, and media. This document reflects the three bases of Web architecture: identification, interaction, and representation."

  • [January 2004] "Identifiers and Identification Systems: An Informational Look at Policies and Roles from a Library Perspective." By Giuseppe Vitiello (Instituto Superiore de Sanità). In D-Lib Magazine (January 2004). "... identification has become a critical element in accessing electronic publications and other intellectual artifacts. Identification is now a fundamental component of what is called the 'political economics of information' according to which control of the technical means of accessing resources is just as vital (and sometimes more vital) than control of the resources themselves..."

  • [March 2003] "Identity and Interoperability in Bioinformatics." By Tim Clark (I3C Editorial Board Member). In Briefings in Bioinformatics Volume 4, Number 1 (March 2003). "... our collaboration framework must do a better job of supporting identity and interoperability. This is a consequence of our choice to distribute computational objects and services among many researchers and places, and our simultaneous need to collaborate and reason across the whole set. We must be able to name all the things (concepts, tools, services, data, conclusions, etc.) in our universe of discourse, in a way comprehensible to programs. Plus our programs must be able to compute on and talk with one another about the things named, with both reliability and stability in the face of constant change. These requirements have become more stringent with the transition now underway to post-genome bioinformatics... Bioinformatics requires more and more sophisticated programmatic interoperability at every step, because of the increasing simultaneous requirements to combine distributed information, to treat this information as foundational, and to track the science... The LSID (Life Science Identifier) approach developed by I3C [Interoperable Informatics Infrastructure Consortium] may help meet this challenge. LSID and its associated services models show how to provide federated identity and distributed resolution and services definition with much reuse of existing Internet technology... LSIDs are specialised URNs (Uniform Resource Names)."

  • [January 28, 2003] "Four Uses of a URL: Name, Concept, Web Location and Document Instance." By David Booth (W3C Fellow / Hewlett-Packard). January 28, 2003.

  • [March 02, 2000] "Web Design Issues. Identifiers: What is Identified?" By Tim Berners-Lee. Created 1998, last edited 2000/03/02. The jam jar label text does not (normally) read "jam jar label text" or "jam jar label" or "jam jar" but "jam".

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor: