PC WEEK: RDF aims to tame the Web

[This local archive copy mirrored from the canonical location: http://www.zdnet.com/pcweek/news/1103/03rdf.html; see this official version of the document.]

RDF aims to tame the Web
Resource Description Framework spec is poised to provide better Web publishing with metadata
By Eamonn Sullivan, PC Week Labs
11.03.97


Bringing some order to the wilds of the Web is the ambitious goal of the Resource Description Framework, a standard recently proposed by the World Wide Web Consortium.

RDF could replace the alphabet soup of proposed languages for the Web--which now includes CDF (Channel Definition Format), OSD (Open Software Description) and PICS (Platform for Internet Content Selection). RDF also could be used as the basis for better document management software and better search engines.

Information about information

RDF is a specification for creating metadata, which is information about information. For example, metadata about a document or book might include its title, author, publication date, publisher and subject. Even the table of contents and index can be thought of as metadata.

Other, less common metadata might include pointers to reviews of that document or book, content ratings (indicating, for example, whether it is appropriate for children), and signatures or seals asserting its authenticity.

The line dividing metadata and data is a blurry one. Metadata can be data and metadata can be used to describe other metadata. Information about the publisher of a list of citations is an example of metadata about metadata.

Web resources for metadata

RDF draft specification: www.w3.org/TR/WD-rdf-syntax/

The current standard for embedding metadata in HTML: purl.oclc.org/docs/metadata/dublin_core/approach.html

Dublin Core Metadata: purl.oclc.org/metadata/dublin_core/

The W3C's metadata activity page: www.w3.org/Metadata/Overview.html

Metadata is an old concept, but applying that concept to the ever-changing Web has been a somewhat haphazard process. Each interest group has come up with its own method. PICS was created to meet the demand for content ratings, for example, and CDF was created to provide the metadata required by push applications.

For "library catalog" types of data--author, subject and so forth--the most common approach used is the Dublin Core, which uses the HTML "Meta" tag and a short group of common categories of information, or properties. ("Author" is a Dublin Core element, for example.)
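As a rough illustration of the "Meta" tag approach, the sketch below renders a few properties as HTML tags. The "DC." prefix on the names, the tag layout and the dublin_core_meta helper are assumptions made for this example, not text taken from the Dublin Core documents.

```python
from html import escape

# Hypothetical record; the property names are illustrative only.
record = {
    "Title": "RDF aims to tame the Web",
    "Author": "Eamonn Sullivan",
}

def dublin_core_meta(record):
    """Render each property as one HTML META tag (assumed "DC." naming)."""
    return [
        '<META NAME="DC.{}" CONTENT="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in record.items()
    ]

meta_tags = dublin_core_meta(record)
for tag in meta_tags:
    print(tag)
```

Each property becomes one tag in the page's header, which is what keeps this approach simple and also what limits it to flat name-and-value pairs.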

RDF takes a step back in an attempt to create an approach that would work for all of those needs and more.

RDF will likely be most immediately useful for finding information on the Web. Searching only those documents created by certain authors or about certain subjects will make full-text searches more efficient.

For example, narrowing your search to pages about sports before searching for "patriots" will make it more likely you'll find information about the football team than about John Quincy Adams.
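A minimal sketch of that idea follows, assuming each page carries a single "subject" property. The pages, URLs and the search function are invented for illustration; a real search engine would work from an index rather than a list.

```python
# Hypothetical pages: each has a "subject" metadata property and body text.
pages = [
    {"url": "http://example.com/nfl", "subject": "sports",
     "text": "The Patriots won on Sunday."},
    {"url": "http://example.com/history", "subject": "history",
     "text": "Patriots of the early republic."},
    {"url": "http://example.com/golf", "subject": "sports",
     "text": "A long day on the links."},
]

def search(pages, term, subject=None):
    """Full-text search, optionally narrowed by subject metadata first."""
    if subject is not None:
        pages = [p for p in pages if p["subject"] == subject]
    term = term.lower()
    return [p["url"] for p in pages if term in p["text"].lower()]

unfiltered = search(pages, "patriots")
narrowed = search(pages, "patriots", subject="sports")
print(unfiltered)
print(narrowed)
```

Without the subject filter, the history page matches as well; with it, only the football page is returned.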

In addition, for the user, RDF could become the basis for a more customizable Web experience. An RDF-aware browser could conceivably rearrange Web sites on the fly to bring data the user considers more valuable to the surface.

Although not spelled out in the current draft, RDF can also be used as the basis for information exchange between organizations and users. Scheduling information, personal preferences and security information are all examples of data that can be more clearly communicated when accompanied by metadata.

The first draft of RDF was posted to the Web last month at www.w3.org/Metadata/RDF/. The draft is the first step in a standardization process that could take several months, culminating in a W3C recommendation.

Unlike other drafts released by the W3C, however, the current RDF draft is not yet usable. A crucial part--the method for creating RDF schemata--still has to be released publicly before developers can begin implementing RDF. The current draft is complete enough to give developers an idea of its potential and begin planning applications for it, but it's not out of the woods yet.

One danger lurking in the shadows is the competition between Netscape Communications Corp. and Microsoft Corp. Although RDF is not finished, Netscape and Microsoft are going forward with their own RDF schemes based on the contribution each made to the RDF Working Group. Netscape, for example, has already demonstrated a client, code-named Aurora, that uses RDF.

But what Microsoft calls RDF and what Netscape calls RDF are similar but not interoperable. Cautious developers and authors should wait for the W3C to release more complete specifications before implementing RDF.

Yet another XML-derived language

RDF, like CDF and OSD, is an application of XML (Extensible Markup Language), meaning that it was created using XML. RDF is not a replacement for XML, but it can be thought of as an intermediate step between XML and some applications.

If that sounds confusing, it is. But in practice, it means that RDF should, once fully defined, simplify the creation of metadata languages. For example, if you needed to create a metadata schema for use in the manufacture and distribution of widgets, using RDF would probably be simpler than creating a new language from scratch using XML.

To define a metadata format in RDF, developers will have to create a schema. An RDF schema is similar to a database schema, but instead of database fields and valid values for those fields, an RDF schema defines properties (such as "author" and "publication_date") and valid values for those properties (a name and date, respectively).
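Because the schema language has not been released, the following is only a sketch of the general idea, written in Python rather than RDF. The schema dictionary, the validator functions and the date format are all assumptions made for this example.

```python
from datetime import datetime

def is_name(value):
    """Accept any non-empty string as a name (a deliberately loose check)."""
    return isinstance(value, str) and value.strip() != ""

def is_date(value):
    """Accept dates written as YYYY-MM-DD (an assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical schema: property name -> check for valid values.
schema = {
    "author": is_name,
    "publication_date": is_date,
}

def validate(schema, prop, value):
    """Return True only if the schema defines prop and the value passes."""
    check = schema.get(prop)
    return check is not None and check(value)

print(validate(schema, "author", "John Smith"))
print(validate(schema, "publication_date", "next week"))
print(validate(schema, "color", "blue"))
```

A property the schema does not define is rejected outright, just as a database would reject a value for a field that does not exist.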

The lineup for RDF

In draft:

The metadata model and syntax

Still coming:

The language for writing schemata

A model for filtering and processing based on metadata (necessary for things such as rating services and push applications)

A model for querying metadata (useful for improving search engines and Web agents)

Methods for using RDF with digital signatures

A model for using RDF in place of PICS and for migrating from PICS 1.1 to RDF

Once the schema is defined, developers can begin creating RDF statements. A statement has three parts: a resource, a property and a value. The resource part of the statement would usually be a Web page or its address. "Author" is an example of a property, and "John Smith" is an example of a value for that property.
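The three-part structure can be sketched in a few lines of Python. This models the idea only, not the RDF syntax; the Statement type, the example URL and the values_of helper are hypothetical.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One statement: a resource, a property and a value."""
    resource: str
    prop: str
    value: str

# Hypothetical statements about a single Web page.
statements = [
    Statement("http://example.com/page.html", "Author", "John Smith"),
    Statement("http://example.com/page.html", "Title", "An Example Page"),
]

def values_of(statements, resource, prop):
    """Collect every value stated for a property of a resource."""
    return [s.value for s in statements
            if s.resource == resource and s.prop == prop]

authors = values_of(statements, "http://example.com/page.html", "Author")
print(authors)
```

Because every statement has the same shape, a program can collect statements from many sources and query them together.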

An RDF statement can be included with the resources it describes, put into a separate page or location, or actually "enclose" the resource it describes. An example of the last is a digital signature enclosed with the identity of the signer and the assertions the signer is making about the resource. Assertions might be anything from "It's for real" to "It's clean enough for a 6-year-old."

RDF statements are grouped together with a pointer to the relevant schemata using a basic syntax that contains only 14 elements. All other elements are defined in the schema. Although RDF defines only one kind of connection between metadata elements, arbitrary complexity can be achieved by nesting elements inside one another.

RDF makes heavy use of a recently proposed extension to XML called namespaces. The extension is designed to let developers mix and match two or more document types without fear of naming conflicts if two document types give different meanings to the same tag.

In RDF, namespaces enable authors to use two or more RDF schemata. If two schemata assign different meanings to the property "Date," authors distinguish between the two by using prefixes. To distinguish between the Dublin Core version of Date and a "Widget" schema's version, the author might use DC:Date and WID:Date, for example.
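The prefix mechanism amounts to a simple lookup, sketched below in Python. The schema addresses and the expand function are placeholders invented for this example; the real addresses would come from the schemata themselves.

```python
# Hypothetical mapping of prefixes to schema addresses.
namespaces = {
    "DC": "http://example.org/dublin-core#",
    "WID": "http://example.org/widget#",
}

def expand(qualified_name, namespaces):
    """Turn a prefixed name such as DC:Date into a full, unambiguous name."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError("unknown or missing prefix in " + repr(qualified_name))
    return namespaces[prefix] + local

dc_date = expand("DC:Date", namespaces)
wid_date = expand("WID:Date", namespaces)
print(dc_date)
print(wid_date)
```

Both properties are called Date, but once the prefixes are expanded they become two different names, so software cannot confuse them.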
