A Proposed Convention for Embedding Metadata in HTML.

Reported by Stuart Weibel (weibel@oclc.org)

June 2, 1996

The following proposed convention reflects the consensus of a break-out group at the W3C Distributed Indexing and Searching Workshop, May 28-29, 1996, concerning the tagging of meta information in HTML. This break-out group included representatives of the Dublin Core/Warwick Framework Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort, Verity Software, and the W3C.

Attendees (alphabetically):

     Nick Arnett          narnett@verity.com
     Mic Bowman           bowman@transarc.com
     Eliot Christian      echristi@usgs.gov
     Dan Connolly         conolly@w3.org
     Martijn Koster       m.koster@webcrawler.com
     John Kunze           jak@ckm.ucsf.edu
     Carl Lagoze          lagoze@cs.cornell.edu
     Michael Mauldin      fuzzy@lycos.com
     Christian Mogensen   christian@vivid.com
     Wick Nichols         wickn@microsoft.com
     Timothy Niesen       tmn@swl.msd.ray.com
     Stuart Weibel        weibel@oclc.org
     Andrew Wood          woody@dstc.edu.au


The problem is to identify a simple means of embedding metadata within HTML documents without requiring additional tags or changes to browser software, and without unnecessarily compromising current practices for robot collection of data.

Given that it is judged undesirable for such embedded metadata to display on browser screens, any solution requires encoding the information in tag attributes rather than as container element content.

The goal was to agree on a simple convention for encoding structured metadata information of a variety of types (which may or may not be registered with a central registry analogous to the MIME type registry). It was judged that a registry may be a necessary feature of the metadata infrastructure as alternative schemas are elaborated, but that deployment in the short term could go forward without such a registry, especially in light of the proposed use of the LINK tag to link descriptions to a standard schema description as described below.


The solution agreed upon is to encode schema elements in META tags, one element per META tag, and as many META tags as are necessary. Grouping of schema elements is achieved by a prefix schema identifier associated with each schema element.

The convention agreed upon is as follows:

     <META NAME    = "schema_identifier.element_name"
           CONTENT = "string data">

Thus, a partial Dublin Core citation might be encoded as follows:

     <META NAME    = "DC.title"
           CONTENT = "HTML 2.0 Specification">
     <META NAME    = "DC.author"
           CONTENT = "Tim Berners-Lee">
     <META NAME    = "DC.author"
           CONTENT = "Dan Connolly">
     <META NAME    = "DC.date"
           CONTENT = "November, 1995">
     <META NAME    = "DC.identifier"
           CONTENT = "ftp://ds.internic.net/rfc/rfc1866.txt">

And a collection of Microsoft Word metadata might be encoded as follows:

     <META NAME    = "MSW.title"
           CONTENT = "W3C Indexing Work Shop Report">
     <META NAME    = "MSW.author"
           CONTENT = "Wick Nichols">
     <META NAME    = "MSW.date"
           CONTENT = "May 30, 1996">
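
A consumer of this convention, such as an indexing robot, could recover the grouping by splitting each META NAME at its first period. The following is a minimal sketch in Python using the standard html.parser module; the names MetaCollector and collect_metadata are hypothetical, and the decision to skip META tags whose NAME has no schema prefix is an assumption rather than part of the proposal.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collect META NAME/CONTENT pairs by schema identifier."""

    def __init__(self):
        super().__init__()
        # schema identifier -> element name -> list of values
        # (a list, because an element such as DC.author may repeat)
        self.records = defaultdict(lambda: defaultdict(list))

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Assumption: names without a "schema." prefix are outside the convention.
        if not name or content is None or "." not in name:
            return
        schema, element = name.split(".", 1)
        self.records[schema][element].append(content)


def collect_metadata(html_text):
    """Return {schema identifier: {element name: [values, ...]}}."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return {schema: dict(elements) for schema, elements in parser.records.items()}


sample = """
<META NAME="DC.title" CONTENT="HTML 2.0 Specification">
<META NAME="DC.author" CONTENT="Tim Berners-Lee">
<META NAME="DC.author" CONTENT="Dan Connolly">
<META NAME="MSW.author" CONTENT="Wick Nichols">
"""
metadata = collect_metadata(sample)
print(metadata["DC"]["author"])
```

Keeping a list of values per element preserves repeated elements in document order, which matters for the two DC.author tags in the example above.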


It is judged useful to provide a means for linking to the reference definition of a schema as well. The proposed convention for doing so is as follows:

     <LINK REL = SCHEMA.schema_identifier HREF = "URL">

Thus, the reference description of one metadata scheme, the Dublin Core Metadata Element Set, would be referenced in the LINK HREF as follows:

     <LINK REL = SCHEMA.dc HREF = "http://purl.org/metadata/dublin_core">

The description of an element could be accessed by constructing a URL that uses the # token to identify a named anchor. Thus, appending #title to the HREF value above yields a derived URL that links to the title element in the reference description of the Dublin Core Metadata Element Set.

This URL would resolve to the human-readable description of the title element, which is marked within the reference document by a NAME anchor such as:

<A NAME = "title"> Title </A>

    The name of the work provided by the author or publisher.

While use of the LINK tag is not required for a given schema, when used, it will make possible retrieval of the reference definition of a given schema element, and will therefore reduce the need for a formal metadata scheme registry. Multiple LINK tags can be used so that elements derived from multiple schemas can be referenced within a single document.
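
The derivation of an element's definition URL from a LINK tag can be sketched as follows, again in Python with the standard html.parser module. The names SchemaLinkCollector and element_definition_url are hypothetical. The sketch also assumes that schema identifiers are compared without regard to case, since the examples in this report write "DC" in META names but "dc" in the LINK REL value; the proposal itself does not state a rule.

```python
from html.parser import HTMLParser


class SchemaLinkCollector(HTMLParser):
    """Hypothetical helper: collect LINK REL=SCHEMA.<identifier> declarations."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # lowercased schema identifier -> reference URL

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        href = attributes.get("href")
        prefix, _, identifier = rel.partition(".")
        if prefix.upper() == "SCHEMA" and identifier and href:
            # Assumption: identifiers are matched case-insensitively.
            self.schemas[identifier.lower()] = href


def element_definition_url(html_text, meta_name):
    """Return the derived URL for a META name such as 'DC.title', or None."""
    parser = SchemaLinkCollector()
    parser.feed(html_text)
    parser.close()
    schema, _, element = meta_name.partition(".")
    base = parser.schemas.get(schema.lower())
    if base is None or not element:
        return None  # no LINK tag declared for this schema
    return base + "#" + element


sample = '<LINK REL="SCHEMA.dc" HREF="http://purl.org/metadata/dublin_core">'
print(element_definition_url(sample, "DC.title"))
```

Returning None for an undeclared schema reflects the point above that the LINK tag is optional: a document may carry META tags for a schema without linking to its reference description.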


To promote consistency among resource description schemas, it is suggested that the semantics for metadata elements be related to existing well-known schemas whenever feasible.