The Warwick Metadata Workshop:

A framework for the deployment of resource description

Lorcan Dempsey
UKOLN, University of Bath, UK

Stuart L. Weibel
OCLC Office of Research, Dublin, Ohio, USA

June 30, 1996

Contents

  1. Introduction
  2. Moving the Dublin Core Forward
  3. An Architecture for Metadata: the Warwick Framework
  4. Moving Forward: Proposals and Actions
  5. References and Bibliography

1. Introduction

The first week of April 1996 found fifty representatives of libraries, Internet standards, text markup, and digital libraries projects converging at Warwick University to discuss advancing prospects for network resource description. The conferees came from three continents, eleven countries, and many perspectives in an effort to apply their collective experience to the clarification of issues surrounding the effective deployment of metadata for networked information resources.

The meeting was a follow-on to the previous year's OCLC/NCSA Metadata Workshop, which convened a similarly diverse collection of stakeholders and resulted in consensus on a simple resource description record that has come to be known as the Dublin Core. The most important deliverable of that first workshop was the consensus achieved among the groups represented. The thirteen elements of a Dublin Core record contain few surprises, focussing largely on what might be thought of as network resource bibliography and a little bit more [Weibel et al., 1995].

The idea received considerable attention in the year since the first meeting, but while the first workshop helped to focus discussion of the topic in many communities, the implementation of such a description record requires a formal syntax and deployment strategy that were beyond the scope of that first meeting.

Planning for the second workshop began with informal discussions between the UK Office for Library and Information Networking (UKOLN) and OCLC's Office of Research in the summer of 1995, and crystallized around the theme of identifying and resolving impediments to deployment of a Dublin Core style record for resource description. The expectations of the organisers and participants were exceeded as conferees worked towards a number of related conclusions about the Dublin Core Metadata Element Set, about the need for a wider set of metadata types, and about an extensible framework for interchange of metadata of different types. A consensus about this central set of issues emerged from the workshop and, more importantly, a set of concrete proposals for moving forward has been produced. These include:

Dublin Core

Warwick Framework

Guide to Creation and Maintenance of Metadata

This paper provides a high-level overview of the issues discussed at the workshop. It brings together descriptions of the above outcomes, and places them in context. Section 2 discusses the Dublin Core and the proposals for taking it forward; Section 3 discusses the rationale for the Warwick Framework; Section 4 draws together the concrete proposals and actions which were the workshop's outcome.


2. Moving the Dublin Core Forward

2.1 The Dublin Core

The Dublin Core Metadata Element Set is a set of thirteen metadata elements proposed by the first workshop as a core description record to facilitate discovery of document-like objects in a networked environment. To facilitate progress, a number of constraints were imposed on the discussion:

Table 1. The Dublin Core Elements

Element       Description
Subject       The topic addressed by the work.
Title         The name of the object.
Author        The person(s) primarily responsible for the intellectual content of the object.
Publisher     The agent or agency responsible for making the object available in its current form.
Other Agent   The person(s), such as editors, transcribers, and illustrators, who have made other significant intellectual contributions to the work.
Date          The date of publication.
Object type   The genre of the object, such as novel, poem or dictionary.
Form          The physical manifestation of the object, such as PostScript file or Windows executable file.
Identifier    String or number used to uniquely identify the object.
Relation      Relationship to other objects.
Source        Objects, either print or electronic, from which this object is derived, if applicable.
Language      Language of the intellectual content.
Coverage      The spatial location and/or temporal duration characteristics of the object.

The Dublin Metadata Workshop is described in greater detail in:

[Weibel et al., 1995] and [Weibel, 1995].

The reference description of the element set can be found at:

http://purl.org/metadata/dublin_core_elements

2.2 Target uses for the Dublin Core

The development of the Dublin Core is motivated by several intended uses:

  1. A simple interchange format for descriptive metadata
  2. Content self-description for networked objects
  3. Semantic interoperability across domains

It is clear from early implementation experience that projects have employed Dublin Core semantics to develop simple resource description formats. The Dublin Core has suited those who need a format which is positioned between the terseness of the web crawler indexes and the fuller description of particular domain-specific formats (MARC, for example). It is full enough to support retrieval by a number of core attributes and to allow human users to make judgements about the likely utility of a resource before requesting it. At the same time, it is simple enough not to require specialist expertise or extended manual effort to create.

This latter feature is especially important in the context of the second target use mentioned here. Conferees recognized the importance of richer metadata embedded in Web documents to be harvested by software robots. The use of the Dublin Core as the basis for such data is seen as a critical success factor in its adoption. The ability to embed data in other objects was also seen as essential.

Future applications will have to work with different types of metadata from different sources. The first workshop identified a need for a generic semantics which could act as the basis for semantic interoperability between multiple description schemas. The Dublin Core was positioned to provide a unifying semantics across description models. Early implementations, such as the NDIS application (described below), are examples of such a use.

2.3 Early Pilot Projects

Even absent a clearly defined syntax, the Dublin Core element set attracted the interest of a number of early adopters who developed projects that built on the consensus that emerged from the Dublin Metadata Workshop:

2.4 Other Simple Resource Description Models

It is important to note that there are simple resource description models other than the Dublin Core that were discussed at the Warwick Workshop. Indeed, among the factors that motivated the Warwick Framework described later in this paper is the principle that there will be a variety of resource description models that emerge from different communities, and such models should be able to coexist.

Two such models that were discussed at the workshop are described below:

2.5 Impediments to Wider Deployment

Among the major goals of the Warwick Workshop was the identification of impediments to successful deployment of a simple Internet resource description format such as the Dublin Core. Early workshop discussions identified four areas requiring substantive progress:

Specification of a Transfer Syntax

Discussions of syntax are often difficult, burdened as they are with the biases of familiarity and competing methodologies. The Dublin Workshop made progress partly because such discussions were ruled out of scope. However, consensus concerning semantics cannot be deployed without a concrete syntax (or syntaxes). In pilot implementations, the absence of a common model led to different syntax and structuring choices. Clearly, any widespread deployment of Dublin Core (or any similar description scheme) hinges on reaching consensus about a transfer syntax.

Given that the Web is the primary medium of the electronic milieu, it was further recognized that deployment of metadata in the Web is the primary strategic application; successful deployment of metadata in HTML is necessary, though almost certainly not sufficient.

A working group on syntax formed around this issue and this group has elaborated a position paper describing a formal syntax for Dublin Core Metadata. A Syntax for Dublin Core Metadata (Burnard, Miller, Quin, and Sperberg-McQueen) includes:

  1. A concrete syntax expressed as an SGML DTD
  2. A mapping of this DTD into existing HTML tags using the META element of HTML 2.0
  3. A proposal for 'keeping the metadata at arm's length' by allowing metadata consumers to recognise references to external metadata using the LINK element.

In related developments, a convention for embedding metadata in HTML was proposed in a break-out group at the W3C Distributed Indexing and Searching Workshop, May 28-29, 1996 [LINK TO DIST-INDEX WORKSHOP]. This break-out group included representatives of the Dublin Core/Warwick Framework Metadata meetings, representatives of several major Web search vendors (Lycos, Microsoft, WebCrawler), various other software vendors, and the W3 Consortium.

The problem is to identify a simple means of embedding metadata within HTML documents without requiring additional tags or changes to browser software, and without unnecessarily compromising current practices for robot collection of data.

While metadata is intended for display in some situations, it is judged undesirable for such embedded metadata to display on browser screens as a side effect of displaying a document. Therefore, any solution requires encoding information in attribute values rather than as container element content.

The goal was to agree on a simple convention for encoding structured metadata of a variety of types (which may or may not be registered with a central registry analogous to the MIME type registry). It was judged that a registry may become a necessary feature of the metadata infrastructure as alternative schemata are elaborated, but that deployment in the short term could go forward without such a registry, especially in light of the proposed use of the LINK tag to link descriptions to a standard schema description as described below.

The solution agreed upon is to encode schema elements in META tags, one element per META tag, and as many META tags as are necessary. Grouping of schema elements is achieved by a prefix schema identifier associated with each schema element.

A convention for linking resource description tags to the reference definition of the metadata schema (or schemata) used in a document was also proposed. Doing so serves as a primitive registration mechanism for metadata schemata, and lays the foundation for a more formal, machine-readable linkage mechanism in the future.

The proposed conventions are described more fully in [LINK TO HTML-META CONVENTION].
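
As an illustration of the convention just described (one META tag per schema element, a schema prefix on each element name, and a LINK tag pointing at the schema's reference definition), the following Python sketch generates such tags from a simple record. The "DC" prefix, the NAME="prefix.element" and REL="SCHEMA.prefix" layouts, and the function name render_metadata are assumptions made for this example; they are not quoted from the workshop documents.

```python
from html import escape

# Assumptions for illustration: the "DC" prefix and the exact attribute
# layout are hypothetical; the URL is the element-set reference cited in
# section 2.1 of this paper.
SCHEMA_PREFIX = "DC"
SCHEMA_URL = "http://purl.org/metadata/dublin_core_elements"


def render_metadata(record, prefix=SCHEMA_PREFIX, schema_url=SCHEMA_URL):
    """Return HTML head lines: one LINK for the schema, then one META per value."""
    def attr(text):
        # Escape &, <, > and quotes so the text is safe inside an attribute.
        return escape(text, quote=True)

    lines = [f'<LINK REL="SCHEMA.{attr(prefix)}" HREF="{attr(schema_url)}">']
    for element, values in record:
        # A repeated element simply yields several META tags.
        for value in values:
            lines.append(
                f'<META NAME="{attr(prefix)}.{attr(element)}" CONTENT="{attr(value)}">'
            )
    return lines


# A record for this paper, using element names from Table 1.
record = [
    ("title", ["The Warwick Metadata Workshop"]),
    ("author", ["Dempsey, Lorcan", "Weibel, Stuart L."]),
    ("date", ["June 30, 1996"]),
]

for line in render_metadata(record):
    print(line)
```

Because every value is carried in an attribute, nothing in these tags is rendered on the browser screen, and the shared prefix lets a harvesting robot group the elements that belong to one schema.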

Development of User Guides

Resource descriptions might be created by a number of actors in the metadata use chain: authors (through embedded HTML tags), site and collection administrators, and third-party 'cataloguers'. Guidelines for the creation of metadata are needed. A guide for authors themselves would be especially useful in supporting a move to document-embedded descriptions, and at least one producer of HTML authoring tools (SoftQuad, Ltd.) has committed to embedding Dublin Core resource description templates in their products when the syntax and guidelines are sufficiently stable.

Extensibility -- Mixing and Matching Metadata

The Dublin Core addresses one particular niche of the metadata ecology. It is a simple resource description format that is intended to be extensible in at least two ways. As its name implies, it is intended to provide a commonly understandable core of elements that will help unify different models of resource description. Its simplicity is among its major virtues, but users may well wish to augment description of their resources with additional data.

Original concepts of extensibility for the Dublin Core assumed a mechanism for local extensions -- additional elements added at the discretion of authors or collection maintainers. Such local information may be critical to the effective use of a particular collection, though the local character of such elements may not be of general interest or usefulness.

Of perhaps greater importance is the need to link Dublin Core records to other, richer description schemes (for example, MARC, or FGDC). The ability to link a simple description record to a richer description model provides a means to promote one record type to a more complete description as warranted, and also affords a more continuous axis of resource description (from simple to complex) to suit a variety of user or system needs.

Additionally, Dublin Core data address only one slice of the metadata pie (resource description for search and retrieval). Other types of description are desired as well: terms and conditions (who must pay what to whom, for example), archival status, administrative metadata, and others.

Finally, there are competing models of resource description that overlap the Dublin Core to one degree or another. The IAFA document template is an example of one such format, USMARC another, the TEI header a third. RFC 1807 [NEED LINK] is a bibliographic description format developed by Rebecca Lasher as part of the NCSTRL [NEED LINK] project, an electronic library for technical reports in computer science.

Workshop discussions on extensibility merged with the common recognition of multiple models of description, some of which would be complementary, some of which would be overlapping, some of which would be competing. No single format for resource description would fill all the needs, nor could such a monolithic model be maintained or managed. The consensus of the workshop converged on a need for an architecture that would accommodate the diversity of models and levels of description complexity that characterize the chaotic world of electronic resources.

The proposal that emerged from these discussions is known as the Warwick Framework. It is a container architecture for the aggregation and interchange of discrete metadata packages. Such an architecture will afford the opportunity to mix and match metadata sets, allowing rational deployment of many existing and emergent description models. The following section describes the essential features of the Warwick Framework in greater detail.


3. An Architecture for Metadata: the Warwick Framework

3.1 The Need for the Warwick Framework

No single element set will satisfy all metadata requirements. Different communities of users or different application areas will require different elements and different levels of complexity. The Workshop took as its starting point the Dublin Core, a simple scheme for what might be thought of as electronic bibliography. However, other application areas might require the fullness and structure provided by a MARC-type record, for example, or might have domain-specific descriptive requirements not addressed in the Dublin Core. At the same time, other types of data exist that are outside the scope of the Dublin Core: terms and conditions and evaluative data, for example.

Satisfying the need for competing, overlapping, and complementary metadata models requires an architecture that will accommodate a wide variety of separately maintained metadata models. It was concluded that a container architecture for the interchange of metadata packages was required. A package is conceived as a metadata object specialized for a particular purpose. A Dublin Core based record might be one package, a MARC record another, terms and conditions another, and so on. This architecture should be modular, to allow for differently typed metadata objects; extensible, to allow for new metadata types; distributed, to allow external metadata objects to be referenced; recursive, to allow metadata objects to be treated as 'information content' and have metadata objects associated with them.

Packages are typed objects. A package may be primitive (one of a number of separately defined metadata formats), indirect (a reference to an external object), or a container (itself a container holding other packages).
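
The three kinds of package can be pictured with a small data model. The Python sketch below is only an illustration under stated assumptions: the class names, the type names such as "dublin-core", and the example URL are all hypothetical, and the sketch does not represent any of the concrete implementations proposed at the workshop.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PrimitivePackage:
    """A package holding one separately defined metadata format."""
    package_type: str   # hypothetical type name, e.g. "dublin-core"
    content: str


@dataclass
class IndirectPackage:
    """A package that only references an external metadata object."""
    package_type: str
    reference: str      # e.g. a URL; the value used below is made up


@dataclass
class Container:
    """A container of packages; a container may itself be used as a package."""
    packages: List["Package"] = field(default_factory=list)


Package = Union[PrimitivePackage, IndirectPackage, Container]


def package_types(container):
    """Walk a container recursively and list the types of all leaf packages."""
    found = []
    for package in container.packages:
        if isinstance(package, Container):
            # Nested container: descend into it.
            found.extend(package_types(package))
        else:
            found.append(package.package_type)
    return found


nested = Container([PrimitivePackage("terms-and-conditions", "Free for research use")])
description = Container([
    PrimitivePackage("dublin-core", "Title: The Warwick Metadata Workshop"),
    IndirectPackage("marc", "http://example.org/records/1"),
    nested,
])

print(package_types(description))
```

The indirect package is what makes the model distributed, and the nested container is what makes it recursive; new metadata types need only a new type name, not a change to the container itself.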

Several benefits flow from this approach:

The Warwick Framework is a high-level container architecture: it makes no assumptions about the contents of the packages. Nor can it be assumed that clients (or agents) will be able to interpret all packages; ensuring that they can will require prior agreement. Conferees agreed that packages should be strongly typed and that a registry for metadata types will probably be required, perhaps along the same lines as the IANA registry for Internet Media Types (also known as MIME types).
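
The role of strong typing can be sketched as follows: a client consults a registry of the package types it understands and sets aside any package whose type it does not recognise. This is a minimal Python illustration, assuming a purely local, in-memory registry; the type names, the toy 'Element: value' format, and the function names are hypothetical and do not describe an agreed registry design.

```python
def read_dublin_core(content):
    """Parse 'Element: value' lines into (element, value) pairs (a toy format)."""
    pairs = []
    for line in content.splitlines():
        name, separator, value = line.partition(":")
        if separator:
            pairs.append((name.strip(), value.strip()))
    return pairs


# Hypothetical local registry: package type name -> function that interprets it.
REGISTRY = {"dublin-core": read_dublin_core}


def interpret(packages, registry=REGISTRY):
    """Interpret the packages whose type is registered; report the rest as skipped."""
    understood = {}
    skipped = []
    for package_type, content in packages:
        handler = registry.get(package_type)
        if handler is None:
            # Unknown type: the client cannot interpret it, so it is set aside.
            skipped.append(package_type)
        else:
            understood[package_type] = handler(content)
    return understood, skipped


packages = [
    ("dublin-core", "Title: The Warwick Metadata Workshop\nDate: 1996-06-30"),
    ("terms-and-conditions", "Free for research use"),
]

understood, skipped = interpret(packages)
print(understood)
print(skipped)
```

A client built this way degrades gracefully: adding a new metadata type to a container does not break existing clients, which simply ignore what they have no handler for.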

The requirements for an architecture and the architecture itself are described more fully in a companion article in this issue, The Warwick Framework -- A Container Architecture for Aggregating Metadata Packages.

3.2 Impediments to Implementation

Concrete implementations

The architecture needs to be realised in one or more concrete implementations. Proposals for MIME- and SGML-based implementations have been prepared, as well as a discussion of the architecture in a distributed object environment based on CORBA.
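
To give a rough feel for how a MIME-based container might look, the sketch below uses Python's standard email package to wrap several metadata packages as parts of one multipart entity, each part labelled with its own content type. It is not the proposal prepared for the workshop: the "multipart/mixed" wrapper and the "x-" subtypes are assumptions chosen only for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_container(packages):
    """Wrap (subtype, body) pairs as text parts of one multipart MIME entity."""
    container = MIMEMultipart("mixed")
    for subtype, body in packages:
        # Each package becomes one part whose content type carries its type.
        container.attach(MIMEText(body, subtype))
    return container


def list_part_types(container):
    """Return the content type of every part held in the container."""
    return [part.get_content_type() for part in container.get_payload()]


# Hypothetical subtypes; none of these is a registered media type.
container = build_container([
    ("x-dublin-core", "Title: The Warwick Metadata Workshop"),
    ("x-terms-and-conditions", "Free for research use"),
])

print(list_part_types(container))
print(container.as_string())
```

The content type on each part plays the role of the package type, which is one reason a registry along the lines of the Internet Media Types registry is a natural fit for this kind of implementation.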

Registration

A registry agency for metadata object types needs to be established. Early implementation pilot projects should not be hampered by the lack of such an agency, but as more metadata sets are elaborated by various stakeholders, a formal means for managing changes will be important.

3.3 Moving Forward

The Warwick Framework was enthusiastically welcomed at the workshop as a practical approach to the effective integration of metadata into a global information infrastructure. The realization of such an architecture will require great effort on many fronts, in many communities. The great hope is that the consensus achieved at this meeting will have provided the foundation for coordination, and sufficient freedom in the proposed architecture to allow progress without an undue burden of close coordination.

The following working papers address aspects of the Warwick Framework more fully:


4. Moving Forward: Proposals and Actions

Conferees left Warwick convinced that significant progress had been made in important areas. This conviction is corroborated by the rapid appearance of a number of documents supporting key decisions and recommendations.

The consensus concerning embedding metadata in HTML reached at the W3C workshop on Distributed Indexing and Searching [LINK] provides an encouraging impetus to rapid deployment of richer resource description techniques on the Web along the lines developed in the Warwick Workshop.

The recent appearance of a Dublin Core implementation based on these developments [LINK TO A.P. Miller's Archeology Project] is a promising indicator of the need and demand for better resource description on the Net, and of the speed with which such ideas can be promulgated given the strength of community consensus and a clear direction for development.

It is hoped that the Warwick Workshop will prove to have galvanized such a consensus and provided an important signpost for the development of more effective networked resource description.


5. References and Bibliography

  1. A Syntax for Dublin Core Metadata (Lou Burnard, Eric Miller, Liam Quin, C.M. Sperberg-McQueen)
  2. A proposal for a concrete SGML implementation of the Warwick Framework
  3. On Information factoring in Dublin metadata records (C.M. Sperberg-McQueen)
  4. The Warwick Framework: a container architecture for aggregating metadata objects (Carl Lagoze, Clifford Lynch, Ron Daniel, and others)
  5. A MIME implementation for the Warwick Framework (Jon Knight and Martin Hamilton)
  6. Report on Distributed Indexing Workgroup: Embedding Metadata
  7. W3C Distributed Indexing Workshop Link
  8. Guidelines for the preparation of Dublin Core metadata / Warwick Framework containers (John Kunze and others)

Acknowledgements

The authors are indebted to many organizations and individuals that paved the way for this work and contributed substantively to the success achieved.