[This local archive copy mirrored from the canonical site: http://194.75.134.50/rdf/discussion.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

Dublin Core Data Model Discussion Paper (v0)

Charles Wicksteed   charles.wicksteed@reuters.com   12 June 1998

[[Paper prepared for the meeting of the DC Data Model working group.

Please excuse the rough edges. You will notice that some of the text is just lifted from the Data Model WG issues list, and other parts are simply missing. My main purposes in writing this document were:

Double square brackets around blocks of text indicate notes etc addressed to the members of the working group. The other text is draft text for publication.]]

Table of Contents

1   Introduction
    1.1   Summary
    1.2   Scope
        1.2.1   Rationale
    1.3   Extensibility
2   Introduction to RDF
    2.1   Properties, Property Types and Property Values
    2.2   Resources as Property Values
    2.3   Nodes and Arcs
    2.4   Annotation Nodes
    2.5   Serialization Syntax
        2.5.1   Nodes, URIs and IDs
    2.6   Rationale
        2.6.1   Miller Syntax
        2.6.2   DC Extensive Form
3   Structure of Qualified Dublin Core
    3.1   Introduction
    3.2   Use of Element Names for Resources and Annotation Nodes
        3.2.1   The Outermost Level
        3.2.2   Rationale
        3.2.3   Inner Levels
        3.2.4   Rationale 2
    3.3   Types and Subelements
        3.3.1   Confusion
        3.3.2   Approach in this document
    3.4   Property Type or Property Value?
    3.5   Extensibility
    3.6   Degrading to Unqualified Dublin Core
4   General Principles
    4.1   Dublin Core in RDF
    4.2   XML Namespace
    4.3   Repeated Core DC Elements
        4.3.1   Which Elements can be Repeated
        4.3.2   Ways of Repeating Elements
        4.3.3   Ordering of Repeated DC Elements
        4.3.4   [[Repeated Properties
    4.4   Schemes
        4.4.1   Rationale
            4.4.1.1   What is the Scheme of the Scheme Name?
            4.4.1.2   Different Sorts of Schemes
            4.4.1.3   Using the scheme as the property type
            4.4.1.4   DC:Scheme not used
    4.5   [[Type values
5   The Core Elements
    5.1   Title
        5.1.1   Rationale
    5.2   Creator
        5.2.1   Role
            5.2.1.1   Structure of Role
            5.2.1.2   Semantics of Role
        5.2.2   vCard
        5.2.3   Globally Unique IDs
        5.2.4   Personal and Corporate Names
        5.2.5   Rationale
            5.2.5.1   Personal and Corporate
            5.2.5.2   Role
    5.3   Subject
    5.4   Description
    5.5   Publisher
    5.6   Contributor
    5.7   Date
        5.7.1   Structure
        5.7.2   Rationale
    5.8   Type
        5.8.1   Rationale
    5.9   Format
    5.10   Identifier
        5.10.1   Identification in RDF
        5.10.2   Use of the Identifier element
    5.11   Source
    5.12   Language
    5.13   Relation
    5.14   Coverage
    5.15   Rights
6   Qualified DC in HTML
    6.1   Introduction
    6.2   Mapping of Annotation Nodes
        6.2.1   Rationale
    6.3   Mapping of Aspect Nodes
        6.3.1   Mapping of Aspect Nodes with DC properties
        6.3.2   Mapping of Aspect Nodes with foreign properties
        6.3.3   Rationale
            6.3.3.1   Grouping
            6.3.3.2   Namespaces
    6.4   Mapping of Schemes
    6.5   Mapping of vCard "Name" Property
    6.6   Personal and Corporate Names
    6.7   Annotations with Subelements
    6.8   Mapping of HREFs
7   Other Issues
    7.1   Mixed language content in DC elements
8   Acknowledgements
9   References
    9.1   Additional references
    9.2   Data Model WG issues list references section
10   [[Temporary "Notes" Section
    10.1   The "1:1" Rule
    10.2   The Meaning of Properties
11   Section Numbering

1   Introduction

1.1   Summary

Unqualified Dublin Core is limited in its sophistication, so this document defines the mechanism by which Dublin Core metadata can be extended by the use of qualifiers.

This document has two purposes. Firstly, it explains how metadata can be expressed in RDF (Resource Description Framework). RDF allows metadata values to be structured, instead of being simple strings as in HTML META tags. One aspect of this structuring is to allow the inclusion of qualifiers, which refine or extend the meaning of the core element. For example it is possible to specify that a Title is an Alternative title.

Secondly, having explained the general principles by which the DC can be qualified, this document goes on to define a number of specific RDF property types which are of general applicability across most of the application domains of the Dublin Core. It is hoped that this will increase the effectiveness of searches of qualified DC across different metadata collections, by encouraging the different metadata authors to use the same property type to express the same information.

Although the structures are discussed here in terms of RDF, they are also applicable to other encodings like generic record syntax in Z39.50.

1.2   Scope

If the Dublin Core element set is not sufficient to meet the needs of a particular application, it can be extended in various ways.

One way is to define new elements which are outside the DC set. This document mentions namespaces, which enable such elements to be defined with no danger of name clashes, but has nothing more to say on this topic.

Another way is to define qualifiers for DC elements. This document both explains how this should be done, and defines a few particular qualifiers.

At another level, it will be necessary to define additional *values* for the qualifiers defined in this document. In some cases complete lists of values are provided, intended to be extended only in exceptional circumstances (eg Relation.Type). In other cases this document provides a few example values, and the designers of schemes outside the Dublin Core are expected to define the actual values to be used in practice (eg Creator.Role).

[[TBD - define the process by which new values are defined. For each enumerated list of strings (eg Main, Alternative) we must say what the process is by which additional values are blessed (registration, new version of this document, free-for-all...). This includes the definition of the names of schemes.]]

This document defines a number of property types in the Dublin Core namespace. It is not expected that any more property types in that namespace will be defined, except by updating this document. That is, this document can be taken as the complete definition of all the official DC qualifiers.

1.2.1   Rationale

The qualifiers defined in this document are those which experience has shown to be of general use across application domains, and therefore worth defining as part of the Dublin Core effort. If the definition of these qualifiers were left to the individual communities of DC users, there would be more chance that different names would be chosen for qualifiers with overlapping meanings, which would make it more difficult to do combined searching of metadata originating in more than one community.

The choice of qualifiers is fairly arbitrary. Several of the 15 core elements have no qualifiers defined at all.

1.3   Extensibility

The qualifiers defined here are extensible in various ways. If an element, a qualifier or a set of values does not meet your needs it is possible to define a new one.

There are a number of factors to weigh up:

2   Introduction to RDF

This section presents a very brief introduction to the aspects of the Resource Description Framework which are relevant to this document. For full details, see the RDF Model and Syntax specification [RDF-M&S].

2.1   Properties, Property Types and Property Values

RDF allows the expression of name/value pairs in the same way as HTML META tags, but it also allows more sophisticated structures.

HTML META tags can be mapped into RDF properties. The terminology is different, but the principles are the same. Consider this example:

   <meta name="DC.Creator" content="Fred Smith">

This maps into an RDF property, whose property type is "DC:Creator" and whose value is "Fred Smith". In RDF, the word "property" is used to refer to a particular instance of property type and value. For example if a paper has two authors, it would have two properties, both with the same type (DC:Creator), but different values (eg "Fred Smith" and "John Doe").
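
As a rough illustration of this mapping, the following Python sketch reads META tags and collects one (property type, value) pair per tag. The class name MetaToProperties and the rule that rewrites the HTML prefix "DC." as the RDF prefix "DC:" are assumptions made for this example; they are not taken from any specification.

```python
from html.parser import HTMLParser


class MetaToProperties(HTMLParser):
    """Collect (property type, value) pairs from HTML META tags."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is None or content is None:
            return
        # Assumption: the HTML prefix "DC." corresponds to the RDF prefix "DC:".
        if name.startswith("DC."):
            name = "DC:" + name[len("DC."):]
        self.properties.append((name, content))


parser = MetaToProperties()
parser.feed('<meta name="DC.Creator" content="Fred Smith">'
            '<meta name="DC.Creator" content="John Doe">')
print(parser.properties)
```

The two tags yield two properties with the same type and different values, as in the two-author case described above.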

2.2   Resources as Property Values

As described so far, RDF does not offer anything more powerful than HTML META tags. However RDF property values don't have to be strings; the value of a property can also be another resource. Such a resource can be either a real-world resource or an anonymous resource whose raison d'être is simply to have more properties of its own. There will be plenty of examples of this later.

So the value of a DC property could be a resource, which then has properties of its own, whose values can again be strings or more resources. Hence the resources and properties can be thought of as a graph or network, where the nodes are resources and the arcs are the properties.

2.3   Nodes and Arcs

We can draw our metadata as a directed labelled graph in which the nodes are the resources. The properties are drawn as directed arcs; that is, each arc has a direction from one node to another. The starting node represents the resource which has the property, and the arc points to a node that represents the value.

In the RDF specifications, when drawing these node-and-arc diagrams, resource nodes are represented by ellipses and string nodes by rectangles. In this document, due to technical limitations, resource nodes are drawn with square brackets and string nodes in quotation marks like this:

  [mydoc] --------DC:Creator-----> "Fred Smith"

This shows an arc from the node "mydoc" pointing to a string "Fred Smith". "DC:Creator" is the label on the arc; the label is the property type. The label is written on the line, with no punctuation.

2.4   Annotation Nodes

Instead of a string, the value of a property can be an intermediate node, which comes between the main resource and the main value of the property. In this document, these are called Annotation Nodes, because they are used to attach annotations to the main value.

Example:

  [Resource] -----DC:Title-------> [#node001]
  [#node001] --+--RDF:Value------> "Paris Symphony"
               +--DC:Title.Type--> "Alternative"

Here, the title of the work is "Paris Symphony", but we have inserted an annotation node, called #node001, between the main resource and the string "Paris Symphony". The purpose of the annotation node is to carry the annotation which indicates that this is the alternative title of the work.

This is a very powerful mechanism, and enables extra information to be added to the metadata in a way that preserves the relationships between the different parts.

Note the use of the property type "RDF:Value" in the example above. This is an indication that the "real" value of the DC:Title property is actually "Paris Symphony". Programs that process RDF will know to follow such arcs, so that a search for a resource with title "Paris Symphony" will find this resource even though it has an annotation. This is discussed in more detail in the section on matching [[not written yet!]].

[[Possible text for Matching section:

We recommend that systems which carry out matching of DC metadata provide a permissive mode in which an element is matched regardless of whether or not it has annotations. In this permissive mode, when matching DC elements with and without annotations, the matching process should use the primary value of the annotated element. It should locate this value by following RDF:Value arcs. Sophisticated systems may additionally offer stricter matching algorithms.]]

Here are some more complicated examples:

   [Res] --DC:Elem--> "Primary value"

   [Res] --DC:Elem--> [ANode] --RDF:Value---> "Primary value"
                         |
                          --DC:Elem.Foo--> "Annotation Foo"

   [Res] --DC:Elem--> [ANode] --RDF:Value---> [ANode] --RDF:Value---> "Primary value"
                         |                       |
                         |                        --XX:Bar--> "Annotation Bar"
                         |
                          --DC:Elem.Foo--> "Annotation Foo"

(where "Res" stands for "Resource", "Elem" stands for "Element", "ANode" stands for "Annotation Node").
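
The idea of following RDF:Value arcs to reach the primary value can be sketched in Python as below. This is only an illustration under stated assumptions: the Node class, the dictionary representation of the graph, the function names, the depth limit, and the names ANode1 and ANode2 (the diagram labels both nodes "ANode") are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A resource node. Plain Python strings stand for string nodes."""
    name: str


def primary_value(graph, value, max_depth=10):
    """Follow RDF:Value arcs from `value` until a string is reached.

    Returns None if a node has no RDF:Value arc or the chain is too long.
    """
    depth = 0
    while not isinstance(value, str):
        if depth >= max_depth:
            return None
        targets = [target for prop, target in graph.get(value, [])
                   if prop == "RDF:Value"]
        if not targets:
            return None
        # Assumption: if RDF:Value is repeated, only the first arc is followed.
        value = targets[0]
        depth += 1
    return value


def matches(graph, resource, property_type, wanted):
    """Permissive match: annotations are ignored, only primary values count."""
    return any(prop == property_type and primary_value(graph, target) == wanted
               for prop, target in graph.get(resource, []))


# The first and third diagrams above, written as node -> list of (type, target).
plain = {Node("Res"): [("DC:Elem", "Primary value")]}
annotated = {
    Node("Res"): [("DC:Elem", Node("ANode1"))],
    Node("ANode1"): [("RDF:Value", Node("ANode2")),
                     ("DC:Elem.Foo", "Annotation Foo")],
    Node("ANode2"): [("RDF:Value", "Primary value"),
                     ("XX:Bar", "Annotation Bar")],
}

print(matches(plain, Node("Res"), "DC:Elem", "Primary value"))      # True
print(matches(annotated, Node("Res"), "DC:Elem", "Primary value"))  # True
```

Both forms match the same query, which is the behaviour one would expect of a permissive matching mode.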

Note

There will be a spectrum of matching criteria offered by various RDF systems, ranging from the strictest [a precise match of nodes, arcs and string values] to the most permissive. From the DC perspective, the strictest matching is probably unhelpful, as it would fail to match many cases which most users would consider to be equivalent.

2.5   Serialization Syntax

The RDF Model and Syntax specification defines a serialization syntax, which is a way of expressing RDF data in a file or on a network as a sequence of bytes. The syntax uses Extensible Markup Language [XML].

2.5.1   Nodes, URIs and IDs

In many cases in this document, the value of an element is stated to be a node. In this case, the node may be identified by:

2.6   Rationale

An alternative to using the node-and-arc diagrams described above is to invent some sort of syntax-neutral notation. That is, a notation independent of both RDF and HTML META tags.

It has been decided not to use such a notation in this document because of the complication involved. For completeness however, here are brief descriptions of two possible notations.

2.6.1   Miller Syntax

One possible notation is that proposed by Eric Miller. [[I have made some assumptions about the details.]]

    (Creator = Fred Smith)

    (Title
        (Value = Paris Symphony)
        (Type = Alternative)
    )

    (Creator
        (Value
            (FN = Mr. John Q. Public, Esq.)
            (EMAIL = jqpublic@xyz.dom1.com)
        )
        (Role = Illustrator)
    )

2.6.2   DC Extensive Form

Another possible notation is called DC Extensive Form, from Sigfrid Lundberg.

In simple DC, the metadata is expressed as simple name-value pairs, as used in HTML META tags. In Extensive Form, each element still has a name, but the value may be structured instead of being a simple string. The value of the element can consist of either a string or a nested set of name-value pairs. The values can be nested to any depth.

For documentation purposes, the names are written with colons after them, and indentation is used to show the nesting.

For documentation purposes we introduce a new qualifier called "content" which holds the main value of the metadata. The examples will make it clear.

Examples

    Creator:    "Fred Smith"

    Title:      Content:        "Paris Symphony"
                Title.Type:     "Alternative"

    Creator:    Content:        FN:         "Mr. John Q. Public, Esq."
                                EMAIL:      "jqpublic@xyz.dom1.com"
                Creator.Role:   "Illustrator"

3   Structure of Qualified Dublin Core

3.1   Introduction

This section contains information about the way the property types and values are arranged and named in qualified DC.

3.2   Use of Element Names for Resources and Annotation Nodes

We introduced above the concept of a real-world resource, that is, a resource about which we want to define some metadata. Such a resource would have a Title, Creator etc. We also call these "actual resources". The RDF specifications use the word "resource" to mean any RDF node, including what we call annotation nodes. In RDF, these resources can have properties as easily as real-world resources.

In this document however, we make a distinction between real-world resources and annotation nodes. In particular, real-world resources may take any of the fifteen core elements, but annotation nodes may not.

3.2.1   The Outermost Level

In RDF, we use only DC element names as the names of properties of nodes which represent actual resources. In other words, at the outermost level, we do not use any property names other than the fifteen DC element names. For example:

this is OK:

   [Resource] --DC:Type-->

this is not OK:

   [Resource] --DC:Splunge-->

this, of course, is OK:

   [Resource] --XY:Splunge-->

3.2.2   Rationale

An alternative could be to allow other property types at the outermost level. For example, it has been proposed that the distinction between the Creator, Contributor and Publisher elements is unhelpful in qualified DC, because, in qualified DC, we are not restricted to just those roles, but can specify any other role. A cleaner implementation may be to define an Agent element to be used at the outermost level, which would cover Creator, Contributor, Publisher and more, by way of an Agent.Role qualifier. For example, these would be equivalent:

  [Resource] -----DC:Creator-----> "John Smith"

  DEPRECATED EXAMPLE
  [Resource] -----DC:Agent-------> [#node001]
  [#node001] --+--RDF:Value------> "John Smith"
               +--DC:Agent.Role--> "Creator"

You could also have:

  DEPRECATED EXAMPLE
  [Resource] -----DC:Agent-------> [#node001]
  [#node001] --+--RDF:Value------> "Harold Rosson"
               +--DC:Agent.Role--> "majorContributor.cinematographer"

We have decided against this approach because the 15 elements of the core must be a single element set for all Dublin Core metadata, both qualified and unqualified. The use of Agent, as above, would mean that we are using a different set of elements for qualified DC, ie a set including Agent, but excluding Creator, Contributor and Publisher. Using a different element set for qualified DC would make it difficult for users to make searches that return hits from both qualified and unqualified DC: a search for "Agent" would only return resources described using qualified DC, and a search for "Creator" would only return resources described using unqualified DC. One could work round this by changing the search to "(Agent with Role=Creator) OR Creator", or by making the search engine treat these as equivalent by special programming.

3.2.3   Inner Levels

In RDF, we do not use the fifteen DC element names except as the names of properties that apply to actual resources. In other words, at inner levels, we do not use the fifteen DC element names to name properties. For example:

this is OK:

   [Annotation node] --DC:Title.Type-->

this is not OK:

   [Annotation node] --DC:Type-->

this, of course, is OK:

   [Annotation node] --XY:Type-->
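
The naming rules for the outermost and inner levels can be checked mechanically. The Python sketch below is one possible way to do so; the function name is_allowed and its parameter names are assumptions for illustration. It checks only the naming rule, not whether a given qualifier is actually defined in this document.

```python
# The fifteen core element names.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def is_allowed(property_type, on_actual_resource):
    """Apply the naming rule for the outermost level or for inner levels.

    property_type is written as "PREFIX:Name", for example "DC:Title.Type".
    """
    prefix, _, local_name = property_type.partition(":")
    if prefix != "DC":
        # Property types from other namespaces are not restricted here.
        return True
    is_core_element = local_name in DC_ELEMENTS
    if on_actual_resource:
        # Outermost level: only the fifteen element names may be used.
        return is_core_element
    # Inner levels: the fifteen element names may not be used.
    return not is_core_element


for name in ("DC:Type", "DC:Splunge", "XY:Splunge"):
    print(name, is_allowed(name, on_actual_resource=True))
for name in ("DC:Title.Type", "DC:Type", "XY:Type"):
    print(name, is_allowed(name, on_actual_resource=False))
```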

3.2.4   Rationale 2

We use "DC:Title.Type" and not "DC:Type" because "DC:Type" is already the name of a core element. We want to define the usage of each property type independently -- the usage of "DC:Title.Type" is quite different from the usage of "DC:Type", in terms of where it can appear and the legal values it can take. We may want to define these features formally in a machine-readable way, so distinct names are essential.

We could have put all the qualifiers in a separate namespace from the 15 core elements, say DCE, so as to distinguish DC:Type from DCE:Type, but that would not have distinguished between "DCE:Type" for Title and "DCE:Type" for Date anyway.

We would name sub-subelements as shown below. As yet, no such sub-subelements have been defined in DC. vCard already has provision for the structuring of addresses.

      DC:Creator 
         DC:Creator.Address 
            DC:Creator.Address.Suburb
            DC:Creator.Address.Country

3.3   Types and Subelements

3.3.1   Confusion

There have been attempts in the past to give the same structure to, for example, Creator.Address and Date.Issued. This leads to difficulties, and so we have decided to give them different structures as explained below.

We divide the qualifiers into two groups. Each qualifier is one of these:

A large portion of the DC community has implemented systems that do not draw this distinction, using HTML's Meta tag. Examples of usage are:

   <meta name="DC.Relation.IsBasedOn" content="http://a.b.c/foo.bar">

   <meta name="DC.Date.Issued" content="2000-01-01">

A strategy which would prevent interoperability between these forms of DC metadata and the more modern forms based on the work of this WG would be unacceptable.

3.3.2   Approach in this document

Consider one of the examples above:

   <meta name="DC.Date.Issued" content="2000-01-01">

In the past, "Issued" was described as a sub-element of "Date". As explained under "Date" below, we have decided that "Issued" should instead be the value of DC:Date.Type:

  [Resource] -----DC:Date--------> [#node001]
  [#node001] --+--RDF:Value------> "1998-03-31"
               +--DC:Date.Type---> "Issued"

but that this should be encoded in HTML thus:

   <meta name="DC.Date.Issued" content="1998-03-31">

The mapping to HTML for the first group above, ie aspects of elements, is described in a later section.

3.4   Property Type or Property Value?

In defining the property types in this document, a number of arbitrary decisions were made about whether a particular item should be the name of a property type or the value of a property.

Here is one example. Other examples will be found later in the document.

We wish to include in our metadata various dates, for example the date that a resource was created and the date when it was issued. There are two possible ways of encoding this:

  Method 1

  DEPRECATED EXAMPLE
  [Resource] -----DC:Date---------> [#node001]
  [#node001] --+--DC:Date.Created-> "1998-03-20"
               +--DC:Date.Issued--> "1998-03-31"

  Method 2

  [Resource] -----DC:Date---------> [#node001]
  [#node001] --+--RDF:Value-------> "1998-03-20"
               +--DC:Date.Type----> "Created"

  [Resource] -----DC:Date---------> [#node002]
  [#node002] --+--RDF:Value-------> "1998-03-31"
               +--DC:Date.Type----> "Issued"

In the first method, "Created" and "Issued" are defined as property types. They are written as "DC:Date.Created" and "DC:Date.Issued" in case we want to use "Created" and "Issued" in connection with other elements, with different usage.

The problem with the first approach is that the structure of the nodes does not match our intended meaning well. We could define the Date element to be like that, and it would work up to a point, but difficulties would creep in when more subtle processing was needed.

The problem is that the node #node001 does not represent anything very helpful. It represents the dates associated with this resource, and it has two properties whose values are two of the dates.

In contrast, the second formulation, although more bulky, is more logical. Each of the nodes #node001 and #node002 represents one of the dates, and the properties belong to those dates. One property (RDF:Value) is the main value of the date, ie the moment in time that it represents. The other property (DC:Date.Type) is an annotation to specify precisely the relationship of this date to the resource in question.
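
The relationship between the two methods can be illustrated with a short Python sketch that rewrites a Method 1 node as a list of Method 2 nodes. The dictionary representation of a node and the function name split_date_node are assumptions made for this example only.

```python
def split_date_node(method1_node):
    """Turn one Method 1 node into a list of Method 2 nodes.

    Each "DC:Date.<Type>" property becomes a node of its own, carrying the
    date in RDF:Value and the type name in DC:Date.Type.
    """
    prefix = "DC:Date."
    return [
        {"RDF:Value": value, "DC:Date.Type": prop[len(prefix):]}
        for prop, value in method1_node.items()
        if prop.startswith(prefix)
    ]


method1 = {"DC:Date.Created": "1998-03-20", "DC:Date.Issued": "1998-03-31"}
for node in split_date_node(method1):
    print(node)
```

In the result, each node stands for exactly one date, and a new kind of date needs only a new value of DC:Date.Type rather than a new property type.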

3.5   Extensibility

In some cases, as in the types of Dates mentioned above, there is a choice as to whether an item (eg "Issued") goes in the name of the property type or is used as the value of a property.

If the list of possible values is at all open-ended, it is better to make them values and not property types, so that we can extend the list without defining a new property type.

If a scheme is being used, new values can be defined by defining a new scheme or modifying the definition of an existing scheme.

3.6   Degrading to Unqualified Dublin Core

[[Some people think that degrading will never actually be done. Topic for discussion.]]

It is important to consider the mechanisms by which DC metadata expressed in RDF can be converted to unqualified DC metadata, for passing to a system which can only handle unqualified DC.

By "unqualified DC", we mean metadata which uses only the 15 core elements as property types, and has strings or URIs for all the values.

It is not within the remit of this document at this time to define such a mechanism, but a brief description may help to explain some of the decisions about element structure that have been taken.

The general procedure is to concatenate the string values of all the nodes which make up the value of the element. The result is used as the value of the unqualified DC element. A space is inserted between each pair of strings. The names of the property types are discarded. The values of RDF:Value properties are processed last, but otherwise multiple properties on a single node are processed in an arbitrary order (there are not many examples of this).

If a property has another node as its value, and no properties of that node are known (ie it was defined via an RDF:href), then the URI of the node is used as the string value. There are actually some problems with knowing where to stop following the web of references, but this conveys the essential principles.

Example 1:

  [Resource] -----DC:Subject---------> [#node001]
  [#node001] --+--RDF:Value----------> "Cookies"
               +--DC:Subject.Scheme--> "LCSH"

This becomes, when degraded to unqualified:

  [Resource] -----DC:Subject---------> "LCSH Cookies"

Example 2:

  [Resource] -----DC:Creator-----> [urn:guid:123456789]

This becomes, when degraded to unqualified:

  [Resource] -----DC:Creator-----> "urn:guid:123456789"
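
The general procedure can be sketched in Python as follows. This is a minimal illustration, not a definition of the mechanism: the Node class, the dictionary representation of the graph and the function name degrade are assumptions, and the sketch assumes the graph contains no cycles, so it does not address the question of where to stop following references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A resource node, named by its URI or local ID."""
    name: str


def degrade(graph, value):
    """Reduce the value of an element to a single string.

    graph maps each Node to a list of (property type, target) pairs.
    Assumption: the graph is acyclic.
    """
    if isinstance(value, str):
        return value
    arcs = graph.get(value)
    if not arcs:
        # No properties of this node are known: use its URI as the string.
        return value.name
    # Stable sort: RDF:Value arcs go last, the other arcs keep their order.
    ordered = sorted(arcs, key=lambda arc: arc[0] == "RDF:Value")
    # Property type names are discarded; strings are joined with a space.
    return " ".join(degrade(graph, target) for _, target in ordered)


graph = {
    Node("#node001"): [("RDF:Value", "Cookies"),
                       ("DC:Subject.Scheme", "LCSH")],
}
print(degrade(graph, Node("#node001")))            # LCSH Cookies
print(degrade(graph, Node("urn:guid:123456789")))  # urn:guid:123456789
```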

[[Needs more explanation? Is it important?]]

4   General Principles

4.1   Dublin Core in RDF

In this section, we look at some general principles that apply to DC metadata in RDF. Later we consider each element in turn, for the specific details that apply to particular elements.

4.2   XML Namespace

The document "Namespaces in XML" [XML-NS] defines a mechanism for the use of namespaces in XML. All the elements and property types defined by this document are in the Dublin Core namespace, whose namespace name is "http://purl.oclc.org/metadata/dublin_core_elements". The namespace prefix "DC" is used in this document to denote that namespace.

[[Note that Renato suggested "http://metadata.net/DC/1.0/" in November 97. I don't know how serious he was.]]

Users of Dublin Core should not use the DC namespace for property types that are not defined in this document. Such extensions should use a namespace which is associated with the person or organization defining the extension, even if they are for use with a Dublin Core element. For example, a new qualifier may be defined to indicate the importance of each creator thus:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--RDF:Value------> [#node002]
               +--XX:Creator.Importance-> "Minor"
  [#node002] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

The new qualifier "Creator.Importance" is not in the DC namespace, even though it qualifies a DC element, and begins with the string "Creator". It is up to the owner of the XX namespace to ensure that the string used (eg "Creator.Importance") is not reused within that namespace to mean something different.

4.3   Repeated Core DC Elements

4.3.1   Which Elements can be Repeated

Almost all the DC elements may be meaningfully repeated. For example, a resource may have more than one title or author, and hence it may have more than one DC:Title or DC:Creator property. At first thought, one might not expect DC:Description to be repeated, but some resources have both a long and a short description, both of which could usefully be included.

Some elements are less likely to be repeated than others, but repetition is legal for all of them.

4.3.2   Ways of Repeating Elements

There are four ways of specifying repeated properties in RDF:

  1. simply repeating the property,
  2. using a Seq,
  3. using an Alt,
  4. using a Bag.

We allow the use of all of the above mechanisms for repeating DC elements.

We recommend that systems which carry out matching of DC metadata provide a permissive mode in which an element is matched regardless of whether it occurs singly, or in one of the four forms described above. Sophisticated systems may additionally offer stricter matching (eg, "Find me all resources which have Friedrich Engels as Creator number 2").

We recommend the use of option 1 (simply repeating the element) rather than option 4 (using a Bag), unless the metadata creator has a good reason to use a Bag.
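
A permissive matching mode of this kind, together with one stricter query, might look like the Python sketch below. The Container class, the list-of-pairs representation and the function names are assumptions made for illustration, and the creator names are sample data only.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """A Seq, Alt or Bag holding the values of a repeated element."""
    kind: str       # "Seq", "Alt" or "Bag"
    members: list


def element_values(properties, property_type):
    """Return every value of an element, however the repetition is expressed."""
    values = []
    for prop, value in properties:
        if prop != property_type:
            continue
        if isinstance(value, Container):
            values.extend(value.members)
        else:
            values.append(value)
    return values


def matches_permissive(properties, property_type, wanted):
    """Match regardless of single, repeated, Seq, Alt or Bag form."""
    return wanted in element_values(properties, property_type)


def matches_position(properties, property_type, wanted, position):
    """Stricter match: `wanted` must be member number `position` of a Seq.

    position is counted from 1.
    """
    for prop, value in properties:
        if (prop == property_type and isinstance(value, Container)
                and value.kind == "Seq"
                and len(value.members) >= position
                and value.members[position - 1] == wanted):
            return True
    return False


repeated = [("DC:Creator", "Karl Marx"), ("DC:Creator", "Friedrich Engels")]
sequenced = [("DC:Creator",
              Container("Seq", ["Karl Marx", "Friedrich Engels"]))]

print(matches_permissive(repeated, "DC:Creator", "Friedrich Engels"))
print(matches_permissive(sequenced, "DC:Creator", "Friedrich Engels"))
print(matches_position(sequenced, "DC:Creator", "Friedrich Engels", 2))
```

In this sketch the stricter query succeeds only for the Seq form, because simple repetition states no ordering.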

Note

There will be a spectrum of matching criteria offered by various RDF systems, ranging from the strictest [a precise match of nodes, arcs and string values] to the most permissive. From the DC perspective, the strictest matching is probably unhelpful, as it would fail to match many cases which most users would consider to be equivalent.

4.3.3   Ordering of Repeated DC Elements

DC RFC #1 has been updated to support ordering.

This section deals with the ordering of repeated elements (eg two Creators), not with the ordering of different elements (eg Creator and Title).

There are situations where the creators of metadata wish to indicate a specific ordering of repeated elements. We shall allow multiple instances of an element to be placed in a specified sequence in environments which support such functionality.

4.3.4   [[Repeated Properties

There is still an undecided issue with respect to repeated properties.

Some of our Japanese colleagues have explained that Japanese cataloging allows two entries for various fields: (i) an entry written using Kanji (ideographic) characters and (ii) an entry written using phonetic characters. The reason for this is that there is generally not a one-to-one correspondence between spelling (using Kanji characters) and pronunciation. The second entry permits search by pronunciation, in cases where the Kanji characters used in the first entry are not known to the user. This interesting problem brought to light other requirements, such as:

This issue relates to the repetition of properties which do not directly correspond to DC elements. Note that there is a logical difference between having two Creators, called Robert and Bob (respectively), and having one Creator, called both Robert and Bob. The questions to be answered include:

  1. Which properties can be repeated and how?
  2. Can an RDF:Value be repeated?

]]

4.4   Schemes

In many cases, the value of an element is stated to be a scheme-qualified string. This means a node whose properties are:

Example:

  [Resource] -----DC:Subject---------> [#node001]
  [#node001] --+--RDF:Value----------> "Cookies"
               +--DC:Subject.Scheme--> "LCSH"

This indicates that the subject is "Cookies" as defined by the Library of Congress Subject Headings scheme.

Schemes are intended to provide information which will be helpful in the interpretation of the value of the element. There is a range of applications:

Some schemes are near the border between these applications, for example the Dewey Decimal system, which defines both a structure and a set of values.

The procedure for the registration of schemes for any particular property type (eg DC:Subject.Scheme) is defined by the person or organization who defines that property type. For the schemes defined in this document, the procedure is defined here. [[Not yet written! I suggest that the list is published on the DC web site, with a mail address to submit requests to, and a public mailing list for discussion, like the registration procedure for MIME types, see RFC 2048.]]

[[Question: Can we have the ability to use "private" schemes? Would they begin "x-"?]]

[[Idea: we could use XML entities to represent URIs of scheme names (see also under "Type values").]]

[[Another note: a scheme may be used in HTML to indicate whether multiple concatenated values are separated by commas, semicolons or something else. This doesn't really apply to RDF.]]

Note: schemes are very important to resource discovery although a typical user will not be searching for the actual scheme name. That is, the user is not usually interested in finding something whose metadata uses a particular scheme. The scheme is important for a search engine, for example, to know the format of a date like "11/3/97" in order to know whether to match it with a search query containing "3 Nov 97" or one containing "11 Mar 97".
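
The date example can be made concrete with a small Python sketch. The scheme names "DD/MM/YY" and "MM/DD/YY" are hypothetical labels invented for this illustration, not registered DC schemes, and the function name parse_date is likewise an assumption.

```python
from datetime import date, datetime

# Hypothetical scheme names mapped to the formats they are assumed to imply.
FORMATS = {
    "DD/MM/YY": "%d/%m/%y",
    "MM/DD/YY": "%m/%d/%y",
}


def parse_date(value, scheme):
    """Interpret a date string according to the named scheme."""
    return datetime.strptime(value, FORMATS[scheme]).date()


# The same string denotes different days under the two schemes.
print(parse_date("11/3/97", "DD/MM/YY"))  # 1997-03-11
print(parse_date("11/3/97", "MM/DD/YY"))  # 1997-11-03
```

Without the scheme, a search engine has no reliable way to choose between the two readings.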

4.4.1   Rationale

4.4.1.1   What is the Scheme of the Scheme Name?

This way of handling schemes is not entirely logical, as the concept of a scheme is something that has a much wider applicability to data in RDF than just the Dublin Core. One point of view is that we are implying that "LCSH" is somehow an official DC scheme, whereas in fact it is not particularly sanctioned by the DC community. If DC:XXX.Scheme does represent the general concept of indicating the encoding, in the most general sense, of the value, then really we need a scheme qualifier on the DC:XXX.Scheme property itself, to indicate by what criterion we decide that "LCSH" represents the Library of Congress Subject Headings.

[[The above paragraph needs clarification! CW]]

In practice, however, there are few enough schemes that clashes are unlikely, and the advantages of a simple method like this outweigh the complications of doing it in a watertight manner (for example using domain names to ensure uniqueness).

4.4.1.2   Different Sorts of Schemes

It has been suggested that the different applications of schemes listed above should be mapped onto distinct property types, for example DC:Scheme and DC:Structure, so as to avoid confusion. This would also set a precedent for introducing a new name for a new concept: there may be a requirement to specify the units of a value (eg pounds, kilograms), and it would not be a good idea to use DC:Scheme for this. Other people, however, consider this unnecessary, as there are unlikely to be two schemes of different types with the same name, and a different property type for units can be defined later anyway.

4.4.1.3   Using the scheme as the property type

A completely different way of expressing scheme information is like this:

  [Resource] -----DC:Subject-----> [#node001]
  [#node001] -----LOC:LCSH-------> "Cookies"

Here the scheme name is used as the property type for the subject value. An advantage of this is that the worry about the scheme used for the scheme name goes away: the namespace mechanism deals with the problem of name clashes.

Disadvantages are:

4.4.1.4   DC:Scheme not used

We use "DC:Subject.Scheme" and not "DC:Scheme", so that the legal values for the different schemes can be enumerated separately.

4.5   [[Type values

There is still an undecided issue with respect to DC:XXXXX.Type values.

How should we define Type values such as "Main" and "Alternative"? The possibilities include:

  1. As string values, eg "Main".
  2. As node references, eg "http://purl.org/metadata/dublin_core/schema#Title.Type.Main".

For examples of the second approach, see [RDF-Schemas].

A named entity can be used to avoid typing the full URI, eg:

   <?xml version="1.0"?>
   <?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>
   <!DOCTYPE RDF:RDF [<!ENTITY DC "http://purl.org/metadata/dublin_core/schema" >]>

   <RDF:RDF>

   <RDF:Description RDF:HREF="http://a.com/mydoc">
      <DC:Title>
         <RDF:Description>
            <RDF:Value>Paris Symphony</RDF:Value>
            <DC:Title.Type RDF:HREF="&DC;#Title.Type.Alternative"/>
         </RDF:Description>
      </DC:Title>
   </RDF:Description>

   </RDF:RDF>

Note that the above example uses the abbreviation "DC" for two quite distinct purposes:

  1. As a namespace prefix, eg in "<DC:Title>"
  2. As an entity name, eg in "&DC;#Title.Type.Alternative"

One advantage of using the node reference is that there is definitely no need to use a Scheme with it. The value of DC:Title.Type could be any other node that someone wishes to declare to be a DC title type. There is no danger that two people will accidentally use the same string to mean two different things.

]]

5   The Core Elements

5.1   Title

The value of a DC:Title property must be one of the following:

A DC:Title property whose value is a string is equivalent (in a very specific sense to be explained another time) to one with a DC:Title.Type of "Main".

"Alternative" is intended to be used for sub-titles, translated titles, series title etc. [[BUT, perhaps not translated titles. See mail on dc-subelements list 28 Jan 98.]] It can also be used if the main title contains an acronym: the alternative title would contain the expansion of the acronym.

Here is an example of the use of Title.Type to distinguish between two Titles. The unqualified title is the "Main" title by default.

  [Resource] --+--DC:Title-------> "Symphony No. 31 in D Major"
               +--DC:Title-------> [#node001]
  [#node001] --+--RDF:Value------> "Paris Symphony"
               +--DC:Title.Type--> "Alternative"

The corresponding RDF would look like this:

   <RDF:Description RDF:HREF="http://mozart.org/KV297">
      <DC:Title>Symphony No. 31 in D Major</DC:Title>
      <DC:Title>
         <RDF:Description>
            <RDF:Value>Paris Symphony</RDF:Value>
            <DC:Title.Type>Alternative</DC:Title.Type>
         </RDF:Description> 
      </DC:Title>
   </RDF:Description>

5.1.1   Rationale

We have not specified a Scheme on DC:Title.Type. [[We have several options:

The advantages and disadvantages are...TBD.]]

5.2   Creator

5.2.1   Role

5.2.1.1   Structure of Role

The Creator, Contributor and Publisher elements have some aspects in common, for example they can all be accompanied by a Role. The common aspects are described here, under Creator.

The value of a DC:Creator property must be one of the following:

Nodes of class DC:CCreator may have the following properties:

[[maybe do without the named class -- unnecessary complication?]]

There is no default value for Role. That is, this document does not define the meaning of a role-less Creator to be equivalent to a Creator with any particular value of Role.

Example:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--RDF:Value------> [#node002]
               +--DC:Creator.Role> "Illustrator"
  [#node002] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

The same Creator with a scheme-qualified Role:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--RDF:Value------> [#node002]
               +--DC:Creator.Role> [#node003]
  [#node002] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
  [#node003] --+--RDF:Value------> "illustrators"
               +--DC:Creator.Role.Scheme--> "AAT"

[[Note: The AAT doesn't seem to be a very good source of roles. Any better suggestions?]]

5.2.1.2   Semantics of Role

The Role property is intended to indicate the role of the creator in a way that is more precise than the definition of the plain Creator element. To clarify the meaning further, here is a list of possible values for the role, taken from [KUNZE-1996]. Some of these are more appropriate as the roles of a Contributor or Publisher.

    composer      editor        librettist    photographer   translator
    distributor   illustrator   mirror        publisher

There are many roles for people and institutions in the creation, dissemination and provision of information resources. It may be hard to determine whether a particular person, institution or organization is a Creator, Contributor or Publisher. The usual practice in these circumstances is to use the Creator element for all the roles. Hence users searching for particular contributors or publishers are advised to search under Creator as well as Contributor or Publisher.

5.2.2   vCard

The node that represents the person, institution or organization should have properties defined by some broad-based standard, such as vCard [vCard-IETF]. A few vCard properties are shown below.

  Property  Description                                Example
  FN        Formatted name                             Mr. John Q. Public, Esq.
  N         Name                                       Public;John;Quinlan;Mr.;Esq.
  ADR       Delivery address                           ;;123 Cliff Ave.;Big Town;CA;97531;US
  EMAIL     Email address                              jqpublic@xyz.dom1.com
  TEL       Telephone number                           +1-213-555-1234
  ORG       Organization Name and Organizational Unit  ABC, Inc.;North American Division;Marketing

Here is an example of a creator defined using a vCard:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

It has been suggested that, to assist systems that don't understand the vCard elements, this should be represented thus:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--RDF:Value------> "Mr. John Q. Public, Esq."
               +--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

This has the advantage that systems which don't understand the vCard elements can still do approximate matching, so that a query asking for "Creator = Mr. John Q. Public, Esq." would return this resource. It could be argued that the approximate matching should be set up so that, in the absence of an RDF:Value property, all the string-valued properties are considered. Then a search for "Creator = Mr. John Q. Public, Esq." would still return this resource even without the RDF:Value property. In these circumstances, a search for "Creator contains xyz" will also match, which may or may not be what we want.
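
The two matching policies discussed above might be sketched as follows. This is an illustration only: a node is modelled as a Python dict keyed by property type, and the function names and the fallback_to_all switch are assumptions made for this example.

```python
def creator_strings(node, fallback_to_all=False):
    """Return the strings against which a Creator query is matched.

    `node` is either a plain string or a dict of property type -> value.
    """
    if isinstance(node, str):
        return [node]
    if "RDF:Value" in node:
        return [node["RDF:Value"]]
    if fallback_to_all:
        # No RDF:Value: consider every string-valued property.
        return [v for v in node.values() if isinstance(v, str)]
    return []

def creator_contains(node, text, fallback_to_all=False):
    """Approximate match: does any candidate string contain `text`?"""
    return any(text in s for s in creator_strings(node, fallback_to_all))

with_value = {
    "RDF:Value": "Mr. John Q. Public, Esq.",
    "VC:FN": "Mr. John Q. Public, Esq.",
    "VC:EMAIL": "jqpublic@xyz.dom1.com",
}
vcard_only = {
    "VC:FN": "Mr. John Q. Public, Esq.",
    "VC:EMAIL": "jqpublic@xyz.dom1.com",
}

print(creator_contains(with_value, "John Q. Public"))        # True
print(creator_contains(vcard_only, "John Q. Public"))        # False
print(creator_contains(vcard_only, "John Q. Public", True))  # True
print(creator_contains(vcard_only, "xyz", True))             # True (matches the email address)
```

With the fallback enabled, the last call shows the side effect noted above: a query on "xyz" matches through the email address.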

5.2.3   Globally Unique IDs

Some projects are considering the use of OIDs (Object IDs) or GUIDs (Globally Unique IDs) on all objects. People, institutions and organizations would thus have a GUID. No details of the person would appear in the metadata other than the value of the GUID, expressed as a URI (Uniform Resource Identifier).

  [Resource] -----DC:Creator-----> [urn:guid:123456789]

expressed in RDF serialization syntax thus:

   <RDF:Description RDF:HREF="http://a.com/mydoc">
      <DC:Creator RDF:HREF="urn:guid:123456789"/>
   </RDF:Description>

[[sorry I don't know the syntax of GUIDs]]

5.2.4   Personal and Corporate Names

Some existing metadata standards, such as MARC [ref], distinguish between Personal and Corporate names. If such a distinction is needed when generating DC metadata, an additional vCard field X-DC-TYPE should be used, with the value "Personal" or "Corporate".

This shows how to map a MARC Personal name:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:X-DC-TYPE---> "Personal"

5.2.5   Rationale

5.2.5.1   Personal and Corporate

Alternatives to "Personal" and "Corporate" have been suggested. The usage above does not fit in well with the fact that "Personal" and "Corporate" are adjectives. Changing them to nouns results in "Person" and "Corporation". However "Corporation" has quite different connotations, so "Organization" was suggested.

One argument in favour of "Personal" and "Corporate" is their existing wide usage in library circles.

5.2.5.2   Role

The elements Creator, Contributor and Publisher have deep underlying similarities, so the handling of Role has been made uniform across these three elements.

In a number of important environments, greater precision is needed in specifying a role than the distinction provided by these three elements, hence the need for a Role property.

The Role has been defined as an annotation because it is not a property of the person or organization itself, but is rather a property of the relationship (or RDF node) tying together the resource and the person or organization.

There may be some pressure to turn "Role" into "Type" so as to match all the other annotations. Then we would have a uniform pattern like this:

  [Resource] -----DC:XXXXX-------> [#node001]
  [#node001] --+--RDF:Value------> "foo bar"
               +--DC:XXXXX.Type--> "qaz"

Note that the string after the dot in the property type on the intermediate node is spelt "Type" in all cases, and not sometimes spelt "Role".

The advantage of this is that it gives more uniformity. The disadvantage is that it warps the meaning of the Role annotation, and will discourage people from inventing other annotations with natural descriptive names. Or, put another way, the designers of other annotations will be discouraged from giving them natural names, and will feel constrained, quite unnecessarily, to call them "Type".

5.3   Subject

This document defines just one property type for use on values of the Subject element, and that is DC:Subject.Scheme.

Example:

  [Resource] -----DC:Subject---------> [#node001]
  [#node001] --+--RDF:Value----------> "Cookies"
               +--DC:Subject.Scheme--> "LCSH"

The value of the Subject element may be a string or a node. Using a node provides a very clean way of using a controlled vocabulary. Like this:

  [Resource] -----DC:Subject-----> [http://loc.gov/LCSH#Cookies]

There is flexibility as to how much of the text relating to the subject needs to appear in the RDF serialization. Consider, for example, this node-and-arc diagram:

  [Resource] -----DC:Subject-----> [http://ddc.???/DDC#025.484]

This could be serialized thus:

    <DC:Subject>
      <RDF:Description RDF:HREF="http://ddc.???/DDC#025.484">
        <DDC:Class>025.484</DDC:Class>
        <DDC:Heading>Machine Readable Catalog Record Formats</DDC:Heading>
      </RDF:Description>
    </DC:Subject>

or thus:

    <DC:Subject RDF:HREF="http://ddc.???/DDC#025.484"/>

The first option means that the textual name of the classification is present in the metadata record, and therefore can be made available for searching by an index that only stores the data found in the RDF files directly associated with the resources indexed. The second option is shorter and cleaner.

However, with the second option, the index could still be made to find resources based on strings like "Machine Readable Catalog", by feeding to it separate files containing, for example:

    <RDF:Description RDF:HREF="http://ddc.???/DDC#025.484">
      <DDC:Class>025.484</DDC:Class>
      <DDC:Heading>Machine Readable Catalog Record Formats</DDC:Heading>
    </RDF:Description>

Only one entry like this is needed for each classification, however many resources come under this classification.
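
A rough sketch of such an index follows. The two dicts stand in for the per-resource metadata and for the separately supplied classification entries; the function name and the data layout are assumptions for illustration, and the incomplete DDC URI is copied as it appears in the examples above.

```python
# One entry per classification, supplied separately from the resources.
headings = {
    "http://ddc.???/DDC#025.484": "Machine Readable Catalog Record Formats",
}

# Per-resource metadata holds only the node reference (the shorter serialization).
subjects = {
    "http://a.com/mydoc": ["http://ddc.???/DDC#025.484"],
}

def find_by_heading(text, subjects, headings):
    """Return the resources whose subject heading contains `text`."""
    return [
        resource
        for resource, uris in subjects.items()
        if any(text in headings.get(uri, "") for uri in uris)
    ]

print(find_by_heading("Machine Readable Catalog", subjects, headings))
# ['http://a.com/mydoc']
```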

There are many different views on how RDF is going to be passed around between systems, and the best practice in this area depends on those views.

5.4   Description

No DC qualifiers.

5.5   Publisher

The Publisher element has a structure identical to DC:Creator except for one difference.

The difference is that the name of the property type on the (optional) annotation is DC:Publisher.Role, and not DC:Creator.Role. The meaning and possible values are the same as for DC:Creator.Role.

5.6   Contributor

The Contributor element has a structure identical to DC:Creator and DC:Publisher except for one difference.

The difference is that the name of the property type on the (optional) annotation is DC:Contributor.Role, and not DC:Creator.Role. The meaning and possible values are the same as for DC:Creator.Role.

5.7   Date

5.7.1   Structure

The value of a DC:Date property must be one of the following:

  1. a string holding an actual date,
  2. an annotation node, linking to a string holding an actual date and to another string holding the DC:Date.Type.

The legal values of DC:Date.Type are currently "Created", "Issued", "Accepted", "Available", "Acquired", "DataGathered" and "Valid". See the Date WG report for their definitions. The default value is "Created".

[[Note: I have changed the order to match the *text* of the Date WG report and not the *table*. The columns of the table are monotonic, but the order in the text is not the order obtained by reading across the rows.]]

Note: Both the date itself and the Date.Type can be qualified by schemes. The date itself can be qualified by Date.Scheme, whose values are... [[TBD]] The Date.Type can be qualified by a scheme to indicate which scheme is being used for the strings "Issued" etc. [[Does this document have to define the name of the scheme that defines the strings "Created", "Issued", "Accepted" etc?]]

The two cases are illustrated by the following node-and-arc diagrams:

   [Resource] --DC:Date--> "1998-03-31"

   [Resource] --DC:Date--> [ANode] --RDF:Value--> "1998-03-31"
                              |
                               --DC:Date.Type---> "Issued"

(where "ANode" stands for "Annotation Node").

The two cases are illustrated by the following RDF Descriptions:

   <RDF:Description RDF:HREF="http://www.bananas.org/prices.html">
      <DC:Date>1998-03-31</DC:Date>
   </RDF:Description>

   <RDF:Description RDF:HREF="http://www.bananas.org/prices.html">
      <DC:Date>
         <RDF:Description>
            <RDF:Value>1998-03-31</RDF:Value>
            <DC:Date.Type>Issued</DC:Date.Type>
         </RDF:Description>
      </DC:Date>
   </RDF:Description>
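
A consumer of this structure might normalise both cases to a (date, type) pair, applying the default of "Created". The sketch below assumes an annotation node is held as a dict keyed by property type; the function name is hypothetical.

```python
# Legal values of DC:Date.Type as listed above.
DATE_TYPES = {
    "Created", "Issued", "Accepted", "Available",
    "Acquired", "DataGathered", "Valid",
}

def date_with_type(value):
    """Normalise a DC:Date value to a (date string, Date.Type) pair."""
    if isinstance(value, str):
        # Case 1: a plain string takes the default type.
        return value, "Created"
    # Case 2: an annotation node; a missing Date.Type also takes the default.
    date_type = value.get("DC:Date.Type", "Created")
    if date_type not in DATE_TYPES:
        raise ValueError(f"unknown DC:Date.Type: {date_type!r}")
    return value["RDF:Value"], date_type

print(date_with_type("1998-03-31"))
# ('1998-03-31', 'Created')
print(date_with_type({"RDF:Value": "1998-03-31", "DC:Date.Type": "Issued"}))
# ('1998-03-31', 'Issued')
```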

5.7.2   Rationale

An alternative to the above structure is to define "Created", "Issued", "Accepted", "Available", "Acquired", "DataGathered" and "Valid" as sub-elements of Date. That is, the node-and-arc diagram would look like this:

  [Resource] -----DC:Date--------> [#node001]
  [#node001] -----DC:Date.Issued-> "1998-03-31"

or like this:

  [Resource] -----DC:Date.Issued-> "1998-03-31"

There are problems with both of these.

The first one has a seemingly pointless intermediate node.

The second has effectively defined a new top level property type, applied directly to a resource. This has the disadvantage of complicating the approximate matching process. We want this metadata to match a query asking for a DC:Date of "1998-03-31", but to do this the matching program would have to either examine the letters inside the property type name "DC:Date.Issued", or be pre-programmed with a table that indicates that "DC:Date.Issued" is a specialisation of DC:Date.
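
The two workarounds mentioned above could look like this in a matching program. Both functions are hypothetical sketches; the table in the second approach is deliberately incomplete to show that it must list every specialisation in advance.

```python
# Approach 1: examine the letters inside the property type name.
def is_date_by_name(property_type):
    return property_type == "DC:Date" or property_type.startswith("DC:Date.")

# Approach 2: a pre-programmed table of specialisations.
SPECIALISES = {"DC:Date.Issued": "DC:Date"}

def is_date_by_table(property_type):
    return property_type == "DC:Date" or SPECIALISES.get(property_type) == "DC:Date"

print(is_date_by_name("DC:Date.Issued"))   # True
print(is_date_by_table("DC:Date.Issued"))  # True
print(is_date_by_table("DC:Date.Valid"))   # False: not in the table
```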

The reason that we really want to define "Created" etc as sub-elements, is to enable us to write in HTML:

   <meta name="DC.Date.Issued" content="1998-03-31">

The ability to do this is essential because:

The solution described in this document is to define the encoding in HTML to be just that, as a special case, but to continue to define "Created" etc as the value of "DC:Date.Type" in all other cases.

This pushes the matching problem onto the systems that convert to and from the HTML encoding.

5.8   Type

This document defines just one property type for use on values of the Type element, and that is DC:Type.Scheme.

Example:

  [Resource] -----DC:Type------------> [#node001]
  [#node001] --+--RDF:Value----------> "Sound.Music"
               +--DC:Type.Scheme-----> "Tennant"

5.8.1   Rationale

We could use a node reference for the scheme instead of a string, which would eliminate the chance that two people would independently use the same string to refer to two different schemes. In this example, the string "Tennant" has been replaced with a reference to the web page where Roy Tennant's list is maintained. In practice it would be better to use a more stable URI, which would be less likely to be reused if someone else takes over the maintenance of the list.

  [Resource] -----DC:Type------------> [#node001]
  [#node001] --+--RDF:Value----------> "Sound.Music"
               +--DC:Type.Scheme-----> [http://sunsite.berkeley.edu/Metadata/types.html]

We could also use a node reference for the value of the type itself, eliminating the need for a scheme. In practice we would use a better URI than the one in this example.

  [Resource] --DC:Type--> [http://sunsite.berkeley.edu/Metadata/types.html#Sound.Music]

5.9   Format

[[Scheme?]]

5.10   Identifier

5.10.1   Identification in RDF

In many cases, DC metadata in RDF does not need to use the Identifier element, because RDF has a built-in mechanism for identifying the resource to which any piece of metadata applies. The RDF:HREF attribute of the RDF:Description is set to the URI of the resource.

If the metadata accompanies the resource, there is no need to set the RDF:HREF attribute of the RDF:Description at all. This may be appropriate in cases when the URI is not known at the time that the metadata is prepared (for example a web page whose final location has not been decided), or where the resource never has a URI (for example search results).

Note that non-Web resources (eg in a database, or a physical resource) can be assigned URIs for the purpose of defining metadata. The mechanisms for assigning URIs to physical resources include the generation of URIs from ISBNs or UUIDs.

5.10.2   Use of the Identifier element

Whether or not the URI appears on the RDF:Description, the Identifier element can be used to contain another identifier (eg a catalogue number) or an alternative URI.

5.11   Source

[[Someone said that the Romeo-and-Juliet example was an unfortunate choice. I don't know if that person suggested an alternative example.]]

The value of a DC:Source property must be one of the following:

The use of a URI is strongly recommended. This enables a connection to be made between the current resource and the source resource, which may be useful in resource discovery. For example, one can search for all resources which quote works by a particular author as a source. This could be done by keying on the title of the work, but this is less reliable, as it depends on the titles being character-for-character identical, or else using some approximate matching technique.

Note that the node representing the source resource may itself have properties. The types of these properties may be any (of the fifteen) DC property types or non-DC property types.

The three cases are illustrated by the following node-and-arc diagrams:

   [Resource] --DC:Source--> [http://www.shakespeare.com/romeo-and-juliet]

   [Resource] --DC:Source--> [#romeo-and-juliet]

   [Resource] --DC:Source--> "Conversations with my father"

and by the following RDF Descriptions:

   <RDF:Description RDF:HREF="http://www.films.com/west.side.story">
      <DC:Source RDF:HREF="http://www.shakespeare.com/romeo-and-juliet"/>
   </RDF:Description>

   <RDF:Description RDF:HREF="http://www.films.com/west.side.story">
      <DC:Source RDF:HREF="#romeo-and-juliet"/>
   </RDF:Description>

   <RDF:Description RDF:HREF="http://www.novel.org/childhood">
      <DC:Source>Conversations with my father</DC:Source>
   </RDF:Description>

[[To do: expression of Source in HTML]]

[[Note: the name "Source.Type" as used in HTML has a quite different derivation and use from "Title.Type", "Date.Type" etc... Do we want to mention this?]]

5.12   Language

[[Scheme]]

5.13   Relation

The value of a DC:Relation property must be a node with two properties:

The legal values of DC:Relation.Type are currently "IsPartOf", "HasPart", "IsVersionOf", "HasVersion", "IsFormatOf", "HasFormat", "References", "IsReferencedBy", "IsBasedOn", "IsBasisFor", "Requires" and "IsRequiredBy". See the Relations WG report for their definitions. There is no default value.

[[Note: for RFC 3, we need to include the full definitions.]]

This is illustrated by the following node-and-arc diagram:

   [Resource] --DC:Relation--> [ANode] --RDF:Value------> [Related resource]
                                  |
                                   --DC:Relation.Type---> "IsBasedOn"

(where "ANode" stands for "Annotation Node")

and by the following RDF Description:

   <RDF:Description RDF:HREF="http://ds.internic.net/internet-drafts/draft-kunze-dc-02.txt">
      <DC:Relation>
         <RDF:Description>
            <RDF:Value RDF:HREF="http://purl.oclc.org/docs/metadata/dublin_core"/>
            <DC:Relation.Type>IsBasedOn</DC:Relation.Type>
         </RDF:Description>
      </DC:Relation>
   </RDF:Description>

5.14   Coverage

[[TBD]]

5.15   Rights

[[TBD - defer?]]

6   Qualified DC in HTML

6.1   Introduction

This section describes how to express qualified DC in HTML META tags. This is done by showing how certain patterns of node-and-arc diagram should be written in HTML. The mappings are intended to be reversible, ie the given pieces of HTML should be mapped back into nodes and arcs using the same rules. In most cases, a program performing this mapping requires knowledge of the particular elements and qualifiers being mapped.

6.2   Mapping of Annotation Nodes

[[Question: Does this apply to all annotation nodes (Type, Role, ...) or only to Type?]]

Given a DC element described by:

   [Resource] --DC:Element--> [ANode] --RDF:Value--------> "Primary value"
                                 |
                                  ------DC:Element.Type--> "Annotation"

   (where "ANode" stands for "Annotation Node")

the following is the recommended encoding in HTML:

   <meta name="DC.Element.Annotation" content="Primary value">

Example:

   [Resource] --DC:Date--> [ANode] --RDF:Value-----> "2000-01-01"
                              |
                               ------DC:Date.Type--> "Issued"

The following is the recommended encoding in HTML:

   <meta name="DC.Date.Issued" content="2000-01-01">

Note that the string "Issued" is the value of a property in the RDF version, but is part of the name of the META element in the HTML version.
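
The mapping in both directions might be sketched as follows. The function names and the known_types table are assumptions for this illustration; the table stands for the prior knowledge of elements and qualifiers that a mapping program needs, as noted in the introduction to this section.

```python
from html import escape

def annotation_to_meta(element, value, annotation):
    """RDF -> HTML: the Type value migrates into the META name."""
    name = f"DC.{element}.{annotation}"
    return f'<meta name="{escape(name)}" content="{escape(value)}">'

def meta_name_to_annotation(name, known_types):
    """HTML -> RDF: split a META name into (element, Type value or None).

    `known_types` maps an element name to its legal Type values; without
    it the program cannot tell an annotation from a subelement.
    """
    prefix, element, *rest = name.split(".")
    if prefix != "DC":
        raise ValueError(f"not a DC name: {name!r}")
    if len(rest) == 1 and rest[0] in known_types.get(element, ()):
        return element, rest[0]
    return element, None

print(annotation_to_meta("Date", "2000-01-01", "Issued"))
# <meta name="DC.Date.Issued" content="2000-01-01">
print(meta_name_to_annotation("DC.Date.Issued", {"Date": {"Issued"}}))
# ('Date', 'Issued')
```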

6.2.1   Rationale

HTML does not of course support multiple nested nodes and values, so the RDF version cannot be mapped directly onto the HTML version. One option would have been to abandon the nesting in RDF:

   DEPRECATED EXAMPLE
   [Resource] --DC:Date.Issued--> "2000-01-01"

This was not done because it would force the programs that process this data to parse the property names in order to do intelligent matching of the data. For example, it is desirable that a search engine should return the above resource when asked to find anything with a DC:Date value of "2000-01-01". The search engine could only do this if it has been programmed to parse "DC:Date.Issued" to see the "DC:Date" prefix, or if it has been provided with a list of specialised elements, ie it has been told that "DC:Date.Issued" is a special case of "DC:Date".

Any program that maps between the RDF and HTML representations will have to understand this relationship, but it was thought sensible not to burden RDF metadata in general with this problem.

6.3   Mapping of Aspect Nodes

6.3.1   Mapping of Aspect Nodes with DC properties

There are, as yet, no aspect nodes with DC properties. When or if such properties are defined, we would expect them to be mapped to HTML like this:

  [Resource] -----DC:Splunge-----> [#node001]
  [#node001] --+--DC:Splunge.Foo-> "Yellow"
               +--DC:Splunge.Bar-> "42"

The following is the recommended encoding in HTML:

   <meta name="DC.Splunge.Foo"   content="Yellow">
   <meta name="DC.Splunge.Bar"   content="42">

Note that in this case there are no strings which migrate from being the value of a property in RDF to being in the name of the META element in HTML.

Note that the HTML names are not created simply by concatenating the property types to give "DC.Splunge.Splunge.Foo" etc. Some special processing is needed to remove the duplicated "Splunge".

This mapping suffers from there being no grouping mechanism in HTML, so that we cannot associate the Foo and Bar together in cases where there is more than one Splunge.

6.3.2   Mapping of Aspect Nodes with foreign properties

What do we do with subelements that are not from the DC namespace?

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

The following is the recommended encoding in HTML:

   <meta name="DC.Creator.FN"    content="Mr. John Q. Public, Esq.">
   <meta name="DC.Creator.EMAIL" content="jqpublic@xyz.dom1.com">

[[or perhaps:

   <meta name="DC.Creator"
      content="BEGIN:VCARD
               N:Public;John;Quinlan;Mr.;Esq.
               EMAIL:jqpublic@xyz.dom1.com
               END:VCARD">]]

Note that in this case there are no strings which migrate from being the value of a property in RDF to being in the name of the META element in HTML.

Note that this example deliberately does not use the "N" (Name) property, which is the subject of a later section.

This mapping suffers from there being no grouping mechanism in HTML, so that we cannot associate the name and email address together in cases where there is more than one Creator.
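
The name construction for both kinds of aspect node (DC properties and foreign properties) can be sketched in one hypothetical function. It assumes property types are written as "prefix:local", as in the diagrams, and that the foreign namespace prefix is simply dropped.

```python
def meta_name(element_type, property_type):
    """Build the HTML META name for a property of an aspect node."""
    el_ns, el_local = element_type.split(":", 1)
    ns, local = property_type.split(":", 1)
    if ns == el_ns and local.startswith(el_local + "."):
        # DC property: it already carries the element name, so do not
        # concatenate (that would give "DC.Splunge.Splunge.Foo").
        return f"{el_ns}.{local}"
    # Foreign property: append its local name; the namespace is lost.
    return f"{el_ns}.{el_local}.{local}"

print(meta_name("DC:Splunge", "DC:Splunge.Foo"))  # DC.Splunge.Foo
print(meta_name("DC:Creator", "VC:FN"))           # DC.Creator.FN
print(meta_name("DC:Creator", "VC:EMAIL"))        # DC.Creator.EMAIL
```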

6.3.3   Rationale

6.3.3.1   Grouping

The AHDS/UKOLN project [*** ref ***] uses a grouping mechanism like this:

   <meta name="DC.Creator.FN.1"    content="Mr. John Q. Public, Esq.">
   <meta name="DC.Creator.EMAIL.1" content="jqpublic@xyz.dom1.com">
   <meta name="DC.Creator.FN.2"    content="John Smith">
   <meta name="DC.Creator.EMAIL.2" content="john@smith.com">

This has the advantage that the grouping of the properties is precisely defined, though how this would be used in searching is not clear (I couldn't find a discussion of this grouping mechanism in the text of the AHDS/UKOLN document). It has the disadvantage of being more complicated and needing special software support.
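
To give an idea of the special software support involved, the following sketch regroups such META names by their numeric suffix. It is a guess at how a consumer might read this convention, not a description of the AHDS/UKOLN software; the function name is hypothetical.

```python
from collections import defaultdict

def group_meta(pairs):
    """Group (name, content) pairs by the trailing group number in the name."""
    groups = defaultdict(dict)
    for name, content in pairs:
        base, _, index = name.rpartition(".")
        if not index.isdigit():
            raise ValueError(f"no group number in {name!r}")
        groups[int(index)][base] = content
    return dict(groups)

pairs = [
    ("DC.Creator.FN.1", "Mr. John Q. Public, Esq."),
    ("DC.Creator.EMAIL.1", "jqpublic@xyz.dom1.com"),
    ("DC.Creator.FN.2", "John Smith"),
    ("DC.Creator.EMAIL.2", "john@smith.com"),
]
for number, creator in sorted(group_meta(pairs).items()):
    print(number, creator)
```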

6.3.3.2   Namespaces

In the example above, we lose the information that FN and EMAIL come from the vCard namespace. Perhaps that is OK. If we wrote DC.Creator.VC.FN, a program would still not be able to analyse the name into its parts without prior knowledge of the strings that it is going to encounter.

It would be feasible to provide lists of strings to a metadata processor to tell it that, say, FN is from the vCard namespace.

6.4   Mapping of Schemes

[[How do we do schemes? This is a simple way, but is it too simple? When mapping from HTML to RDF, how does the program know to use DC:Scheme and not XX:Scheme? Perhaps these mappings only apply to DC elements, and they always use DC:Scheme.]]

  [Resource] -----DC:Subject---------> [#node001]
  [#node001] --+--RDF:Value----------> "Cookies"
               +--DC:Subject.Scheme--> "LCSH"

maps to:

   <meta name="DC.Subject" scheme="LCSH" content="Cookies">
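
In the HTML to RDF direction, a reader might rebuild the node as sketched below, using Python's standard html.parser module. The sketch assumes that the mapping applies only to DC elements and that the scheme property is always DC:<Element>.Scheme, which is one possible answer to the question raised above; the class name MetaReader is hypothetical.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collect DC META tags as (property type, node) pairs."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.startswith("DC."):
            return
        element = "DC:" + name[len("DC."):]
        node = {"RDF:Value": attributes.get("content")}
        if "scheme" in attributes:
            # Assumption: the scheme always maps to DC:<Element>.Scheme.
            node[element + ".Scheme"] = attributes["scheme"]
        self.nodes.append((element, node))

reader = MetaReader()
reader.feed('<meta name="DC.Subject" scheme="LCSH" content="Cookies">')
print(reader.nodes)
# [('DC:Subject', {'RDF:Value': 'Cookies', 'DC:Subject.Scheme': 'LCSH'})]
```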

6.5   Mapping of vCard "Name" Property

[[Not sure about this one. Worth thinking about. It's kind of like treating "N" the same as "RDF:Value".]] Note that these mappings are intended to apply in both directions (RDF -> HTML and HTML -> RDF).

The vCard "Name" property (identified by the property name "N") is special in that it is mandatory in a vCard, so as to facilitate collating and sorting of vCard objects.

When mapped onto HTML, this should become the value of an unqualified DC element, and vice versa. This only applies if other vCard properties are also present. If there are no other vCard properties present, the HTML version has no indication of any connection with vCard, so the mapping to "N" cannot apply.

Example 1:

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
               +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"

The following is the recommended encoding in HTML:

   <meta name="DC.Creator"       content="Public;John;Quinlan;Mr.;Esq.">
   <meta name="DC.Creator.EMAIL" content="jqpublic@xyz.dom1.com">

Example 2:

  [Resource] -----DC:Creator-----> "Public;John;Quinlan;Mr.;Esq."

The following is the recommended encoding in HTML:

   <meta name="DC.Creator"       content="Public;John;Quinlan;Mr.;Esq.">
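
The HTML to RDF direction of the two examples might be sketched as follows. The function name is hypothetical, and the sketch assumes that every qualifier found on DC.Creator is a vCard property, which a real program would need to know in advance.

```python
def creator_from_meta(pairs):
    """Map the META (name, content) pairs of one Creator back to an RDF value.

    Returns a plain string when only the unqualified element is present,
    otherwise a dict of vCard properties in which the unqualified value
    becomes VC:N.
    """
    fields = dict(pairs)
    others = {n: c for n, c in fields.items() if n != "DC.Creator"}
    if not others:
        # No other vCard properties: no connection with vCard is implied.
        return fields["DC.Creator"]
    node = {"VC:" + name[len("DC.Creator."):]: content
            for name, content in others.items()}
    if "DC.Creator" in fields:
        node["VC:N"] = fields["DC.Creator"]
    return node

example1 = [
    ("DC.Creator", "Public;John;Quinlan;Mr.;Esq."),
    ("DC.Creator.EMAIL", "jqpublic@xyz.dom1.com"),
]
example2 = [("DC.Creator", "Public;John;Quinlan;Mr.;Esq.")]

print(creator_from_meta(example1))
print(creator_from_meta(example2))
```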

6.6   Personal and Corporate Names

[[What about this?]]

  [Resource] -----DC:Creator-----> [#node001]
  [#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
               +--VC:X-DC-TYPE---> "Personal"
               +--VC:ADR---------> ";;123 Cliff Ave.;Big Town;CA;97531;US"

Following the pattern above, we get this:

   <meta name="DC.Creator.FN"        content="Mr. John Q. Public, Esq.">
   <meta name="DC.Creator.X-DC-TYPE" content="Personal">
   <meta name="DC.Creator.ADR"       content=";;123 Cliff Ave.;Big Town;CA;97531;US">

6.7   Annotations with Subelements

Each mapping to HTML that we have considered so far has included at most one annotation (DC.Date.Issued) or one subelement (DC.Creator.FN). In the interest of simplicity, it is illegal to include more than that, ie two annotations, or two subelements, or one of each.

We say that if the underlying metadata syntax does not support structured values, then you are limited to *either* a single annotation or a single structure subelement. If you need more, you can encode the additional information in the value itself. If that makes the information hard to extract from the value, you can either:

  1. Define a BNF grammar controlling the value
  2. Use a structured metadata syntax
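The restriction can be checked mechanically. The sketch below assumes one reading of the rule, namely that a meta name may carry at most one component after "DC.<Element>"; the function names are hypothetical.

```python
def qualifier_count(meta_name):
    """Count the dot-separated components after 'DC.<Element>'."""
    parts = meta_name.split(".")
    return max(len(parts) - 2, 0)


def is_legal(meta_name):
    """True if the name has at most one annotation or subelement.

    Assumes the limit applies to each meta name separately, so that
    DC.Creator.FN and DC.Creator.ADR may both appear in one document.
    """
    return meta_name.startswith("DC.") and qualifier_count(meta_name) <= 1


for name in ("DC.Creator", "DC.Date.Issued", "DC.Creator.FN",
             "DC.Creator.FN.Given"):
    print(name, is_legal(name))
```

Under this reading DC.Date.Issued and DC.Creator.FN pass, while a name such as DC.Creator.FN.Given (an invented example) is rejected.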

6.8   Mapping of HREFs

Element values that are nodes whose properties are not known should be expressed using the URI as a string.

Example:

  [Resource] -----DC:Creator-----> [urn:guid:123456789]

expressed in RDF serialization syntax thus:

   <RDF:Description RDF:HREF="http://a.com/mydoc">
      <DC:Creator RDF:HREF="urn:guid:123456789"/>
   </RDF:Description>

would be expressed in HTML thus:

   <meta name="DC.Creator" content="urn:guid:123456789">
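As a rough illustration, the example above can be converted with Python's standard XML parser. The namespace URIs below are placeholders invented for this sketch (the fragment in the text does not declare any), and the function name `hrefs_to_meta` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF_NS = "urn:example:rdf"  # placeholder namespace URI, an assumption
DC_NS = "urn:example:dc"    # placeholder namespace URI, an assumption

SAMPLE = f'''<RDF:Description xmlns:RDF="{RDF_NS}" xmlns:DC="{DC_NS}"
    RDF:HREF="http://a.com/mydoc">
  <DC:Creator RDF:HREF="urn:guid:123456789"/>
</RDF:Description>'''


def hrefs_to_meta(xml_text):
    """Emit a <meta> tag for each DC element whose value is a bare HREF."""
    root = ET.fromstring(xml_text)
    tags = []
    for child in root:
        href = child.get(f"{{{RDF_NS}}}HREF")
        if href is None or not child.tag.startswith(f"{{{DC_NS}}}"):
            continue
        # Strip the "{namespace}" part to recover the DC element name.
        element = child.tag.split("}", 1)[1]
        # The node's properties are not known, so the URI itself
        # is used as the string value.
        tags.append(f'<meta name="DC.{element}" content="{escape(href)}">')
    return tags


print(hrefs_to_meta(SAMPLE))
```

Whether the result should also carry scheme="URI" is left open here, as in the text.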

[[Should we use scheme="URI"?]]

7   Other Issues

7.1   Mixed language content in DC elements

The values of Dublin Core elements are typically strings of text. A single string may contain text in more than one language, eg (taken from http://www.cityvu.com/english/manet30.htm):

   <DC:Title>THE PHOTOGRAPHS - THE MANET COLLECTION:
       Le Déjeuner sur l'herbe</DC:Title>

Sophisticated systems are encouraged to mark the language of all string components, eg:

   <DC:Title xml:lang="en">THE PHOTOGRAPHS - THE MANET COLLECTION:
       <x:span xml:lang="fr">Le Déjeuner sur l'herbe</x:span></DC:Title>

Noting the language of substrings allows more precise searches, eg "chat" in English vs "chat" in French.
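A language-aware search over such a marked-up title might look like the sketch below. The namespace URIs for DC and x are placeholders, and the function names are hypothetical; only the xml:lang handling follows the XML specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The namespace URIs for DC and x are placeholders (assumptions).
SAMPLE = '''<DC:Title xmlns:DC="urn:example:dc" xmlns:x="urn:example:x" xml:lang="en">THE PHOTOGRAPHS - THE MANET COLLECTION:
    <x:span xml:lang="fr">Le Déjeuner sur l'herbe</x:span></DC:Title>'''


def language_runs(elem, inherited=None):
    """Yield (language, text) pairs in document order."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from language_runs(child, lang)
        # Tail text follows the child but belongs to the parent,
        # so it takes the parent's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


def matches(xml_text, word, lang):
    """True if `word` occurs in a run of text marked with `lang`."""
    root = ET.fromstring(xml_text)
    return any(run_lang == lang and word.lower() in text.lower()
               for run_lang, text in language_runs(root))


print(list(language_runs(ET.fromstring(SAMPLE))))
print(matches(SAMPLE, "herbe", "fr"), matches(SAMPLE, "herbe", "en"))
```

The substring test is deliberately naive; a real system would tokenize the text, but the point is that the same word can be matched or rejected depending on the language of the run it appears in.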

The DC community requests that the W3C XML WG and the W3C RDF Model and Syntax WG collaborate to make such support for mixed-language strings a practical reality.

8   Acknowledgements

We would like to thank all the members of the DC Data Model working group for their contributions. We are also grateful for the many suggestions from the subscribers to the meta2 mailing list, from some of which we have taken whole sentences.

9   References

For the moment, please see the references section of the Data Model WG issues list (copy below).

9.1   Additional references

URI -- Uniform Resource Identifier -- see RFC 1630, RFC 1737

9.2   Data Model WG issues list references section

Here is a copy of the References from version 110.

BIB-1xx Bibliographic Formats and Standards -- 1xx Fields [manual page], http://www.oclc.org/oclc/bib/1.htm
DC Dublin Core Web site, http://purl.oclc.org/docs/metadata/dublin_core
DC-Date DC Date WG report, http://purl.oclc.org/docs/metadata/dublin_core/wdatedraft.html
DC-Relation DC Relations WG report, http://purl.oclc.org/docs/metadata/dublin_core/wrelationdraft.html
DC-RFC#1 Dublin Core Metadata for Simple Resource Discovery [Internet Draft], http://ds.internic.net/internet-drafts/draft-kunze-dc-02.txt
DC-5 DC-5: The Helsinki Metadata Workshop [article in D-Lib], http://www.dlib.org/dlib/february98/02weibel.html
HTML HTML 4.0 Specification [W3C Recommendation], http://www.w3.org/TR/REC-html40
IANA-charsets IANA register of charsets, http://www.isi.edu/in-notes/iana/assignments/character-sets
ISBN Using Existing Bibliographic Identifiers as Uniform Resource Names [RFC], http://ds.internic.net/rfc/rfc2288.txt
Knight-Hamilton Dublin Core Qualifiers [Draft], http://www.roads.lut.ac.uk/Metadata/DC-Qualifiers.html
LoC-Browse Browse: Books Catalogued Since 1975 [LoC online menu], http://lcweb.loc.gov/catalog/browse/bks3.html
RDF-M&S Resource Description Framework (RDF) Model and Syntax [latest Working Draft], http://www.w3.org/TR/WD-rdf-syntax
RDF-Schemas Resource Description Framework (RDF) Schemas [latest Working Draft], http://www.w3.org/TR/WD-rdf-schema
vCard-IETF vCard MIME Directory Profile [Internet Draft], http://ds.internic.net/internet-drafts/draft-ietf-asid-mime-vcard-04.txt
vCard-2.1 vCard, The Electronic Business Card, Version 2.1 [Specification], http://www.imc.org/pdi/vcard-21.doc
XML Extensible Markup Language (XML) 1.0 [W3C Recommendation], http://www.w3.org/TR/REC-xml
UUID UUIDs and GUIDs [Internet Draft], http://ds.internic.net/internet-drafts/draft-leach-uuids-guids-01.txt
UUID-URI The uuid: URI scheme [Internet Draft], http://ds.internic.net/internet-drafts/draft-kindel-uuid-uri-00.txt

10   [[Temporary "Notes" Section

Notes I don't want to lose. Maybe incorporate later.

For *some* properties it is a matter of choice and preference whether the information goes in the box or on the label on the box.

Notes: Advantage of inventing new domain-specific qualifiers: more accurate searching. Disadvantage: reduced interoperability between domains.

10.1   The "1:1" Rule

Here is some background information about the "1:1" Rule, which has affected some of the decisions presented here.

At DC-5, in Helsinki, we agreed that metadata about a given resource should not contain metadata about some other resource. As HTML's Meta element makes it very difficult to construct, and refer to, distinct metadata descriptions, we agreed to relax this rule for DC-in-HTML and allow the Source element to contain metadata about the source of the given resource.

10.2   The Meaning of Properties

The fundamental principle for encoding Dublin Core metadata is that the structure given to the metadata, in terms of properties and property values, should match the intended meaning of the metadata.

Here is an example where the structure does not match the meaning very well:

DEPRECATED EXAMPLE
*** I'm sure we will come across one at some point!

11   Section Numbering

And finally, for those of you wondering how I had the patience to number all the sections and sub-sections, here is the answer. It's Perl.

    use strict;
    use warnings;

    my $min = 2;            # top heading level to number (<h2>)
    my $max = 5;            # deepest heading level that gets reset
    my @numbs = (0) x 10;   # one counter per heading level
    while (my $line = <>)
    {
        if ($line =~ /<h([2-9])>/)
        {
            my $num = $1;
            # increment the (sub)section number
            $numbs[$num]++;
            # reset the numbers of lower level headings
            for (my $i = $num + 1; $i <= $max; $i++)
            {
                $numbs[$i] = 0;
            }
            # assemble the new number, with dots between
            my $numstr = '';
            for (my $i = $min; $i <= $num; $i++)
            {
                $numstr .= $numbs[$i] . '.';
            }
            # knock off trailing dot (escaped, so only a literal dot matches)
            $numstr =~ s/\.$//;
            # insert into the line, removing existing number if any
            # a nbsp is inserted to make it look better
            # eg "<h3>2.3 &nbsp; Banana Biscuits</h3>"
            $line =~ s/<h$num>[0-9. ]*(&nbsp;)* */<h$num>$numstr &nbsp; /;
        }
        print $line;
    }