[This local archive copy mirrored from the canonical site: http://www.mailbase.ac.uk/lists/dc-datamodel/files/rfc3.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Dublin Core Workshop Series
Internet-Draft
draft-???-dcq-00.txt
10 July 1998
Expires in six months

Qualified Dublin Core Metadata for Simple Resource Discovery
Status of this Document
This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Distribution of this document is unlimited. Please send comments to weibel@oclc.org, or to the discussion list meta2@net.lut.ac.uk.
Abstract
This document defines the mechanism by which the Dublin Core metadata element set can be extended by the use of qualifiers, and defines some specific qualifiers.
The qualifiers are designed to be expressed using RDF (Resource Description Framework) [RDF], but their expression in HTML META tags is also defined.
Unqualified Dublin Core is limited in its sophistication, so this document defines the mechanism by which Dublin Core metadata can be extended by the use of qualifiers.
This document has two purposes. Firstly, it explains how metadata can be expressed in RDF (Resource Description Framework). RDF allows metadata values to be structured, instead of being simple strings as in HTML META tags. One aspect of this structuring is to allow the inclusion of qualifiers, which refine or extend the meaning of the core element. For example it is possible to specify that a Title is an Alternative title.
Secondly, having explained the general principles by which the DC can be qualified, this document goes on to define a number of specific RDF property types which are of general applicability across most of the application domains of the Dublin Core. It is hoped that this will increase the effectiveness of searches of qualified DC across different metadata collections, by encouraging the different metadata authors to use the same property type to express the same information.
Although the structures are discussed here in terms of RDF, they are also applicable to other encodings like generic record syntax in Z39.50.
If the Dublin Core element set is not sufficient to meet the needs of a particular application, it can be extended in various ways.
One way is to define new elements which are outside the DC set. This document mentions namespaces, which enable such elements to be defined with no danger of name clashes, but has nothing more to say on this topic.
Another way is to define qualifiers for DC elements. This document both explains how this should be done, and defines a few particular qualifiers.
At another level, it will be necessary to define additional *values* for the qualifiers defined in this document. In some cases complete lists of values are provided, intended to be extended only in exceptional circumstances (eg RelationType). In other cases this document provides a few example values, and the designers of schemes outside the Dublin Core are expected to define the actual values to be used in practice (eg AgentRole).
[[TBD - define the process by which new values are defined. For each enumerated list of strings (eg Main, Alternative) we must say what the process is by which additional values are blessed (registration, new version of this document, free-for-all...). This includes the definition of the names of schemes.]]
This document defines a number of property types in the Dublin Core Qualifiers namespace. It is not expected that any more property types in that namespace will be defined, except by updating this document. That is, this document can be taken as the complete definition of all the official DC qualifiers.
The qualifiers defined in this document are those which experience has shown to be of general use across application domains, and therefore worth defining as part of the Dublin Core effort. If the definition of these qualifiers were left to the individual communities of DC users, there would be more chance that different names would be chosen for qualifiers with overlapping meanings, which would make it more difficult to do combined searching of metadata originating in more than one community.
The choice of qualifiers is fairly arbitrary. Several of the 15 core elements have no qualifiers defined at all.
The qualifiers defined here are extensible in various ways. If an element, a qualifier or a set of values does not meet your needs it is possible to define a new one.
There are a number of factors to weigh up:
This section presents a very brief introduction to the aspects of the Resource Description Framework which are relevant to this document. For full details, see the RDF Model and Syntax specification [RDF-M&S].
RDF allows the expression of name/value pairs in the same way as HTML META tags, but it also allows more sophisticated structures.
HTML META tags can be mapped into RDF properties. The terminology is different, but the principles are the same. Consider this example:
<meta name="DC.Creator" content="Fred Smith">
This maps into an RDF property, whose property type is "DC:Creator" and whose value is "Fred Smith". In RDF, the word "property" is used to refer to a particular instance of property type and value. For example if a paper has two authors, it would have two properties, both with the same type (DC:Creator), but different values (eg "Fred Smith" and "John Doe").
As described so far, RDF does not offer anything more powerful than HTML META tags. However RDF property values don't have to be strings; the value of a property can also be another resource. Such a resource can be either a real-world resource or an RDF resource whose raison d'etre is simply to have more properties of its own. There will be plenty of examples of this later.
So the value of a DC property could be an RDF resource, which then has properties of its own, whose values can again be strings or more resources. Hence the RDF resources and properties can be thought of as a graph or network, where the nodes are RDF resources and the arcs are the properties.
We can draw our metadata out as a directed labelled graph where the nodes are the RDF resources. The properties are drawn as directed arcs -- that is they have a direction from one node to another. The starting node is the one that represents the RDF resource which has the property, and the arc points to a node that represents the value.
In the RDF specifications, when drawing these node-and-arc diagrams, RDF resource nodes are represented by ellipses and string nodes by rectangles. In this document, due to technical limitations, RDF resource nodes are drawn with square brackets and string nodes in quotation marks like this:
[mydoc] --------DC:Creator-----> "Fred Smith"
This shows an arc from the node "mydoc" pointing to a string "Fred Smith". "DC:Creator" is the label on the arc; the label is the property type. The label is written on the line, with no punctuation.
Instead of a string, the value of a property can be an intermediate node, which comes between the main resource and the main value of the property. In this document, these are called Annotation Nodes, because they are used to attach annotations to the main value.
Example:
[Resource] -----DC:Title-------> [#node001]

[#node001] --+--RDF:Value------> "Paris Symphony"
             +--DCQ:TitleType--> "Alternative"
Here, the title of the work is "Paris Symphony", but we have inserted an annotation node, called #node001, between the main resource and the string "Paris Symphony". The purpose of the annotation node is to carry the annotation which indicates that this is the alternative title of the work.
This is a very powerful mechanism, and enables extra information to be added to the metadata in a way that preserves the relationships between the different parts.
Note the use of the property type "RDF:Value" in the example above. This is an indication that the "real" value of the DC:Title property is actually "Paris Symphony". Programs that process RDF will know to follow such arcs, so that a search for a resource with title "Paris Symphony" will find this resource even though it has an annotation. This is discussed in more detail in the section on matching.
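As a minimal sketch of how a program might follow RDF:Value arcs, the annotated title from the example above can be held in a small in-memory graph. The representation used here (the `Ref` class, the `GRAPH` dictionary and the function names) is an assumption made for illustration only; it is not defined by this document or by the RDF specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An arc target that is another node rather than a string (hypothetical)."""
    node_id: str


# Graph: node id -> list of (property type, value) arcs.
GRAPH = {
    "Resource": [("DC:Title", Ref("#node001"))],
    "#node001": [
        ("RDF:Value", "Paris Symphony"),
        ("DCQ:TitleType", "Alternative"),
    ],
}


def primary_value(graph, value):
    """Follow RDF:Value arcs from `value` until a string is reached.

    Returns None if a node on the way has no RDF:Value arc, or if the
    arcs loop back on themselves.
    """
    seen = set()
    while isinstance(value, Ref):
        if value.node_id in seen:
            return None
        seen.add(value.node_id)
        targets = [v for p, v in graph.get(value.node_id, []) if p == "RDF:Value"]
        if not targets:
            return None
        value = targets[0]
    return value


def titles(graph, resource):
    """Primary values of every DC:Title property of `resource`."""
    return [primary_value(graph, v) for p, v in graph[resource] if p == "DC:Title"]
```

With this graph, `titles(GRAPH, "Resource")` yields the string "Paris Symphony", which is the value a title search would compare against, whether or not the annotation is present.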
The annotation node does not represent a real resource, but is an RDF resource created specially to hold the annotation. In this document, such nodes are identified by an ID. An ID is a string starting with a "#" that refers to a node defined within the same metadata document (using the word "document" in the sense used by XML [XML]). RDF uses the attribute with name "ID" to give such identifiers to nodes.
Real resources are identified like this "[Resource]". In practice, such resources would be identified by a URI [URI].
The RDF Model and Syntax specification defines a serialization syntax, which is a way of expressing RDF data in a file or on a network as a sequence of bytes. The syntax uses Extensible Markup Language [XML].
Examples are given later.
An alternative to using the node-and-arc diagrams described above is to invent some sort of syntax-neutral notation. That is, a notation independent of both RDF and HTML META tags.
It has been decided not to use such a notation in this document because of the complication involved.
In this section, we look at some general principles that apply to DC metadata in RDF. Later we consider each element in turn, for the specific details that apply to particular elements.
We introduced above the concept of a real-world resource, that is, a resource about which we want to define some metadata. Such a resource would have a Title, Creator etc. We also call these "actual resources". The RDF specifications use the word "resource" to mean any RDF node, including what we call annotation nodes. In RDF, these resources can have properties as easily as real-world resources.
In this document however, we make a distinction between real-world resources and annotation nodes. In particular, real-world resources may take any of the fifteen core elements, but annotation nodes may not.
In RDF, we use only DC element names as the names of properties of nodes which represent actual resources. In other words, at the outermost level, we do not use any property names other than the fifteen DC element names. For example:
this is OK:
[Resource] --DC:Type----->
this is not OK:
[Resource] --DC:Splunge-->
this, of course, is OK:
[Resource] --XY:Splunge-->
An alternative could be to allow other property types at the outermost level. For example, it has been proposed that the distinction between the Creator, Contributor and Publisher elements is unhelpful in qualified DC, because, in qualified DC, we are not restricted to just those roles, but can specify any other role. A cleaner implementation may be to define an Agent element to be used at the outermost level, which would cover Creator, Contributor, Publisher and more, by way of the AgentRole qualifier. For example, these would be equivalent:
[Resource] -----DC:Creator-----> "John Smith"

DEPRECATED EXAMPLE
[Resource] -----DC:Agent-------> [#node001]

[#node001] --+--RDF:Value------> "John Smith"
             +--DCQ:AgentRole--> "Creator"
You could also have:
DEPRECATED EXAMPLE
[Resource] -----DC:Agent-------> [#node001]

[#node001] --+--RDF:Value------> "Harold Rosson"
             +--DCQ:AgentRole--> "majorContributor.cinematographer"
We have decided against this approach because the 15 elements of the core must be a single element set for all Dublin Core metadata, both qualified and unqualified. The use of Agent, as above, would mean that we are using a different set of elements for qualified DC, ie a set including Agent, but excluding Creator, Contributor and Publisher. Using a different element set for qualified DC would make it difficult for users to make searches that return hits from both qualified and unqualified DC: a search for "Agent" would only return resources described using qualified DC, and a search for "Creator" would only return resources described using unqualified DC. One could work round this by changing the search to "(Agent with Role=Creator) OR Creator", or by making the search engine treat these as equivalent by special programming.
In RDF, we do not use the fifteen DC element names except as the names of properties that apply to actual resources. In other words, at inner levels, we do not use the fifteen DC element names to name properties. For example:
this is OK:
[Annotation node] --DCQ:TitleType-->
this is not OK:
[Annotation node] --DC:Type-->
this, of course, is OK:
[Annotation node] --XY:Type-->
We use "DCQ:TitleType" and not "DCQ:Type" because there are several different sorts of Type, which apply to different elements, and which have different legal values. We may want to define these features formally in a machine-readable way, so distinct names are essential.
In addition using "DCQ:TitleType" avoids confusion with the core element "DC:Type".
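The two naming rules above (DC element names only on actual resources, and never on annotation nodes) can be expressed as a small check. This is a sketch under stated assumptions: the function name `property_allowed` and the "PREFIX:Name" string form are illustrative, and only the DC prefix is restricted, since the text permits property types from other namespaces at both levels.

```python
# The fifteen Dublin Core element names.
DC_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})


def property_allowed(qname, on_actual_resource):
    """Check a property type such as "DC:Type" against the two naming rules.

    qname is a "PREFIX:Name" string. on_actual_resource is True for a node
    that represents a real-world resource and False for an annotation node.
    Property types outside the DC namespace are always accepted here.
    """
    prefix, _, name = qname.partition(":")
    if prefix != "DC":
        return True
    return on_actual_resource and name in DC_ELEMENTS
```

Applied to the six examples above, the check accepts DC:Type and XY:Splunge on a resource, rejects DC:Splunge on a resource, accepts DCQ:TitleType and XY:Type on an annotation node, and rejects DC:Type on an annotation node.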
Comments please
Now we have no dots in our types, what happens to sub-subelements?
We would name sub-subelements as shown below. As yet, no such sub-subelements have been defined in DC. vCard already has provision for the structuring of addresses.
DC:Creator
    DCQ:CreatorAddress
        DCQ:CreatorAddressSuburb
        DCQ:CreatorAddressCountry
It is important to consider the mechanisms by which DC metadata expressed in RDF can be converted to unqualified DC metadata, for passing to a system which can only handle unqualified DC.
By "unqualified DC", we mean metadata which uses only the 15 core elements as property types, and has strings or URIs for all the values.
It is not within the remit of this document to define such a mechanism, but a brief description may help to explain some of the decisions about element structure that have been taken.
The general procedure is to concatenate the string values of all the nodes which make up the value of the element. The result is used as the value of the unqualified DC element. A space is inserted between each pair of strings. The names of the property types are discarded. The values of RDF:Value properties are processed last, but otherwise multiple properties on a single node are processed in an arbitrary order (there are not many examples of this).
If a property has another node as its value, and no properties of that node are known (ie it was defined via a "resource" attribute in the serialization syntax), then the URI of the node is used as the string value. There are actually some problems with knowing where to stop following the web of references, but this conveys the essential principles.
Example 1:
[Resource] -----DC:Subject---------> [#node001]

[#node001] --+--RDF:Value----------> "Cookies"
             +--DCQ:Scheme---------> "LCSH"
This becomes, when degraded to unqualified:
[Resource] -----DC:Subject---------> "LCSH Cookies"
Example 2. The GUID has been shortened to fit on the line.
[Resource] -----DC:Creator-----> [urn:guid:160CD220-0F67-11d2-BC81]
This becomes, when degraded to unqualified:
[Resource] -----DC:Creator-----> "urn:guid:160CD220-0F67-11d2-BC81"
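The general procedure can be sketched as follows. This is an illustration only, since this document deliberately does not define the mechanism. The `Ref` class and the dictionary graph are assumed representations, and the decision to fall back to the node identifier when a reference loops back on itself is one possible answer to the question of where to stop following references, not a rule stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An arc target that is a node, identified by an ID or a URI (hypothetical)."""
    node_id: str


def degrade(graph, value, _seen=frozenset()):
    """Reduce a property value to a single string for unqualified DC."""
    if isinstance(value, str):
        return value
    arcs = graph.get(value.node_id)
    # No known properties (or a loop): use the node's URI as the string value.
    if not arcs or value.node_id in _seen:
        return value.node_id
    seen = _seen | {value.node_id}
    # Property type names are discarded. RDF:Value is processed last; the
    # stable sort keeps the other properties in the order they were listed.
    ordered = sorted(arcs, key=lambda arc: arc[0] == "RDF:Value")
    return " ".join(degrade(graph, v, seen) for _, v in ordered)


GRAPH = {
    "#node001": [("RDF:Value", "Cookies"), ("DCQ:Scheme", "LCSH")],
}
```

With this graph, `degrade(GRAPH, Ref("#node001"))` gives "LCSH Cookies" as in Example 1, and a reference to a node with no known properties, as in Example 2, degrades to its URI.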
The document "Namespaces in XML" [XML-NS] defines a mechanism for the use of namespaces in XML. All the elements and property types defined by this document are in the Dublin Core Qualifiers namespace, whose namespace name is "http://purl.org/metadata/dublin_core_qualifiers". The namespace prefix "DCQ" is used in this document to denote that namespace.
Users of Dublin Core should not use the DCQ namespace for property types that are not defined in this document. Such extensions should use a namespace which is associated with the person or organization defining the extension, even if they are for use with a Dublin Core element. For example, a new qualifier may be defined to indicate the importance of each creator thus:
[Resource] -----DC:Creator-----> [#node001]

[#node001] --+--RDF:Value------> [#node002]
             +--XX:Creator.Importance-> "Minor"

[#node002] --+--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
The new qualifier "Creator.Importance" is not in the DCQ namespace, even though it qualifies a DC element, and begins with the string "Creator". It is up to the owner of the XX namespace to ensure that the string used (eg "Creator.Importance") is not reused within that namespace to mean something different.
There will be a spectrum of matching criteria offered by various RDF systems, ranging from the strictest [a precise match of nodes, arcs and string values] to the most permissive. From the DC perspective, the strictest matching is probably unhelpful, as it would fail to match many cases which most users would consider to be equivalent.
We recommend that systems which carry out matching of DC metadata provide a permissive mode in which an element is matched regardless of whether or not it has annotations. In this permissive mode, when matching DC elements with and without annotations, the matching process should use the primary value of the annotated element. It should locate this value by following RDF:Value arcs. Sophisticated systems may additionally offer stricter matching algorithms.
We also recommend that systems which carry out matching of DC metadata provide a permissive mode in which a value is matched regardless of whether it occurs on its own, or in a Seq, Alt or Bag.
Some qualifiers have default values, for example an unqualified Title is equivalent to a Title of type "Main" under certain circumstances.
When matching a search request against some metadata, this equivalence is not symmetrical.
An unqualified search request should be taken to mean "don't care", ie that the user would like any metadata to match, whether it has a qualifier or not, and regardless of the value of the qualifier. For example a request for "Main Title = Paris" should be taken as different from a request for "Title = Paris".
Unqualified metadata should be taken to be equivalent to the default value when matching against a search request. For example a metadata record with "Main Title = Paris" should satisfy a search in exactly the same cases as a metadata record with "Title = Paris".
This is illustrated in the following table, using Title as the example.
Metadata      Search Request   Result     Reason
--------      --------------   ------     ------
Unqualified   Unqualified      Match      Exact
Main          Unqualified      Match      Don't care
Alternative   Unqualified      Match      Don't care
Unqualified   Main             Match      Default
Main          Main             Match      Exact
Alternative   Main             No match   Different
Unqualified   Alternative      No match   Default
Main          Alternative      No match   Different
Alternative   Alternative      Match      Exact
Hence, there is not much point in putting "Main" into a metadata record for a resource, because it makes no difference to the behaviour of the search engine. However we still have to define "Main" because it is needed for search requests, where the results with and without "Main" are different.
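The asymmetric treatment of defaults can be sketched as a single comparison function. The function name and the use of `None` to stand for "unqualified" are assumptions made for this illustration.

```python
DEFAULT_TITLE_TYPE = "Main"


def title_type_matches(metadata_type, request_type):
    """Decide whether a title's type satisfies a search request.

    Either argument may be None, meaning "unqualified".
    """
    if request_type is None:
        # Unqualified request: "don't care", so any metadata matches.
        return True
    # Unqualified metadata is treated as having the default type.
    effective = metadata_type if metadata_type is not None else DEFAULT_TITLE_TYPE
    return effective == request_type
```

For instance, `title_type_matches(None, "Main")` is true (the default applies), whereas `title_type_matches(None, "Alternative")` is false.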
Almost all the DC elements may be meaningfully repeated, for example a resource may have more than one title or author, and hence it may have more than one DC:Title or DC:Creator property. At first thought, one might not expect DC:Description to be repeated, but some resources have both a long and a short description which could both be usefully included.
Some elements are less likely to be repeated, but it is still legal for all of them.
There are four ways of specifying repeated properties in RDF:
We allow the use of all of the above mechanisms for repeating DC elements.
We recommend that systems which carry out matching of DC metadata provide a permissive mode in which an element is matched regardless of whether it occurs singly, or in one of the four forms described above. Sophisticated systems may additionally offer stricter matching (eg, "Find me all resources which have Friedrich Engels as Creator number 2").
We recommend the use of option 1 (simply repeating the element) rather than option 4 (using a Bag), unless the metadata creator has a good reason to use a Bag.
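The difference between permissive and stricter matching of repeated elements might be sketched as below. The `Container` class is a hypothetical stand-in for an RDF Seq, Alt or Bag, and the function names are illustrative. A list of values stands for the values of all properties of one type (eg DC:Creator) on one resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical stand-in for an RDF container."""
    kind: str        # "Seq", "Alt" or "Bag"
    members: tuple   # member values, in document order


def flatten(values):
    """Yield every value, whether it occurs on its own or inside a container."""
    for value in values:
        if isinstance(value, Container):
            yield from value.members
        else:
            yield value


def permissive_match(values, wanted):
    """Match regardless of how the repeated element was expressed."""
    return wanted in flatten(values)


def strict_position_match(values, wanted, position):
    """Match only if `wanted` is member number `position` (1-based) of a Seq."""
    return any(
        isinstance(v, Container)
        and v.kind == "Seq"
        and len(v.members) >= position
        and v.members[position - 1] == wanted
        for v in values
    )


ORDERED = [Container("Seq", ("Karl Marx", "Friedrich Engels"))]
REPEATED = ["Karl Marx", "Friedrich Engels"]
```

Both `ORDERED` and `REPEATED` satisfy a permissive search for "Friedrich Engels", but only `ORDERED` satisfies the stricter request for Friedrich Engels as Creator number 2.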
DC RFC #1 has been updated to support ordering.
This section deals with the ordering of repeated elements (eg two Creators), not with the ordering of different elements (eg Creator and Title).
There are situations where the creators of metadata wish to indicate a specific ordering of repeated elements. We shall allow multiple instances of an element to be placed in a specified sequence in environments which support such functionality.
RDF:Value properties should not be repeated. That is, a given node should not have more than one property whose property type is RDF:Value.
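A metadata checker could flag violations of this rule along the following lines. This is a sketch only; the dictionary representation of nodes and the function name are assumptions made for illustration.

```python
def nodes_with_repeated_value(graph):
    """Return the ids of nodes that carry more than one RDF:Value property.

    graph maps a node id to a list of (property type, value) pairs.
    """
    offenders = []
    for node_id, arcs in graph.items():
        count = sum(1 for prop, _ in arcs if prop == "RDF:Value")
        if count > 1:
            offenders.append(node_id)
    return sorted(offenders)


GOOD = {"#node001": [("RDF:Value", "Cookies"), ("DCQ:Scheme", "LCSH")]}
BAD = {"#node001": [("RDF:Value", "Cookies"), ("RDF:Value", "Biscuits")]}
```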
[[The "scheme-qualified string" is in doubt. People are so happy with URIs as enumerated values that we have to say string or URI or string-with-scheme.]]
In many cases, the value of an element is stated to be a scheme-qualified string. This means a node whose properties are:
- RDF:Value, whose value is a string, and
- DCQ:Scheme, whose value is the name of a scheme or an RDF node representing a scheme.
Example:
[Resource] -----DC:Subject---------> [#node001]

[#node001] --+--RDF:Value----------> "Cookies"
             +--DCQ:Scheme---------> "LCSH"
This indicates that the subject is "Cookies" as defined by the Library of Congress Subject Headings scheme.
Schemes are intended to provide information which will be helpful in the interpretation of the value of the element. There is a range of applications:
Some schemes are near the border between these applications, for example the Dewey Decimal system, which defines both a structure and a set of values.
Note: schemes are very important to resource discovery although a typical user will not be searching for the actual scheme name. That is, the user is not usually interested in finding something whose metadata uses a particular scheme. The scheme is important for a search engine, for example, to know the format of a date like "11/3/97" in order to know whether to match it with a search query containing "3 Nov 97" or one containing "11 Mar 97".
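The date example can be made concrete with a short sketch. The scheme names "x-us-date" and "x-uk-date" are hypothetical experimental names invented for this illustration; they are not registered schemes.

```python
from datetime import date, datetime

# Hypothetical scheme names mapped to the date layout each one implies.
DATE_FORMATS = {
    "x-us-date": "%m/%d/%y",   # month/day/year
    "x-uk-date": "%d/%m/%y",   # day/month/year
}


def parse_with_scheme(value, scheme):
    """Interpret a date string according to the named scheme."""
    return datetime.strptime(value, DATE_FORMATS[scheme]).date()


us = parse_with_scheme("11/3/97", "x-us-date")   # 3 Nov 97
uk = parse_with_scheme("11/3/97", "x-uk-date")   # 11 Mar 97
```

The same string yields two different dates, so a search engine that does not know the scheme cannot tell which query the value should match.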
The value of a DCQ:Scheme property may be either a name or an RDF node. For names, there is a registration procedure to avoid clashes, ie different people using the same name to mean different things.
At any time, the current list of registered values for DCQ:Scheme is held at:
http://purl.org/metadata/dublin_core
That page also explains the procedure for registering new scheme names. There is a mail address to submit requests to, and a public mailing list for discussion, like the registration procedure for MIME types [RFC 2048].
For experimental use, scheme names beginning with "x-" may be used without being registered.
If the value of a DCQ:Scheme property is an RDF node, it does not have to be registered. A node is identified by a URI, which by definition is unique.
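The three kinds of scheme value (registered name, experimental "x-" name, and RDF node) might be distinguished as in the sketch below. The `Ref` class, the function name and the contents of `REGISTERED_SCHEMES` are assumptions made for illustration; the authoritative list of registered names is the one held at the registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A scheme given as an RDF node, identified by its URI (hypothetical)."""
    uri: str


# Hypothetical local snapshot of the registered scheme names.
REGISTERED_SCHEMES = frozenset({"LCSH"})


def classify_scheme(value, registered=REGISTERED_SCHEMES):
    """Classify the value of a DCQ:Scheme property."""
    if isinstance(value, Ref):
        return "node"            # identified by URI; no registration needed
    if value.startswith("x-"):
        return "experimental"    # may be used without being registered
    if value in registered:
        return "registered"
    return "unregistered"        # a plain name that risks clashing
```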
To avoid typing full URIs in RDF files, XML entities may be used. In this example, the value "456" is a code from the "bn" scheme whose full URI is "http://fruit.org/banana".
<?xml version="1.0"?>
<?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>
<!DOCTYPE RDF:RDF [<!ENTITY bn "http://fruit.org/banana" >]>
<RDF:RDF>
  <RDF:Description about="http://a.com/mydoc">
    <DC:Subject>
      <RDF:Description>
        <RDF:Value>456</RDF:Value>
        <DCQ:Scheme resource="&bn;"/>
      </RDF:Description>
    </DC:Subject>
  </RDF:Description>
</RDF:RDF>
Alternatively, the full URI can be simply written out:
<DCQ:Scheme resource="http://fruit.org/banana"/>
A scheme may be used in an HTML META tag to indicate whether multiple concatenated values are separated by commas, semicolons or something else. This doesn't really apply to RDF, where multiple values will usually be represented by multiple properties.
We allow both names and nodes as the values of schemes so as to make it easy for users to simply type in a short string, and also enable more sophisticated users to generate scheme names as required without going through the registration procedure.
It has been suggested that the different applications of schemes listed above should be mapped onto distinct property types, for example DCQ:Scheme and DCQ:Structure, so as to avoid confusion. Doing so would also set a precedent for introducing a new name for a new concept: there may be a requirement to specify the units of a value (eg pounds, kilograms), and it would not be a good idea to use DCQ:Scheme for this. Other people, however, consider that this is unnecessary, as there are unlikely to be two schemes of different types with the same name, and a different property type for units can be defined later anyway.
A completely different way of expressing scheme information is like this:
[Resource] -----DC:Subject-----> [#node001]

[#node001] -----LOC:LCSH-------> "Cookies"
Here the scheme name is used as the property type for the subject value. An advantage of this over simple names is that the namespace mechanism deals with the problem of name clashes.
Disadvantages are:
We could have used "DCQ:Subject.Scheme" and not "DCQ:Scheme", so that the legal values for the different schemes can be enumerated separately. However, schemes can be used with many property values, and not just with the 15 core elements, which would lead to a proliferation of different property types for schemes, for example DCQ:AgentRole.Scheme.
In addition some schemes may be used in association with the value of several different property types. For example, a Date scheme may be used with the Date element and also with the Coverage element, and it would be unnecessarily complicated to define DCQ:Date.Scheme as well as DCQ:Coverage.Scheme, with values that overlap.
The value of a DC:Title property must be one of the following:
1. a string holding an actual title, or
2. an intermediate node, whose properties are:
"Main" is the default type. That is, a DC:Title property whose value is a string is equivalent, in certain circumstances, to one with a DCQ:TitleType of "Main". At the lowest level they are different, in that a property with a DCQ:TitleType of "Main" has this property and a plain string does not. But at some higher level these properties should be deemed to be the same. See the section on Matching.
"Alternative" is intended to be used for sub-titles, translated titles, series title etc. It can also be used if the main title contains an acronym: the alternative title would contain the expansion of the acronym.
Here is an example of the use of TitleType to distinguish between two Titles. The unqualified title is the "Main" title by default.
[Resource] --+--DC:Title-------> "Symphony No. 31 in D Major"
             +--DC:Title-------> [#node001]

[#node001] --+--RDF:Value------> "Paris Symphony"
             +--DCQ:TitleType--> [http://purl.org/metadata/dublin_core/schema#TitleType.Alternative]
The corresponding RDF would look like this:
<RDF:Description about="http://mozart.org/KV297">
  <DC:Title>Symphony No. 31 in D Major</DC:Title>
  <DC:Title>
    <RDF:Description>
      <RDF:Value>Paris Symphony</RDF:Value>
      <DCQ:TitleType resource=
        "http://purl.org/metadata/dublin_core/schema#TitleType.Alternative" />
    </RDF:Description>
  </DC:Title>
</RDF:Description>
We could have defined Type values such as "Main" and "Alternative" as:
A named entity can be used to avoid typing the full URI, eg:
<?xml version="1.0"?>
<?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>
<!DOCTYPE RDF:RDF [<!ENTITY DC "http://purl.org/metadata/dublin_core/schema" >]>
<RDF:RDF>
  <RDF:Description about="http://a.com/mydoc">
    <DC:Title>
      <RDF:Description>
        <RDF:Value>Paris Symphony</RDF:Value>
        <DCQ:TitleType resource="&DC;#TitleType.Alternative"/>
      </RDF:Description>
    </DC:Title>
  </RDF:Description>
</RDF:RDF>
Note that the above example uses the abbreviation "DC" for two quite distinct purposes:
One advantage of using the node reference is that there is definitely no need to use a Scheme with it. The value of DCQ:TitleType could be any other node that someone wishes to declare to be a DC title type. There is no danger that two people will accidentally use the same string to mean two different things.
Note
Some cataloging standards treat a translated title on the level of the title itself, whereas others treat it on the level of a sub-title or other additional title. We have made an arbitrary decision, in the interests of interoperability, that translated titles should, in the absence of other reasons, be marked as Alternative.
The Creator, Contributor and Publisher elements have some aspects in common, for example they can all be accompanied by a Role. The common aspects are described here, under Creator.
The value of a DC:Creator property may be a string or a node. If it is a node, it may be an annotation node or some other sort of node (eg a vCard -- see below).
If it is an annotation node, its properties may be:
There is no default value for AgentRole. That is, this document does not define the meaning of an unqualified Creator to be equivalent to a Creator with any particular value of AgentRole. A search request that specifies a particular role should only match metadata records where the Creator has that role. A search request that does not specify a role should match any metadata record regardless of whether a role is present, or what value it has.
Examples:
[Resource] -----DC:Creator------> [#node001]

[#node001] --+--RDF:Value-------> [#node002]
             +--DCQ:AgentRole---> "Illustrator"

[#node002] --+--VC:FN-----------> "Mr. John Q. Public, Esq."
             +--VC:N------------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL--------> "jqpublic@xyz.dom1.com"
[Resource] -----DC:Creator------> [#node001]

[#node001] --+--RDF:Value-------> [#node002]
             +--DCQ:AgentRole---> [#node003]

[#node002] --+--VC:FN-----------> "Mr. John Q. Public, Esq."
             +--VC:N------------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL--------> "jqpublic@xyz.dom1.com"

[#node003] --+--RDF:Value-------> "Illustrator"
             +--DCQ:Scheme------> "USMARC relator term"
[Resource] -----DC:Creator------> [#node001]

[#node001] --+--RDF:Value-------> [#node002]
             +--DCQ:AgentRole---> [http://loc.gov/marc/relators#Illustrator]

[#node002] --+--VC:FN-----------> "Mr. John Q. Public, Esq."
             +--VC:N------------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL--------> "jqpublic@xyz.dom1.com"
The AgentRole property is intended to indicate the role of the creator in a way that is more precise than the definition of the plain Creator element. To clarify the meaning further, here is a list of possible values for the role, taken from [KUNZE-1996]. Some of these are more appropriate as the roles of a Contributor or Publisher.
composer
editor
librettist
photographer
translator
distributor
illustrator
mirror
publisher
There are many roles for people and institutions in the creation, dissemination and provision of information resources. It may be hard to determine whether a particular person, institution or organization is a Creator, Contributor or Publisher. The usual practice in these circumstances is to use the Creator element for all the roles. Hence users searching for particular contributors or publishers are advised to search under Creator as well as Contributor or Publisher.
The node that represents the person, institution or organization should have properties defined by some broad-based standard, such as vCard [vCard-IETF]. A few vCard properties are shown below.
FN     Formatted name
       eg Mr. John Q. Public, Esq.
N      Name
       eg Public;John;Quinlan;Mr.;Esq.
ADR    Delivery address
       eg ;;123 Cliff Ave.;Big Town;CA;97531;US
EMAIL  Email address
       eg jqpublic@xyz.dom1.com
TEL    Telephone number
       eg +1-213-555-1234
ORG    Organization Name and Organizational Unit
       eg ABC, Inc.;North American Division;Marketing
Here is an example of a creator defined using a vCard:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
What do people think of this next bit: yes or no?
It has been suggested that, to assist systems that don't understand the vCard elements, this should be represented thus:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--RDF:Value------> "Mr. John Q. Public, Esq."
             +--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
This has the advantage that systems which don't understand the vCard elements can still do approximate matching, so that a query asking for "Creator = Mr. John Q. Public, Esq." would return this resource. It could be argued that the approximate matching should be set up so that, in the absence of an RDF:Value property, all the string-valued properties are considered. Then a search for "Creator = Mr. John Q. Public, Esq." would still return this resource even without the RDF:Value property. In these circumstances, a search for "Creator contains xyz" will also match, which may or may not be what we want.
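The two matching behaviours discussed above can be compared with a small sketch. It is illustrative only: the dictionary representation of a node and the function names are assumptions made for this example, not part of any RDF toolkit.

```python
def creator_strings(node):
    """Return the strings an approximate matcher would compare against.

    `node` is either a plain string or a dict mapping property names
    (for example "RDF:Value" or "VC:FN") to string values.
    """
    if isinstance(node, str):
        return [node]
    if "RDF:Value" in node:
        # An explicit primary value is present, so only that is used.
        return [node["RDF:Value"]]
    # Fallback discussed in the text: with no RDF:Value, every
    # string-valued property of the node is considered.
    return [value for value in node.values() if isinstance(value, str)]


def creator_equals(node, wanted):
    return any(s == wanted for s in creator_strings(node))


def creator_contains(node, fragment):
    return any(fragment in s for s in creator_strings(node))


vcard_only = {
    "VC:FN": "Mr. John Q. Public, Esq.",
    "VC:N": "Public;John;Quinlan;Mr.;Esq.",
    "VC:EMAIL": "jqpublic@xyz.dom1.com",
}
with_value = dict(vcard_only, **{"RDF:Value": "Mr. John Q. Public, Esq."})
```

With these definitions, both nodes satisfy an equality search for "Mr. John Q. Public, Esq.", but only the node without RDF:Value satisfies "Creator contains xyz", because the fallback also looks inside the email address.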
Some projects are considering the use of OIDs (Object IDs) or GUIDs (Globally Unique IDs) on all objects. People, institutions and organizations would thus have a GUID. No details of the person would appear in the metadata other than the value of the GUID, expressed as a URI (Uniform Resource Identifier).
The GUID in this example has been shortened to fit on the line.
[Resource] -----DC:Creator-----> [urn:guid:160CD220-0F67-11d2-BC81]
expressed in RDF serialization syntax thus:
<RDF:Description about="http://a.com/mydoc">
  <DC:Creator resource="urn:guid:160CD220-0F67-11d2-BC81"/>
</RDF:Description>
If such metadata is exported to a system that cannot resolve the GUID, it should of course be expanded to include the person's name.
Some existing metadata standards, such as MARC [ref], distinguish between Personal and Corporate names. If such a distinction is needed when generating DC metadata, an additional vCard field X-DC-TYPE should be used, with the value "Personal" or "Corporate".
This shows how to map a MARC Personal name:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:X-DC-TYPE---> "Personal"
Alternatives to "Personal" and "Corporate" have been suggested. The usage above does not fit in well with the fact that "Personal" and "Corporate" are adjectives. Changing them to nouns results in "Person" and "Corporation". However "Corporation" has quite different connotations, so "Organization" was suggested.
One argument in favour of "Personal" and "Corporate" is their existing wide usage in library circles.
The elements Creator, Contributor and Publisher have deep underlying similarities, so the handling of Role has been made uniform across these three elements.
In a number of important environments, greater precision is needed in specifying a role than the distinction provided by these three elements, hence the need for an AgentRole property.
The AgentRole has been defined as an annotation because it is not a property of the person or organization itself, but is rather a property of the relationship (or RDF node) tying together the resource and the person or organization.
There may be some pressure to turn "Role" into "Type" so as to match all the other annotations. Then we would have a uniform pattern like this:
[Resource] -----DC:Xxxxx-------> [#node001]
[#node001] --+--RDF:Value------> "foo bar"
             +--DCQ:XxxxxType--> "qaz"
Note that the string at the end of the property type name on the intermediate node is spelt "Type" in all cases, and not sometimes spelt "Role".
The advantage of this is that it gives more uniformity. The disadvantage is that it warps the meaning of the Role annotation, and will discourage people from inventing other annotations with natural descriptive names. Or, put another way, the designers of other annotations will be discouraged from giving them natural names, and will feel constrained, quite unnecessarily, to call them "Type".
This document defines no new property types for use on values of the Subject element, other than DCQ:Scheme.
Example:
[Resource] -----DC:Subject---------> [#node001]
[#node001] --+--RDF:Value----------> "Cookies"
             +--DCQ:Scheme---------> "LCSH"
The value of the Subject element may be a string or a node. Using a node provides a very clean way of using a controlled vocabulary. Like this:
[Resource] -----DC:Subject-----> [http://loc.gov/LCSH#Cookies]
There is flexibility as to how much of the text relating to the subject needs to appear in the RDF serialization. Consider, for example, this node-and-arc diagram:
[Resource] -----DC:Subject-----> [http://ddc.???/DDC#025.484]
This could be serialized thus:
<DC:Subject>
  <RDF:Description about="http://ddc.???/DDC#025.484">
    <DDC:Class>025.484</DDC:Class>
    <DDC:Heading>Machine Readable Catalog Record Formats</DDC:Heading>
  </RDF:Description>
</DC:Subject>
or thus:
<DC:Subject resource="http://ddc.???/DDC#025.484"/>
The first option means that the textual name of the classification is present in the metadata record, and therefore can be made available for searching by an index that only stores the data found in the RDF files directly associated with the resources indexed. The second option is shorter and cleaner.
However, in the latter case, the index could still be made to find resources based on strings like "Machine Readable Catalog", by feeding to it separate files containing, for example:
<RDF:Description about="http://ddc.???/DDC#025.484">
  <DDC:Class>025.484</DDC:Class>
  <DDC:Heading>Machine Readable Catalog Record Formats</DDC:Heading>
</RDF:Description>
Only one entry like this is needed for each classification, however many resources come under this classification.
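As a rough sketch of how an index might combine per-resource metadata with separately supplied classification entries, consider the following. The in-memory dictionaries, the function name and the second resource URI are hypothetical; the DDC URI is reproduced from the example above exactly as it appears there.

```python
# Subject URIs found in the metadata of each indexed resource.
resource_subjects = {
    "http://a.com/mydoc": ["http://ddc.???/DDC#025.484"],
    "http://a.com/otherdoc": [],  # hypothetical resource with no subject
}

# One entry per classification, loaded from a separate file.
classification_headings = {
    "http://ddc.???/DDC#025.484": "Machine Readable Catalog Record Formats",
}


def find_by_heading(text, subjects, headings):
    """Return resources whose subject classification heading contains `text`."""
    needle = text.lower()
    matching_classes = {
        uri for uri, heading in headings.items() if needle in heading.lower()
    }
    return sorted(
        resource
        for resource, uris in subjects.items()
        if matching_classes.intersection(uris)
    )
```

The heading text is stored once, however many resources refer to the classification by URI.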
There are many different views on how RDF is going to be passed around between systems, and the best practice in this area depends on those views.
No DC qualifiers.
The Publisher element has a structure identical to DC:Creator. That is, it can take a DCQ:AgentRole.
The Contributor element has a structure identical to DC:Creator and DC:Publisher. That is, it can take a DCQ:AgentRole.
The value of a DC:Date property must be one of the following:
The legal values of DCQ:DateType are currently as follows. See below for their definitions.
http://purl.org/metadata/dublin_core/schema#DateType.Created
http://purl.org/metadata/dublin_core/schema#DateType.Issued
http://purl.org/metadata/dublin_core/schema#DateType.Accepted
http://purl.org/metadata/dublin_core/schema#DateType.Available
http://purl.org/metadata/dublin_core/schema#DateType.Acquired
http://purl.org/metadata/dublin_core/schema#DateType.DataGathered
http://purl.org/metadata/dublin_core/schema#DateType.Valid
[[Note: I have changed the order to match the *text* of the Date WG report and not the *table*. The columns of the table are monotonic, but the order in the text is not the order obtained by reading across the rows.]]
Note: Both the date itself and the DateType can be qualified by schemes. The date itself can in principle be qualified by DCQ:Scheme, for example to indicate that the format is not ISO 8601, though we can't think of any circumstances in which this would be desirable. The DateType can be qualified by a scheme if it is required to use strings as the type values instead of URIs.
The two cases are illustrated by the following node-and-arc diagrams:
[Resource] -----DC:Date--------> "1998-03-31"

[Resource] -----DC:Date--------> [#node001]
[#node001] --+--RDF:Value------> "1998-03-31"
             +--DCQ:DateType---> [http://purl.org/metadata/dublin_core/schema#DateType.Issued]
The two cases are illustrated by the following RDF Descriptions:
<RDF:Description about="http://www.bananas.org/prices.html">
  <DC:Date>1998-03-31</DC:Date>
</RDF:Description>

<RDF:Description about="http://www.bananas.org/prices.html">
  <DC:Date>
    <RDF:Description>
      <RDF:Value>1998-03-31</RDF:Value>
      <DCQ:DateType resource=
        "http://purl.org/metadata/dublin_core/schema#DateType.Issued" />
    </RDF:Description>
  </DC:Date>
</RDF:Description>
Date of creation of the resource.
This is the default value. Use an unqualified Date for the date of the creation of the present resource. Examples include the date that an article was written, a photograph taken, a piece of music composed, or a performance recorded. An HTML file created in 1997 as a transcription of an article written in 1875 could have
Date: 1997
with the date 1875 appearing as the Date on a separate RDF node representing the original article.
Alternatively, a simpler description could include exactly one date as either
Date: 1997
or, depending on the metadata provider's preference,
Date: 1875
If you wished to describe different versions of a resource with one resource description, it would be appropriate to put the creation date of the latest version in Date Created. On the other hand, you might instead choose to describe each version with a separate resource description.
Date of formal issuance (e.g., publication) of the resource.
When an unqualified Date is insufficiently precise, use this type to distinguish a release date that has recognized legal (e.g., copyright) or institutional (e.g., posting of a staff policy change) significance. For example, the description of a work published posthumously might have just "Date: 1997", just "Date: 1948", or both:
Date Issued:  1997
Date Created: 1948
A government file, officially released in 1997, consisting of photographs taken in 1985 of hundreds of meteorite fragments collected in 1952 could be described with the following metadata:
Date Issued:       1997
Date Created:      1985
Date DataGathered: 1952
Date of acceptance (e.g., dissertation or treaty) of the resource.
When an unqualified Date and Date Issued are insufficiently precise, use this type to indicate when the resource was formally adopted by a party that accepts or vouches for it.
Date (often a range) that the resource will become or did become available.
When an unqualified Date and Date Issued are insufficiently precise, use this type to indicate a start, end, or both start and end of a period during which access to the resource was or will be granted. It may be needed to indicate an availability period that will start or end in the future, or did come to an end in the past. For example, a journal collection ranging from 1955 to 1996 may be given as:
Date Available: 1955/1996
Note. The expression of date ranges in RDF will depend on the outcome of work on the handling of data types in XML.
Date of acquisition or accession.
When an unqualified Date and Date Issued are insufficiently precise, use this type to distinguish the time that a resource was acquired or accessioned in the context of a collection to which it belongs or in which it resides. For example,
DC.Title:      Treaty of 1645
Date Issued:   1645
Date Accepted: 1646
Date Acquired: 1958
Date of sampling of the information in the resource.
When an unqualified Date and Date Created are insufficiently precise, use this type to distinguish the time of raw data creation as opposed to resource content (e.g., intellectual content) creation, which belongs in Date Created. Examples include the date that a group of weather stations were sampled and a range of times during which radiation measurements were taken. To identify the date when a photograph was taken, Date Created is recommended.
Date (often a range) of validity of the resource.
When an unqualified Date and Date Issued are insufficiently precise, use this type to indicate when the resource content may be considered to hold true. In a somewhat labored example, suppose a public transit system is in the practice of creating a new bus schedule, allowing two weeks for issuance of a print run, allowing two more weeks for printed copies of the schedule to be placed in distribution racks, and finally being required to do so at least one month in advance of drivers switching the timing on their routes. Metadata for such a bus schedule might include all of the following elements:
DC.Description: City Bus Schedule
Date Created:   1997-11-01
Date Issued:    1997-11-15
Date Available: 1997-12-01/1998-06-01
Date Valid:     1998-01-01/1998-06-01
The default value is "Created" (that is, a Date with no type should be taken as equivalent to one with a type of "Created" when doing certain sorts of matching).
For example, consider the possible combinations of the metadata for a resource and a search request. An unqualified Date on the resource is taken to be the same as Created, but an unqualified search request is taken to mean "match anything". All types except Created follow the same pattern: only two are shown.
Metadata      Search Request   Result     Reason
--------      --------------   ------     ------
Unqualified   Unqualified      Match      Exact
Created       Unqualified      Match      Don't care
Issued        Unqualified      Match      Don't care
Accepted      Unqualified      Match      Don't care
Unqualified   Created          Match      Default
Created       Created          Match      Exact
Issued        Created          No match   Different
Accepted      Created          No match   Different
Unqualified   Issued           No match   Default
Created       Issued           No match   Different
Issued        Issued           Match      Exact
Accepted      Issued           No match   Different
Unqualified   Accepted         No match   Default
Created       Accepted         No match   Different
Issued        Accepted         No match   Different
Accepted      Accepted         Match      Exact
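The rule behind the table can be written down in a few lines. This is a minimal sketch, assuming that date types are passed as short strings such as "Issued" and that None stands for "unqualified"; the function and constant names are invented for the example.

```python
DEFAULT_DATE_TYPE = "Created"


def date_type_matches(metadata_type=None, request_type=None):
    """Decide whether a Date on a resource satisfies a search request.

    None means "unqualified" on either side.
    """
    if request_type is None:
        # An unqualified search request means "match anything".
        return True
    # An unqualified Date on the resource is treated as Created.
    effective = metadata_type if metadata_type is not None else DEFAULT_DATE_TYPE
    return effective == request_type
```

For example, `date_type_matches(None, "Created")` is true because of the default, while `date_type_matches(None, "Issued")` is false.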
An alternative to the above structure is to define "Created", "Issued", "Accepted", "Available", "Acquired", "DataGathered" and "Valid" as sub-elements of Date. They are written as "DCQ:Date.Created" and "DCQ:Date.Issued" and not "DCQ:Created" in case we want to use "Created" etc in connection with other elements, with different usage.
The node-and-arc diagram would look like this:
[Resource] -----DC:Date---------> [#node001]
[#node001] -----DCQ:Date.Issued-> "1998-03-31"
or like this:
[Resource] -----DCQ:Date.Issued-> "1998-03-31"
There are problems with both of these.
The first one has a seemingly pointless intermediate node.
The second has effectively defined a new top level property type, applied directly to a resource. This has the disadvantage of complicating the approximate matching process. We want this metadata to match a query asking for a DC:Date of "1998-03-31", but to do this the matching program would have to either examine the letters inside the property type name "DCQ:Date.Issued", or be pre-programmed with a table that indicates that "DCQ:Date.Issued" is a specialisation of DC:Date.
The reason that we really want to define "Created" etc as sub-elements, is to enable us to write in HTML:
<meta name="DCQ:Date.Issued" content="1998-03-31">
The ability to do this is essential because:
The solution described in this document is to define the encoding in HTML to be just that, as a special case, but to continue to define "Created" etc as the value of "DCQ:DateType" in all other cases. This pushes the matching problem onto the systems that convert to and from the HTML encoding.
Consider also what happens when a resource has two dates, for example the date that a resource was created and the date when it was issued. Using sub-elements, it looks like this:
DEPRECATED EXAMPLE
[Resource] -----DC:Date---------> [#node001]
[#node001] --+--DCQ:Date.Created-> "1998-03-20"
             +--DCQ:Date.Issued--> "1998-03-31"
Now the intermediate node doesn't look quite so pointless. The problem however is that the structure of the nodes does not match our intended meaning well. We could define the Date element to be like that, and it would work up to a point, but difficulties would creep in when more subtle processing was needed.
The problem is that the node #node001 does not represent anything very helpful. It represents the dates associated with this resource, and it has two properties whose values are two of the dates.
In contrast, the preferred expression of two dates is this:
[Resource] -----DC:Date---------> [#node001]
[#node001] --+--RDF:Value-------> "1998-03-20"
             +--DCQ:DateType----> "Created"

[Resource] -----DC:Date---------> [#node002]
[#node002] --+--RDF:Value-------> "1998-03-31"
             +--DCQ:DateType----> "Issued"
Although more bulky, this is more logical. Each of the nodes #node001 and #node002 represents one of the dates, and the properties belong to those dates. One property (RDF:Value) is the main value of the date, ie the moment in time that it represents. The other property (DCQ:DateType) is an annotation to specify precisely the relationship of this date to the resource in question.
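One practical consequence of the preferred structure is that a generic matcher can read the main value of each date without knowing any of the date type names. The sketch below assumes a resource is held as a dictionary from element names to lists of values, where each value is either a string or a dictionary of properties; this representation and the function names are assumptions made for illustration.

```python
def primary_values(resource, element):
    """Collect the main value of every occurrence of `element`."""
    values = []
    for value in resource.get(element, []):
        if isinstance(value, str):
            values.append(value)          # unqualified, plain string value
        elif "RDF:Value" in value:
            values.append(value["RDF:Value"])
    return values


def values_of_type(resource, element, type_property, wanted_type):
    """Collect main values whose annotation matches `wanted_type`."""
    return [
        value["RDF:Value"]
        for value in resource.get(element, [])
        if isinstance(value, dict)
        and value.get(type_property) == wanted_type
        and "RDF:Value" in value
    ]


resource = {
    "DC:Date": [
        {"RDF:Value": "1998-03-20", "DCQ:DateType": "Created"},
        {"RDF:Value": "1998-03-31", "DCQ:DateType": "Issued"},
    ]
}
```

Here `primary_values(resource, "DC:Date")` yields both dates, and `values_of_type(resource, "DC:Date", "DCQ:DateType", "Issued")` yields only the second.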
It was stated above that the default type for a Date is "Created". Alternatively we could have chosen not to specify any default. That is, an unqualified date would never be equivalent to any particular type. However, this does not work out very well.
Consider again the possible combinations of the metadata for a resource and a search request. This time, an unqualified Date on the resource is not taken to be the same as any particular type. An unqualified search request is still taken to mean "match anything".
The only difference between the table below and the previous one is the entry for metadata=Unqualified and request=Created; in addition, the Reasons no longer say Default.
Metadata      Search Request   Result     Reason
--------      --------------   ------     ------
Unqualified   Unqualified      Match      Exact
Created       Unqualified      Match      Don't care
Issued        Unqualified      Match      Don't care
Accepted      Unqualified      Match      Don't care
Unqualified   Created          No match   Different
Created       Created          Match      Exact
Issued        Created          No match   Different
Accepted      Created          No match   Different
Unqualified   Issued           No match   Different
Created       Issued           No match   Different
Issued        Issued           Match      Exact
Accepted      Issued           No match   Different
Unqualified   Accepted         No match   Different
Created       Accepted         No match   Different
Issued        Accepted         No match   Different
Accepted      Accepted         Match      Exact
The problem arises when we try to decide how to describe, say, a typical web page that is written and published on the same day, say 1998-07-10. Obviously we just put an unqualified Date in the metadata, with value 1998-07-10. But when a user wants to find pages created on that day, and searches for pages with a Date Created=1998-07-10, the page does not match!
If we put Date Created=1998-07-10 in the metadata, then a search for Date Issued=1998-07-10 does not match.
Effectively, without a default, an unqualified date has the meaning "none of the above", ie it is equivalent to a date type different from any of the standard ones. It only matches if the user's search request is unqualified, which matches regardless of the type. Without a default, we rule out the meaningful use of an unqualified Date.
This document defines just one property type for use on values of the Type element, and that is DCQ:Scheme.
Example:
[Resource] -----DC:Type------------> [#node001]
[#node001] --+--RDF:Value----------> "Sound.Music"
             +--DCQ:Scheme---------> "Tennant"
We could use a node reference for the scheme instead of a string, which would eliminate the chance that two people would independently use the same string to refer to two different schemes. In this example, the string "Tennant" has been replaced with a reference to the web page where Roy Tennant's list is maintained. In practice it would be better to use a more stable URI, which would be less likely to be reused if someone else takes over the maintenance of the list.
[Resource] -----DC:Type------------> [#node001]
[#node001] --+--RDF:Value----------> "Sound.Music"
             +--DCQ:Scheme---------> [http://sunsite.berkeley.edu/Metadata/types.html]
We could also use a node reference for the value of the type itself, eliminating the need for a scheme. In practice we would use a better URI than the one in this example.
[Resource] --DC:Type--> [http://sunsite.berkeley.edu/Metadata/types.html#Sound.Music]
[[Scheme?]]
In many cases, DC metadata in RDF does not need to use the Identifier element, because RDF has a built-in mechanism for identifying the resource to which any piece of metadata applies. The "about" attribute of the RDF:Description is set to the URI of the resource.
If the metadata accompanies the resource, there is no need to set the "about" attribute of the RDF:Description at all. This may be appropriate in cases when the URI is not known at the time that the metadata is prepared (for example a web page whose final location has not been decided), or where the resource never has a URI (for example search results).
Note that resources which cannot be retrieved via HTTP (eg in a database, or a physical resource) can still be assigned URIs for the purpose of defining metadata. The mechanisms for assigning URIs to physical resources include the generation of URIs from ISBNs or UUIDs.
Whether or not the URI appears on the RDF:Description, the Identifier element can be used to contain another identifier (eg a catalogue number) or an alternative URI.
It may be appropriate to treat the URI of the resource as the value of a DC Identifier element when defining searches. For example one can imagine a search string like:
"DC.Title:banana DC.Creator:fred DC.Identifier:recipe"
The user is looking for a web page whose title contains the word "banana", which was written by someone called "fred", and whose URI contains the word "recipe". A page described by the following RDF would match:
<RDF:Description about="http://a.com/recipe/bn.html">
  <DC:Title>Banana Biscuits</DC:Title>
  <DC:Creator>Fred Bloggs</DC:Creator>
</RDF:Description>
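A search of this kind might be sketched as follows. The query syntax is taken from the example search string; the dictionary form of the description, the case-insensitive substring matching and the function names are assumptions made for illustration.

```python
def parse_query(query):
    """Split 'DC.Title:banana DC.Creator:fred' into (field, word) pairs."""
    pairs = []
    for term in query.split():
        field, _, word = term.partition(":")
        pairs.append((field, word.lower()))
    return pairs


def candidate_values(description, field):
    """Values to test for `field`, treating the URI as an Identifier."""
    values = list(description.get(field, []))
    if field == "DC.Identifier" and description.get("about"):
        values.append(description["about"])
    return values


def matches(description, query):
    return all(
        any(word in value.lower() for value in candidate_values(description, field))
        for field, word in parse_query(query)
    )


page = {
    "about": "http://a.com/recipe/bn.html",
    "DC.Title": ["Banana Biscuits"],
    "DC.Creator": ["Fred Bloggs"],
}
```

The page has no explicit Identifier element, yet `matches(page, "DC.Title:banana DC.Creator:fred DC.Identifier:recipe")` is true because the "about" URI is consulted.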
[[Someone said that the Romeo-and-Juliet example was an unfortunate choice. I don't know if that person suggested an alternative example.]]
The value of a DC:Source property must be one of the following:
The use of a URI is strongly recommended. This enables a connection to be made between the current resource and the source resource, which may be useful in resource discovery. For example, one can search for all resources which quote works by a particular author as a source. This could be done by keying on the title of the work, but this is less reliable, as it depends on the titles being character-for-character identical, or else using some approximate matching technique.
Note that the node representing the source resource may itself have properties. The types of these properties may be any (of the fifteen) DC property types or non-DC property types.
The three cases are illustrated by the following node-and-arc diagrams:
[Resource] --DC:Source--> [http://www.shakespeare.com/romeo-and-juliet]
[Resource] --DC:Source--> [#romeo-and-juliet]
[Resource] --DC:Source--> "Conversations with my father"
and by the following RDF Descriptions:
<RDF:Description about="http://www.films.com/west.side.story">
  <DC:Source resource="http://www.shakespeare.com/romeo-and-juliet"/>
</RDF:Description>

<RDF:Description about="http://www.films.com/west.side.story">
  <DC:Source resource="#romeo-and-juliet"/>
</RDF:Description>

<RDF:Description about="http://www.novel.org/childhood">
  <DC:Source>Conversations with my father</DC:Source>
</RDF:Description>
[[To do: expression of Source in HTML]]
[[Note: the name "Source.Type" as used in HTML has a quite different derivation and use from "TitleType", "DateType" etc... Do we want to mention this?]]
[[Scheme]]
The value of a DC:Relation property must be a node with two properties:
The legal values of DCQ:RelationType are currently "IsPartOf", "HasPart", "IsVersionOf", "HasVersion", "IsFormatOf", "HasFormat", "References", "IsReferencedBy", "IsBasedOn", "IsBasisFor", "Requires" and "IsRequiredBy". See the Relations WG report for their definitions. There is no default value.
[[Note: for RFC 3, we need to include the full definitions.]]
This is illustrated by the following node-and-arc diagram:
[Resource] --DC:Relation--> [ANode] --RDF:Value------> [Related resource]
                               |
                               --DCQ:RelationType---> "IsBasedOn"
(where "ANode" stands for "Annotation Node").
and by the following RDF Description:
<RDF:Description about="http://ds.internic.net/internet-drafts/draft-kunze-dc-02.txt">
  <DC:Relation>
    <RDF:Description>
      <RDF:Value resource="http://purl.oclc.org/docs/metadata/dublin_core"/>
      <DCQ:RelationType>IsBasedOn</DCQ:RelationType>
    </RDF:Description>
  </DC:Relation>
</RDF:Description>
[[TBD]]
[[TBD - defer?]]
This section describes how to express qualified DC in HTML META tags. This is done by showing how certain patterns of node-and-arc diagram should be written in HTML. The mappings are intended to be reversible, ie the given pieces of HTML should be mapped back into nodes and arcs using the same rules. In most cases, a program performing this mapping requires knowledge of the particular elements and qualifiers being mapped.
[[Question: Does this apply to all annotation nodes (Type, Role, ...) or only to Type?]]
Given a DC element described by:
[Resource] --DC:Element--> [ANode] --RDF:Value--------> "Primary value"
                              |
                              ------DCQ:ElementType--> "Annotation"

(where "ANode" stands for "Annotation Node")
the following is the recommended encoding in HTML:
<meta name="DC.Element.Annotation" content="Primary value">
Example:
[Resource] --DC:Date--> [ANode] --RDF:Value-----> "2000-01-01"
                           |
                           ------DCQ:DateType--> "Issued"
The following is the recommended encoding in HTML:
<meta name="DC.Date.Issued" content="2000-01-01">
Note that the string "Issued" is the value of a property type in the RDF version, but is part of the name of the META element in the HTML version.
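The reversible mapping for annotation nodes could be sketched as below. The table ANNOTATION_PROPERTY stands in for the prior knowledge of elements and qualifiers that a mapping program needs; it, the function names and the use of plain strings for annotation values are assumptions of this example.

```python
from html import escape

# Hypothetical table: the annotation property used by each DC element.
ANNOTATION_PROPERTY = {"Date": "DCQ:DateType"}


def node_to_meta(element, node):
    """Map an annotation node to an HTML META tag."""
    annotation = node[ANNOTATION_PROPERTY[element]]
    name = "DC.%s.%s" % (element, annotation)
    return '<meta name="%s" content="%s">' % (escape(name), escape(node["RDF:Value"]))


def meta_to_node(name, content):
    """Map a META name and content back to (element, annotation node)."""
    prefix, element, annotation = name.split(".", 2)
    if prefix != "DC" or element not in ANNOTATION_PROPERTY:
        raise ValueError("not a known annotated DC element: %s" % name)
    return element, {
        "RDF:Value": content,
        ANNOTATION_PROPERTY[element]: annotation,
    }


issued = {"RDF:Value": "2000-01-01", "DCQ:DateType": "Issued"}
```

Calling `node_to_meta("Date", issued)` produces the META tag shown above, and `meta_to_node("DC.Date.Issued", "2000-01-01")` recovers the original node.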
HTML does not of course support multiple nested nodes and values, so the RDF version cannot be mapped directly onto the HTML version. One option would have been to abandon the nesting in RDF:
DEPRECATED EXAMPLE [Resource] --DCQ:Date.Issued--> "2000-01-01"
This was not done because it would force the programs that process this data to parse the property names in order to do intelligent matching of the data. For example, it is desirable that a search engine should return the above resource when asked to find anything with a DC:Date value of "2000-01-01". The search engine could only do this if it has been programmed to parse "DCQ:Date.Issued" to see the "DC:Date" prefix, or if it has been provided with a list of specialised elements, ie it has been told that "DCQ:Date.Issued" is a special case of "DC:Date".
Any program that maps between the RDF and HTML representations will have to understand this relationship, but it was thought sensible not to burden RDF metadata in general with this problem.
There are, as yet, no aspect nodes with DC properties. When or if such properties are defined, we would expect them to be mapped to HTML like this:
[Resource] -----DC:Splunge-----> [#node001]
[#node001] --+--DCQ:Splunge.Foo-> "Yellow"
             +--DCQ:Splunge.Bar-> "42"
The following is the recommended encoding in HTML:
<meta name="DC.Splunge.Foo" content="Yellow">
<meta name="DC.Splunge.Bar" content="42">
Note that in this case there are no strings which migrate from being the value of a property in RDF to being in the name of the META element in HTML.
Note that the HTML names are not created simply by concatenating the property types to give "DC.Splunge.Splunge.Foo" etc. Some special processing is needed to remove the duplicated "Splunge".
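The special processing might look like the following sketch, which assumes property types are written as "PREFIX:Name" strings; the function name is invented for the example.

```python
def subelement_meta_name(element_type, subelement_type):
    """Build the HTML META name for a subelement of a DC element.

    The namespace prefixes are dropped, and the element name is removed
    from the front of the subelement name if it is duplicated there.
    """
    element = element_type.split(":", 1)[1]
    sub = subelement_type.split(":", 1)[1]
    duplicated_prefix = element + "."
    if sub.startswith(duplicated_prefix):
        sub = sub[len(duplicated_prefix):]
    return "DC.%s.%s" % (element, sub)
```

For instance, `subelement_meta_name("DC:Splunge", "DCQ:Splunge.Foo")` gives "DC.Splunge.Foo" rather than "DC.Splunge.Splunge.Foo".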
This mapping suffers from there being no grouping mechanism in HTML, so that we cannot associate the Foo and Bar together in cases where there is more than one Splunge.
What do we do with subelements that are not from the DC namespace?
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
The following is the recommended encoding in HTML:
<meta name="DC.Creator.FN" content="Mr. John Q. Public, Esq.">
<meta name="DC.Creator.EMAIL" content="jqpublic@xyz.dom1.com">
[[or perhaps:
<meta name="DC.Creator" content="BEGIN:VCARD N:Public;John;Quinlan;Mr.;Esq. EMAIL:jqpublic@xyz.dom1.com END:VCARD">]]
Note that in this case there are no strings which migrate from being the value of a property in RDF to being in the name of the META element in HTML.
Note that this example deliberately does not use the "N" (Name) property, which is the subject of a later section.
This mapping suffers from there being no grouping mechanism in HTML, so that we cannot associate the name and email address together in cases where there is more than one Creator.
The AHDS/UKOLN project [*** ref ***] uses a grouping mechanism like this:
<meta name="DC.Creator.FN.1" content="Mr. John Q. Public, Esq.">
<meta name="DC.Creator.EMAIL.1" content="jqpublic@xyz.dom1.com">
<meta name="DC.Creator.FN.2" content="John Smith">
<meta name="DC.Creator.EMAIL.2" content="john@smith.com">
This has the advantage that the grouping of the properties is precisely defined, though how this would be used in searching is not clear (I couldn't find a discussion of this grouping mechanism in the text of the AHDS/UKOLN document). It has the disadvantage of being more complicated and needing special software support.
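To give an idea of the software support such a numbered grouping would need, here is a minimal sketch that regroups (name, content) pairs by their numeric suffix. The regular expression and the function name are assumptions based only on the four example tags, not on the AHDS/UKOLN document.

```python
import re
from collections import defaultdict

# Matches names such as "DC.Creator.FN.1": element, property, group number.
NAME_PATTERN = re.compile(r"^DC\.(\w+)\.([\w-]+)\.(\d+)$")


def group_meta(pairs):
    """Group META (name, content) pairs by element and group number."""
    groups = defaultdict(dict)
    for name, content in pairs:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # not a numbered subelement; ignored in this sketch
        element, prop, index = match.group(1), match.group(2), int(match.group(3))
        groups[(element, index)][prop] = content
    return dict(groups)


metas = [
    ("DC.Creator.FN.1", "Mr. John Q. Public, Esq."),
    ("DC.Creator.EMAIL.1", "jqpublic@xyz.dom1.com"),
    ("DC.Creator.FN.2", "John Smith"),
    ("DC.Creator.EMAIL.2", "john@smith.com"),
]
```

`group_meta(metas)` returns one dictionary per creator, so each name stays associated with its own email address.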
In the example above, we lose the information that FN and EMAIL come from the vCard namespace. Perhaps that is OK. If we wrote DC.Creator.VC.FN, a program would still not be able to analyse the name into its parts without prior knowledge of the strings that it is going to encounter.
It would be feasible to provide lists of strings to a metadata processor to tell it that, say, FN is from the vCard namespace.
[[How do we do schemes? This is a simple way, but is it too simple? When mapping from HTML to RDF, how does the program know to use DC:Scheme and not XX:Scheme? Perhaps these mappings only apply to DC elements, and they always use DC:Scheme.]]
[Resource] -----DC:Subject---------> [#node001] [#node001] --+--RDF:Value----------> "Cookies" +--DCQ:Subject.Scheme--> "LCSH"
maps to:
<meta name="DC.Subject" scheme="LCSH" content="Cookies">
[[Not sure about this one. Worth thinking about. It's kind of like treating "N" the same as "RDF:Value".]] Note that these mappings are intended to apply in both directions (RDF -> HTML and HTML -> RDF).
The vCard "Name" property (identified by the property name "N") is special in that it is mandatory in a vCard, so as to facilitate collating and sorting of vCard objects.
When mapped onto HTML, this should become the value of an unqualified DC element, and vice versa. This only applies if other vCard properties are also present. If there are no other vCard properties present, the HTML version has no indication of any connection with vCard, so the mapping to "N" cannot apply.
Example 1:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:EMAIL-------> "jqpublic@xyz.dom1.com"
The following is the recommended encoding in HTML:
<meta name="DC.Creator" content="Public;John;Quinlan;Mr.;Esq.">
<meta name="DC.Creator.EMAIL" content="jqpublic@xyz.dom1.com">
Example 2:
[Resource] -----DC:Creator-----> "Public;John;Quinlan;Mr.;Esq."
The following is the recommended encoding in HTML:
<meta name="DC.Creator" content="Public;John;Quinlan;Mr.;Esq.">
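The two examples can be reproduced with a small two-way mapping. This is a sketch under stated assumptions: META tags are represented as (name, content) pairs, a creator is either a string or a dictionary of vCard properties, and a node holding only VC:N is written as "DC.Creator.N" (a case the text does not cover). All function names are invented.

```python
def creator_to_meta(value):
    """Map a DC:Creator value to a list of META (name, content) pairs."""
    if isinstance(value, str):
        return [("DC.Creator", value)]
    has_other_props = any(prop != "VC:N" for prop in value)
    pairs = []
    for prop, text in value.items():
        if prop == "VC:N" and has_other_props:
            # N becomes the value of the unqualified element.
            pairs.append(("DC.Creator", text))
        else:
            pairs.append(("DC.Creator." + prop.split(":", 1)[1], text))
    return pairs


def meta_to_creator(pairs):
    """Map META (name, content) pairs back to a DC:Creator value."""
    unqualified = None
    props = {}
    for name, content in pairs:
        if name == "DC.Creator":
            unqualified = content
        elif name.startswith("DC.Creator."):
            props["VC:" + name[len("DC.Creator."):]] = content
    if not props:
        # No sign of vCard in the HTML, so the value stays a plain string.
        return unqualified
    if unqualified is not None:
        props["VC:N"] = unqualified
    return props


example_1 = {
    "VC:N": "Public;John;Quinlan;Mr.;Esq.",
    "VC:EMAIL": "jqpublic@xyz.dom1.com",
}
example_2 = "Public;John;Quinlan;Mr.;Esq."
```

Both examples survive a round trip: `meta_to_creator(creator_to_meta(example_1))` returns the vCard node, and the same call on `example_2` returns the plain string.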
[[What about this?]]
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--VC:FN----------> "Mr. John Q. Public, Esq."
             +--VC:X-DC-TYPE---> "Personal"
             +--VC:ADR---------> ";;123 Cliff Ave.;Big Town;CA;97531;US"
Following the pattern above, we get this:
<meta name="DC.Creator.FN" content="Mr. John Q. Public, Esq.">
<meta name="DC.Creator.X-DC-TYPE" content="Personal">
<meta name="DC.Creator.ADR" content=";;123 Cliff Ave.;Big Town;CA;97531;US">
Each mapping to HTML that we have considered so far has included at most one annotation (DC.Date.Issued) or one subelement (DC.Creator.FN). In the interest of simplicity, it is illegal to include more than that, ie two annotations, or two subelements, or one of each.
We say that if the underlying metadata syntax doesn't support structured values, then you are limited to *either* a single annotation or a single structure subelement. If you want more then you can encode the information into the value. If this makes it hard to extract information from the value then you can either:
Element values that are nodes whose properties are not known should be expressed using the URI as a string.
Example:
[Resource] -----DC:Creator-----> [urn:guid:123456789]
expressed in RDF serialization syntax thus:
<RDF:Description about="http://a.com/mydoc">
  <DC:Creator resource="urn:guid:123456789"/>
</RDF:Description>
would be expressed in HTML thus:
<meta name="DC.Creator" content="urn:guid:123456789">
[[Should we use scheme="URI"?]]
The values of Dublin Core elements are typically strings of text. Where necessary, a single string may contain text in more than one language, eg (taken from http://www.cityvu.com/english/manet30.htm):
<DC:Title>THE PHOTOGRAPHS - THE MANET COLLECTION: Le Déjeuner sur l'herbe</DC:Title>
Sophisticated systems are encouraged to mark the language of all string components, eg:
<DC:Title xml:lang="en">THE PHOTOGRAPHS - THE MANET COLLECTION: <x:span xml:lang="fr">Le Déjeuner sur l'herbe</x:span></DC:Title>
Noting the language of substrings allows more precise searches, eg "chat" in English vs "chat" in French.
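To illustrate the point, here is a minimal Python sketch of a language-aware search over the title above. It assumes the marked-up string has already been split into (language, text) pairs and that words are separated by whitespace. The function name `find_in_language` and this representation are assumptions of the sketch, not part of any specification.

```python
def find_in_language(segments, word, lang):
    """Return True if `word` occurs in a segment tagged with `lang`."""
    needle = word.lower()
    return any(
        seg_lang == lang and needle in text.lower().split()
        for seg_lang, text in segments
    )


# The title from the example above, split at the language boundary.
title = [
    ("en", "THE PHOTOGRAPHS - THE MANET COLLECTION: "),
    ("fr", "Le Déjeuner sur l'herbe"),
]

print(find_in_language(title, "Déjeuner", "fr"))
print(find_in_language(title, "Déjeuner", "en"))
```

A search restricted to French finds "Déjeuner", while the same search restricted to English does not, because the word appears only in the French segment.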
The DC community requests that the W3C XML WG and the W3C RDF Model and Syntax WG collaborate to make such support for mixed-language strings a practical reality.
We would like to thank all the members of the DC Data Model working group for their contributions. We are also grateful for the many suggestions from the subscribers to the meta2 mailing list, from some of which we have taken whole sentences.
For the moment, please see the references section of the Data Model WG issues list (copy below).
URI -- Uniform Resource Identifier -- see RFC 1630, RFC 1737
Here is a copy of the References from version 110.
BIB-1xx | Bibliographic Formats and Standards -- 1xx Fields [manual page], http://www.oclc.org/oclc/bib/1.htm
DC | Dublin Core Web site, http://purl.oclc.org/docs/metadata/dublin_core
DC-Date | DC Date WG report, http://purl.oclc.org/docs/metadata/dublin_core/wdatedraft.html
DC-Relation | DC Relations WG report, http://purl.oclc.org/docs/metadata/dublin_core/wrelationdraft.html
DC-RFC#1 | Dublin Core Metadata for Simple Resource Discovery [Internet Draft], http://ds.internic.net/internet-drafts/draft-kunze-dc-02.txt
DC-5 | DC-5: The Helsinki Metadata Workshop [article in D-Lib], http://www.dlib.org/dlib/february98/02weibel.html
HTML | HTML 4.0 Specification [W3C Recommendation], http://www.w3.org/TR/REC-html40
IANA-charsets | IANA register of charsets, http://www.isi.edu/in-notes/iana/assignments/character-sets
ISBN | Using Existing Bibliographic Identifiers as Uniform Resource Names [RFC], http://ds.internic.net/rfc/rfc2288.txt
Knight-Hamilton | Dublin Core Qualifiers [Draft], http://www.roads.lut.ac.uk/Metadata/DC-Qualifiers.html
LoC-Browse | Browse: Books Catalogued Since 1975 [LoC online menu], http://lcweb.loc.gov/catalog/browse/bks3.html
RDF-M&S | Resource Description Framework (RDF) Model and Syntax [latest Working Draft], http://www.w3.org/TR/WD-rdf-syntax
RDF-Schemas | Resource Description Framework (RDF) Schemas [latest Working Draft], http://www.w3.org/TR/WD-rdf-schema
vCard-IETF | vCard MIME Directory Profile [Internet Draft], http://ds.internic.net/internet-drafts/draft-ietf-asid-mime-vcard-04.txt
vCard-2.1 | vCard, The Electronic Business Card, Version 2.1 [Specification], http://www.imc.org/pdi/vcard-21.doc
XML | Extensible Markup Language (XML) 1.0 [W3C Recommendation], http://www.w3.org/TR/REC-xml
UUID | UUIDs and GUIDs [Internet Draft], http://ds.internic.net/internet-drafts/draft-leach-uuids-guids-01.txt
UUID-URI | The uuid: URI scheme [Internet Draft], http://ds.internic.net/internet-drafts/draft-kindel-uuid-uri-00.txt
Notes I don't want to lose. Maybe incorporate later.
For *some* properties it is a matter of choice and preference whether the information goes in the box or on the label on the box.
Notes: Advantage of inventing new domain-specific qualifiers: more accurate searching. Disadvantage: reduced interoperability between domains.
Here is some background information about the "1:1" Rule, which has affected some of the decisions presented here.
At DC-5, in Helsinki, we agreed that metadata about a given resource should not contain metadata about some other resource. As HTML's Meta element makes it very difficult to construct, and refer to, distinct metadata descriptions, we agreed to relax this rule for DC-in-HTML and allow the Source element to contain metadata about the source of the given resource.
The fundamental principle for encoding Dublin Core metadata is that the structure given to the metadata, in terms of properties and property values, should match the intended meaning of the metadata.
Here is an example where the structure does not match the meaning very well:
DEPRECATED EXAMPLE *** I'm sure we will come across one at some point!
And finally, for those of you wondering how I had the patience to number all the sections and sub-sections, here is the answer. It's Perl.
$min = 2;
$max = 5;
while (<>) {
    $line = $_;
    if ($line =~ /<h([2-9])>/) {
        $num = $1;
        # increment the (sub)section number
        $numbs[$num]++;
        # reset the numbers of lower level headings
        for ($i = $num + 1; $i <= $max; $i++) {
            $numbs[$i] = 0;
        }
        # assemble the new number, with dots between
        $numstr = '';
        for ($i = $min; $i <= $num; $i++) {
            $numstr .= $numbs[$i] . '.';
        }
        # knock off trailing dot (the dot is escaped so that only a literal "." is removed)
        $numstr =~ s/\.$//;
        # insert into the line, removing existing number if any
        # a nbsp is inserted to make it look better
        # eg "<h3>2.3&nbsp;Banana Biscuits</h3>"
        $line =~ s/<h$num>[0-9. ]*(&nbsp;)* */<h$num>$numstr&nbsp;/;
    }
    print $line;
}