[Mirrored from: http://207.201.154.232/murray/specs/WD-pics-ng-metadata-970514.html]
Ora Lassila, lassila@w3.org, Nokia Research Center (currently visiting W3C)
Version 3.5, 5/14/97
This document supersedes an earlier document titled "PICS-NG Label Syntax Proposal" version 1 dated 2/20/97 [Lassila 97].
This document would not have been possible without substantial contributions and support from Ralph Swick (W3C), as well as contributions and comments from Eric Miller (OCLC), Jim Miller (W3C), Paul Resnick (AT&T) and Bob Schloss (IBM). The author is indebted to all these people for their continuing moral support.
This document is being submitted simultaneously to the W3C (PICS) Label Working Group and the W3C DSig Collections Working Group for consideration as the basis of a converged web resource description framework. It represents the discussion of W3C staff and has not yet undergone review by either of those groups. Note also that Section 5.2 is currently missing and will be provided in an update very shortly.
The first question to ask is: what is metadata? Metadata is "data about data", or specifically in our present context, "data about web resources."
The broad goal is to define a metadata mechanism which makes no assumptions about a particular application domain, nor defines the semantics of any application domain. The definition of the mechanism should be domain neutral, yet the mechanism should be suitable for describing information about any domain.
Metadata can be used in a variety of application areas; for example: in resource discovery to provide better search engine capabilities, in cataloging for describing the content available at a particular web site or page, by intelligent software agents to facilitate knowledge sharing and exchange, in digital signatures, in content rating, and in many others (for example, metadata can be used for specialized tasks such as organizing a group of web pages for purposes of printing them as a single unit, or for producing a visualization of the link relationships between them).
This document introduces a model for representing metadata, and a syntax for expressing and transporting metadata based on this model. In a way, this is a new version of the PICS content rating label mechanism and motivates its use as a general metadata description formalism. The new PICS - which we shall here call "PICS-NG" (for "Next Generation") - is based on a conceptual object model for metadata, suitable for expressing information about web resources as well as other PICS-NG formulations. The model is highly extensible, and also more general than the implied model behind PICS version 1.1 [Krauskopf 96]; hence this document will first describe the model in general and then proceed to give a specialization for implementing content rating labels.
A mechanism is needed to permit encoding and transport of web metadata in a manner that maximizes the interoperability of independently developed web servers and clients. Specific applications are free - and indeed encouraged - to impose additional semantics on a subset of the metadata above that required by the model described in this document.
The metadata object model defines a conceptual framework for objects called labels. Labels are collections of attributes and their corresponding values. The domain of values consists of instances of a small set of primitive types, other labels, as well as lists. The primitive types are: strings, numbers (both integers and floats) and booleans. By definition, an attribute/value pair contained in a label makes a statement. Using labels it is possible to make statements about resources (which have a URL) as well as about other labels.
The set of attributes for a given label, as well as any characteristics or restrictions of the values themselves, are defined by a schema, referred to by the label using a URL. This URL may be treated merely as an identifier or it may refer to a machine-readable description of the schema. A label may have more than one schema, and similarly a schema may be defined in terms of any number of other schemata. By definition, an application that understands a particular schema used by a label understands the semantics of each of the attribute statements contained in that label. An application that has no knowledge of the particular schema will minimally be able to parse the label into the attribute and value components and will be able to transport the label intact (e.g. to a cache or to another application). In the presence of multiple schemata, an application may choose (in a left-to-right order) the first schema it has knowledge of, and interpret the label using that schema [Note: see the Open Issues section for a discussion on multiple inheritance].
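The left-to-right choice among multiple schemata can be sketched as follows. This is only an illustration in Python and not part of the proposal; the function name choose_schema, the set of known schema URLs, and the example.org URL in the usage lines are assumptions made for the example.

```python
def choose_schema(schema_value, known_schemata):
    """Return the first schema URL, in left-to-right order, that the
    application knows about, or None if it knows none of them.

    schema_value models the value of *schema: a single URL string or a
    list of URL strings. known_schemata is a hypothetical set of schema
    URLs the application understands.
    """
    urls = [schema_value] if isinstance(schema_value, str) else list(schema_value)
    for url in urls:
        if url in known_schemata:
            return url
    return None


# Hypothetical application that only understands one rating schema.
known = {"http://www.gcf.org/v2.5"}
chosen = choose_schema(
    ["http://example.org/unknown-schema", "http://www.gcf.org/v2.5"], known)
```

An application for which the result is None can still parse the label into attributes and values and pass it on intact, as described above.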
An actual machine-readable description of a schema may be accessed through content negotiation by dereferencing the schema URL contained in the label. If the schema is machine-readable it may be possible for an application to learn the semantics of the schema on demand. How the learning happens is beyond the scope of this document; furthermore, no claim is made that it is always feasible to encode the full semantics in a machine-readable schema. The URL referring to a schema may actually refer to a file containing definitions for several schemata (i.e. a library of schemata). In this case, embedded labels may refer to any of the contained schemata definitions using URL fragment identifiers.
A type is an identifier designated by a schema to name a component of a type system. The basic type system of PICS-NG contains the following types (many of these types are not unlike those found in various Lisp systems):
Type | Description |
---|---|
string | A sequence of characters [Note: a discussion of character sets will be included in a future version of this document]. Syntactically strings are case-sensitive and may contain whitespace. |
symbol | A sequence of characters acting as a unique identifier. Syntactically symbols are case-insensitive. The particular syntax used for metadata restricts the set of characters allowed in a symbol. Furthermore, symbols may not contain whitespace. |
integer | An integer. |
float | A floating-point approximation of a real number. |
range | A tuple of two numbers, representing lower and upper bounds of an interval. |
number | Either an integer, a float, or a range. |
boolean | A boolean value. The names of the two possible values are true and false. |
list | An ordered sequence of values (of any type). |
label | A PICS-NG metadata label. |
URL | A Uniform Resource Locator. Syntactically this is a string, but only those characters are allowed which are legal as specified in the URL specification [Berners-Lee 94]. |
ISODate | Objects of this type represent points in time. Syntactically the type looks like a string (with additional restrictions on its contents as defined in the syntax section below); internally it may be represented in any way the implementation sees fit. Syntactically this type is currently defined as quoted-ISO-date in the PICS version 1.1 specification. |
In addition, the type called any is understood to denote the set of all of the above types.
In certain applications it may be desirable for some attributes to hold multiple values simultaneously. In this case the order of the values is significant, that is, an application is required to preserve the ordering (please note that the order of attributes is not significant). To assign multiple values to an attribute the list type is used (in other words, an attribute with multiple values is an attribute with a single list value).
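As a rough illustration (in Python, which is not part of this proposal), a label can be modelled as a dictionary and a multi-valued attribute as a list. Dictionary comparison ignores the order of keys while list comparison does not, which mirrors the rule above. The dictionary representation is an assumption made for the example.

```python
# Same attributes in a different order: the same set of statements.
label_a = {"title": "Light and Dark: A study of color",
           "author": ["John Smith", "Smith, Jane Q."]}
label_b = {"author": ["John Smith", "Smith, Jane Q."],
           "title": "Light and Dark: A study of color"}

# Same attributes, but the multiple values appear in a different order.
label_c = {"title": "Light and Dark: A study of color",
           "author": ["Smith, Jane Q.", "John Smith"]}

same_ab = (label_a == label_b)   # attribute order is not significant
same_ac = (label_a == label_c)   # value order within a list is significant
```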
A label is a collection of statements (attribute/value pairs). These statements are being made about an object called the referent. We can identify three different types of referents:
If the referent is a list it is understood that the statements are being made about each of the items of the list. The following table clarifies the differences between the three cases based on the type of the referent object:
Referent type | Referent Value | Indirect Referent Value | Immediate Value |
---|---|---|---|
boolean | N/A | N/A | The referent label makes statements about the object (e.g. unit of a numerical value) |
symbol, number | The symbol or number is the name of another label which is considered the actual referent (see below) | The symbol or number is the name of another label which is considered the actual referent (see below) | The referent label makes statements about the object (e.g. unit of a numerical value) |
string | String is a URL, and the statements apply to the resource at that URL | String is a URL referring to a label describing a set; statements apply to the items of the set | Statements are made about the string (e.g. type of string, language) |
label | If the label defines a set, the statements apply to the set as a whole | If the label defines a set, the statements apply to the elements of the set individually | Statements apply to the referent label object itself |
If x is the referent of y, then we will say that y is the parent of x. In general, the immediately enclosing label of any label is called the parent (regardless of what attribute holds the label as its value).
When label naming is used (in the above table, when the referent is a symbol or a number), the scope of name visibility is within the peers of a named label as well as within the locally (lexically) enclosing labels. References through URLs do not transmit name visibility. In addition, a label is not allowed to make forward references (only labels introduced lexically before the referring label can be referred to), nor is a label allowed to refer to itself.
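The rule against forward references can be illustrated with a deliberately simplified sketch that only considers a flat sequence of peer labels and ignores lexically enclosing labels. The dictionary representation of labels and the function name resolve_name are assumptions for the example, and the sketch assumes the *for value holds a name rather than a URL. Names are compared case-insensitively because symbols are case-insensitive.

```python
def resolve_name(labels, index):
    """Find the label named by the *for value of labels[index].

    Only labels introduced before position `index` are searched, so
    forward references and self-references are rejected.
    """
    wanted = str(labels[index]["*for"]).lower()
    for earlier in labels[:index]:
        if str(earlier.get("*id", "")).lower() == wanted:
            return earlier
    raise LookupError(f"no earlier label named {wanted!r}")


authorship = {"*id": "foo", "*for": "http://www.w3.org/soap.html",
              "author": "Ora Lassila"}
rating = {"*for": "Foo", "suds": 1, "density": 0}

# The rating label comes after the label it names, so the lookup succeeds.
referent = resolve_name([authorship, rating], 1)
```

Swapping the two labels would make the reference a forward reference, and the sketch raises LookupError in that case.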
In order to avoid needless proliferation of metadata a mechanism is introduced which allows the sharing of common fragments of metadata among several labels. This feature is inspired by various inheritance mechanisms found in object-oriented programming systems as well as various knowledge representation systems.
Attributes and their (optional) default values are inherited from a schema to a label, and values may be inherited from one label to another. Using inheritance, statements can be made about groups of objects without having to repeat the statement individually for each object. The following algorithm defines the exact mechanism of inheritance: given a label lab and an attribute att, the value of att for lab, as given by the function AttributeValue(lab, att), is
As stated in the model definition section, in the presence of multiple schemata an application may choose the first schema it has knowledge of, and interpret the label using that schema.
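The enumerated steps of AttributeValue are not reproduced in this text, so the following Python sketch is only one plausible reading, assumed from the description above and from the shared-data example in the examples section: a label's own value is used if present, otherwise the value of the nearest lexically enclosing label, otherwise the default from the schema. The Label class, its fields, and the schema_defaults parameter are hypothetical.

```python
class Label:
    """A label with its own attribute values and an optional enclosing label."""

    def __init__(self, attrs, parent=None):
        self.attrs = dict(attrs)
        self.parent = parent


def attribute_value(lab, att, schema_defaults=None):
    """Assumed lookup order: own value, enclosing labels, schema default."""
    node = lab
    while node is not None:
        if att in node.attrs:
            return node.attrs[att]
        node = node.parent
    # No label in the enclosing chain has a value; fall back to the schema.
    return (schema_defaults or {}).get(att)


# Shared fragment and two labels enclosed by it.
shared = Label({"author": "Ora Lassila"})
foo = Label({"title": "Fundamentals of Foos", "author": "Ralph Swick"}, shared)
bar = Label({"title": "Fundamentals of Bars"}, shared)
```

Under this reading, foo keeps its own author while bar picks up the author of the enclosing label.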
A label refers to schema(ta) for the purpose of grounding the terms used by the label, to provide semantics for the statements the label makes. It is our intention that the PICS-NG metadata formalism be extremely simple, yet powerful via extensibility. It is expected that metadata implementors will define new schemata to introduce additional semantics for metadata expressions. We assume a formalism will exist for defining schemata, but this formalism is not described in this document (possibly the same formalism is used for schemata as is used for metadata instances). For maximal extensibility, the schema definition mechanism may take on features of metaobject protocols.
For the purposes of "bootstrapping" the model, it will be necessary to define a small set of attributes which are available in all labels (and which conceivably could be used by any label). These attributes cannot be redefined or overridden by new schemata (to indicate the fixed nature of the definition of these attributes their names start with the * character; the use of the character * is reserved for this purpose and no schema should use it as the first letter of an attribute name). The common core attributes of labels are:
Attribute name | Type | Description |
---|---|---|
*schema | URL, list | Contains a reference to the schema of the label. A list value is understood to be a list of URLs. |
*for, *for-indirect, *for-immediate | any (see table in Section 2) | Contains the referent of the label, i.e. the object about which the statements in the label are being made. See the explanation at the end of section 2 describing the three different kinds of referents: a referent value (for), an immediate value (for-immediate) and an indirect referent value (for-indirect). Note: Specialized schemata may define other attributes the values of which can also be considered referents. |
*id | symbol, number | Names the label. Named labels can be referred to by just using their name (see explanation on referents at the end of section 2). The scope of the names is the lexical context of the label (everything within the outermost lexically enclosing label). |
*dsig | label | This attribute holds a digital signature which signs the label. The digital signature is a label itself and thus conforms to this specification. The actual schemata and semantics of digital signatures will be specified later. |
In order to implement an extension of PICS version 1.1 using PICS-NG, a schema has to be defined to introduce the old "options" as label attributes. We will call this schema the "PICS 2.0 Schema." A PICS 2.0 rating label is expressed as a single label. A label-list is a label whose referent is a list of labels. The attributes of the PICS 2.0 schema are:
Attribute | Default value | Type | Description |
---|---|---|---|
at | no default | ISODate | The last modification date of the item to which this rating applies, at the time the rating was assigned. |
by | no default | string | An identifier for the person or entity within the rating service who was responsible for creating this particular rating label. |
generic | false | boolean | If this option is set to true, the rating label can be applied to any URL starting with the prefix given in the for option. This is used to supply ratings for entire sites or any subparts of sites. |
on | no default | ISODate | The date on which this rating was issued. |
until | no default | ISODate | The date on which this rating expires. |
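The effect of the generic option can be sketched as a simple prefix test. This Python fragment is an illustration only; the dictionary representation of a label and the function name label_applies are assumptions, and the example.org URLs are invented.

```python
def label_applies(label, url):
    """Tell whether a rating label applies to url.

    With generic set to true the *for value is treated as a URL prefix;
    otherwise (the default, false) an exact match is required.
    """
    target = label["*for"]
    if label.get("generic", False):
        return url.startswith(target)
    return url == target


site_label = {"*for": "http://www.example.org/docs/", "generic": True, "suds": 0.5}
page_label = {"*for": "http://www.example.org/docs/a.html", "suds": 0.5}

covers_subpage = label_applies(site_label, "http://www.example.org/docs/a.html")
covers_other = label_applies(page_label, "http://www.example.org/docs/b.html")
```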
The PICS-NG metadata object model provides an abstract, conceptual framework for defining and using metadata. A concrete syntax is also needed for the purposes of authoring and exchanging metadata. Several syntaxes are obviously possible, and we may not have to limit ourselves to a single syntax. There are, however, certain goals to keep in mind when designing the syntax:
This document defines an s-expression syntax for PICS-NG. This syntax satisfies the above requirements.
The syntax of PICS-NG is greatly simplified from that of PICS version 1.1. Basically, PICS-NG syntax consists of s-expressions in which additional restrictions are placed on the types of values of certain elements of the s-expression structures. PICS-NG parsing is a multi-step process. Parsing of a single label happens as follows:
This syntax has been chosen because it is simple to parse, provides a straightforward correspondence between the model and the syntactic form of the data, is brief (good "over the wire" characteristics), and (by not being too verbose) is easy for humans to read and write. A BNF definition of the overall syntactic structure is given below (despite the fact that BNF rather poorly lends itself to describing s-expressions):
Nonterminal | | Production | Notes |
---|---|---|---|
Manifest | :: | '(' Version Label* ')' | |
Version | :: | Symbol | Possibly the version symbol in this version is pics-2.0 |
Label | :: | '(' 'label' Attribute* ')' | |
Attribute | :: | AttributeName Value | |
AttributeName | :: | Symbol \| URL | |
Value | :: | Atom \| List \| Label | |
Atom | :: | String \| Symbol \| Number \| Range \| Boolean | |
List | :: | '(' 'list' Value* ')' | |
Range | :: | '(' 'range' Number Number ')' | Note: is this really a general thing or a content rating thing? |
Boolean | :: | 'true' \| 'false' | |
Here are definitions for the "literal" entities of the syntax:
Entity | Definition |
---|---|
Symbol | any sequence of characters not containing whitespace nor any of the following characters: ( ) " |
String | defined as quotedname in the PICS 1.1 specification. Basically anything delimited by double quotes. |
Number | defined as [ '+' \| '-' ] DigitCharacter* [ '.' DigitCharacter+ ] where DigitCharacter is any of the characters '0'...'9'. |
URL | similar to String, but with contents identifying a Uniform Resource Locator, as defined in the PICS Rating Services and Rating Systems [Miller 96] and RFC 1738 [Berners-Lee 94]. |
ISODate | a String representing a date, in a format restricted from the ISO standard, as described by the PICS 1.1 specification. |
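To give a feel for how little machinery the s-expression syntax needs, here is a minimal reader sketched in Python. It is an illustration under stated assumptions, not a reference implementation: the function names and the Python representation (dictionaries for labels, lists for lists, tuples for ranges and symbols) are invented for the example, strings are simplified to "anything between double quotes", URLs and ISODates are not validated, and a Number is required to contain at least one digit, which is stricter than the definition above.

```python
import re

# One token per match: "(", ")", a double-quoted string, or a bare word.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')
# Optional sign, digits, optional fraction; at least one digit (assumption).
NUMBER = re.compile(r'[+-]?(?:\d+(?:\.\d+)?|\.\d+)$')


def tokenize(text):
    """Split text into (kind, value) pairs; kind is '(', ')', 'string' or 'word'."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if m.group(1):
            tokens.append(("(", None))
        elif m.group(2):
            tokens.append((")", None))
        elif m.group(3) is not None:
            tokens.append(("string", m.group(3)))
        else:
            tokens.append(("word", m.group(4)))
        pos = m.end()
    return tokens


def atom(kind, value):
    """Convert a non-parenthesis token to a Python value."""
    if kind == "string":
        return value
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    if NUMBER.match(value):
        return float(value) if "." in value else int(value)
    return ("symbol", value.lower())   # symbols are case-insensitive


def parse_value(tokens, i):
    """Parse one Value starting at tokens[i]; return (value, next index)."""
    kind, value = tokens[i]
    if kind == ")":
        raise SyntaxError("unexpected ')'")
    if kind != "(":
        return atom(kind, value), i + 1
    if tokens[i + 1][0] != "word":
        raise SyntaxError("expected 'label', 'list' or 'range' after '('")
    head, i = tokens[i + 1][1].lower(), i + 2
    if head == "label":
        attrs = {}
        while tokens[i][0] != ")":
            name_kind, name = tokens[i]
            if name_kind == "(":
                raise SyntaxError("attribute name must be a Symbol or URL")
            key = name.lower() if name_kind == "word" else name
            attrs[key], i = parse_value(tokens, i + 1)
        return {"label": attrs}, i + 1
    items = []
    while tokens[i][0] != ")":
        item, i = parse_value(tokens, i)
        items.append(item)
    if head == "list":
        return items, i + 1
    if head == "range" and len(items) == 2:
        return ("range", items[0], items[1]), i + 1
    raise SyntaxError(f"unknown or malformed operator {head!r}")


def parse_manifest(text):
    """Parse '(' Version Label* ')' into (version, list of labels)."""
    tokens = tokenize(text)
    try:
        if tokens[0][0] != "(" or tokens[1][0] != "word":
            raise SyntaxError("manifest must start with '(' Version")
        version, i, labels = tokens[1][1], 2, []
        while tokens[i][0] != ")":
            label, i = parse_value(tokens, i)
            labels.append(label)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
    return version, labels


version, labels = parse_manifest(
    '(pics-2.0 (label *schema "http://www.w3.org/authors-and-stuff" '
    '*for "http://www.w3.org/People/Lassila/" author "Ora Lassila"))')
```

The reader stops at the structural level; interpreting the attributes according to a schema would be a separate step.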
Another possible approach to metadata syntax is to use XML (the Extensible Markup Language). This language is attractive because of its political appeal and the fact that it may find other uses in the Internet arena. The full definition of an XML syntax for PICS-NG will be included in a future version of this document. See Appendix A for a discussion of PICS-NG compared with Microsoft's XML Collections proposal.
Section 6.2 illustrates a proposed XML syntax [contributed by Andrew Layman, andrewl@microsoft.com]. This recommendation relies on several proposed features of XML which are described in a separate document:
With the above syntax proposals, the XML encoding can be nearly a transliteration of the s-expression encoding. The suggestion has been made to eliminate some of the special tokens (e.g. "label") and use a "reference=" attribute on for rather than three separate for, for-immediate, and for-indirect tokens. These ideas are illustrated in section 6.2.
The following examples show how PICS-NG is used to make certain kinds of statements. Some of the examples are drawn from the PICS 1.1 specification.
[note: these examples are slanted to content filtering; we will rewrite them in a future draft to show other uses.]
Some statement about a single document (URL):
(pics-2.0 (label *schema "http://www.w3.org/authors-and-stuff" *for "http://www.w3.org/People/Lassila/" author "Ora Lassila"))
Some statement about two documents:
(pics-2.0 (label *schema "http://www.w3.org/authors-and-stuff" author "Ora Lassila" *for (list "http://www.w3.org/People/Lassila/" "http://www.w3.org/PICS/draft-lassila-pics-ng-metadata.html")) )
Here are some of the examples from the PICS 1.1 document, modified for the new syntax:
(pics-2.0 (label *schema "http://www.gcf.org/v2.5" by "John Doe" *for-indirect (list (label *for "http://w3.org/PICS/Overview.html" on "1994.11.05T08:15-0500" until "1995.12.31T23:59-0000" suds 0.5 density 0 color/hue 1) (label *for "http://w3.org/PICS/Underview.html" by "Jane Doe" subject 2 density 1 color/hue 1))))
(pics-2.0 (label *schema "http://www.gcf.org/v2.5" *for-indirect (list (label suds 0.5 density 0 color/hue 1) (label subject 2 density 1 color/hue 1))))
A PICS label rating a statement about a URL (that is, the ratings apply to the statement, not the document):
(pics-2.0 (label *schema "http://www.gcf.org/v.2.5" *for-immediate (label *schema "http://www.w3.org/authors-and-stuff" *for "http://www.w3.org/soap.html" author "Ora Lassila") suds 1 density 0 color/hue 0))
The same example as above, except that label naming is used instead of an explicit containment hierarchy:
(pics-2.0 (label *schema "http://www.w3.org/authors-and-stuff" *id foo *for "http://www.w3.org/soap.html" author "Ora Lassila") (label *schema "http://www.gcf.org/v.2.5" *for foo suds 1 density 0 color/hue 0))
A label making use of multiple values and metadata attached to attribute values:
(pics-2.0 (label *schema "http://purl.org/Schemas/description1" *for "http://purl.color.org/document.html" title "Light and Dark: A study of color" subject (label *schema "http://purl.org/Schemas/LCSH" *for-immediate "Color and Color Palettes") author (list (label *schema "http://www.foo.com/author" name "John Smith" affiliation "thedarkside" email "john@thedarkside") (label *schema "http://www.foo.com/author" name "Smith, Jane Q." affiliation "thelightregion" email "jane@thelightregion"))))
An example demonstrating how common data can be shared by several labels:
(pics-2.0 (label *schema "http://www.gcf.org/v.2.5" *for-indirect (label *schema "http://purl.org/Schemas/description1" author "Ora Lassila" subject (label *schema "http://purl.org/Schemas/LCSH" *for-immediate "Color and Color Palettes") *for (list (label *for "http://www.w3.org/foo" author "Ralph Swick" title "Fundamentals of Foos") (label *for "http://www.w3.org/bar" title "Fundamentals of Bars") (label *for "http://www.w3.org/foobar" title "Foos vs. Bars"))) by "Jim Miller" suds 1.0 density 0.5 hue/color 0.0))
The labels of the previous example written out so that each of them stands alone (i.e. no sharing of fragments of metadata):
(pics-2.0 (label *schema "http://www.gcf.org/v.2.5" *for (label *schema "http://purl.org/Schemas/description1" *for "http://www.w3.org/foo" subject (label *schema "http://purl.org/Schemas/LCSH" *for-immediate "Color and Color Palettes") author "Ralph Swick" title "Fundamentals of Foos") by "Jim Miller" suds 1.0 density 0.5 hue/color 0.0) (label *schema "http://www.gcf.org/v.2.5" *for (label *schema "http://purl.org/Schemas/description1" *for "http://www.w3.org/bar" subject (label *schema "http://purl.org/Schemas/LCSH" *for-immediate "Color and Color Palettes") author "Ora Lassila" title "Fundamentals of Bars") by "Jim Miller" suds 1.0 density 0.5 hue/color 0.0) (label *schema "http://www.gcf.org/v.2.5" *for (label *schema "http://purl.org/Schemas/description1" *for "http://www.w3.org/foobar" subject (label *schema "http://purl.org/Schemas/LCSH" *for-immediate "Color and Color Palettes") author "Ora Lassila" title "Foos vs. Bars") by "Jim Miller" suds 1.0 density 0.5 hue/color 0.0))
[This section contributed by Andrew Layman, <andrewl@microsoft.com> with some minor editing by Ralph Swick. The examples are equivalent to those in section 6.1. As stated above in section 5.2, these examples rely on several proposed features of XML which are described elsewhere.]
Andrew's comments on the examples:
In the main, names and other characteristics of section 6.1 are used here to make comparison with the s-expression syntax easier, since our main goal is to verify that XML is able to express the same statements as s-expressions can.
Schema shortnames are illustrated in these examples. The shortnames are chosen according to the Java conventions for package names. It is overkill for these examples, but shows how one can absolutely avoid any possibility of name conflicts, even as schemas evolve.
The s-expressions examples use an element called "label." Obviously, a label is meant to be the root type of all elements: Devoid of any particular properties or attributes, it can be subclassed to become anything, with subclassing effected by the "*schema" attribute. That is, all labels are really particular kinds of things, identified by their "*schema" attribute. Each schema evidently describes one kind of object. In contrast, in the XML proposal, all elements are explicitly of some particular type, drawn from the namespace of an xml-schema attribute of a parent element. For instance, the first s-expression example has a label of type "http://www.w3.org/authors-and-stuff" (in the s-expression model, element types are URIs). The first XML example introduces an "http://www.w3.org" schema, then draws from it a particular element type, "authors-and-stuff". (This really should be a name meaning "thing with author and other attributes" but I have not changed the names in these examples.)
Some statement about a single document (URL):
<*xml-schema ref="http://www.w3.org" /> <authors-and-stuff for="http://www.w3.org/People/Lassila" author="Ora Lassila" />
Some statement about two documents:
<*xml-schema ref="http://www.w3.org" as="org.w3.www" /> <org.w3.www.authors-and-stuff author="Ora Lassila" > <*for> <thing>http://www.w3.org/People/Lassila</> <thing>http://www.w3.org/pics/draft-lassila-pics-ng-metadata.html</> </for> </authors-and-stuff>
Here are some examples from the PICS 1.1 Document modified for the new syntax:
<*xml-schema ref="http://www.gcf.org" as="org.gcf.www" /> <org.gcf.www.v2:5 by="John Doe" > <*for reference="indirect"> <thing for="http://w3.org/PICS/Overview.html" on="1994.11.05T08:15-0500" until="1995.12.31T23:59-0000" suds="0.5" density="0" color-hue="1" /> <thing for="http://w3.org/PICS/Underview.html" by="Jane Doe" subject="2" density="1" color-hue="1" /> </for> </v2:5>
<org.gcf.www.v2:5> <*for reference="indirect"> <thing> <v2:5 suds="0.5" density="0" color-hue="1" /> </thing> <thing> <v2:5 subject="2" density="1" color-hue="1" /> </thing> </for> </v2:5>
A PICS label rating a statement about a URL (that is, the ratings apply to the statement, not the referenced document):
<*xml-schema ref="http://www.gcf.org" as="org.gcf.www"> <org.gcf.www.v2:5> <*for reference="immediate"> <*xml-schema ref="http://www.w3.org" as="org.w3.www"> <org.w3.www.authors-and-stuff for="http://www.w3.org/soap.html" author="Ora Lassila" /> </for> <*suds>1</> <*density>0</> <*color hue="0" /> </v2:5>
The same example as above, except that label naming is used instead of containment hierarchy:
<*xml-schema ref="http://www.gcf.org" as="org.gcf.www"/> <*xml-schema ref="http://www.w3.org" as="org.w3.www"/> <org.w3.www.authors-and-stuff id="foo" for="http://www.w3.org/soap.html" author="Ora Lassila" /> <org.gcf.www.v2:5 > <*for>#foo</> <*suds>1</> <*density>0</> <*color hue="0" /> </v2:5>
A label making use of multiple values and metadata attached to attribute values:
<*xml-schema ref="http://purl.org/Schemas" as="org.purl" /> <org.purl.description1 for="http://purl.color.org/document.html" title="Light and Dark: A study of color" > <*subject> <lcsh> <*for reference="immediate">Color and Color Palettes</></> </subject> <*author> <*xml-schema ref="http://www.foo.com" as="com.foo.www" /> <com.foo.www.author name="John Smith" affiliation="thedarkside" email="john@thedarkside" /> <com.foo.www.author name="Smith, Jane Q." affiliation="thelightregion" email="jane@thelightregion" /> </description1>
An example demonstrating how common data can be shared by several labels. (Note: Evidently in this metadata application, attributes of a parent are attributed to each child. Such behavior is probably reasonable for this example and the particular attributes used in it, but would need to be controlled carefully in applications using either default values or subclassing.)
<*xml-schema ref="http://www.gcf.org" as="org.gcf.www"> <org.gcf.www.v2:5> <*for reference="indirect"> <*xml-schema ref="http://purl.org/Schemas" as="org.purl" /> <org.purl.description1 author="Ora Lassila"> <*subject> <lcsh> <*for reference="immediate">Color and Color Palettes</> </lcsh></subject> <*for> <description1 for ="http://www.w3.org/foo" author="Ralph Swick" title="Fundamentals of Foos" /> <description1 for ="http://www.w3.org/bar" title="Fundamentals of Bars" /> <description1 for ="http://www.w3.org/foobar" title="Foos vs. Bars" /> </for> </description1> </for> <*by>Jim Miller</> <*suds>1.0</> <*density>0.5</> <*color hue="0" /> </v2:5>
The labels of the preceding example written out so that each of them stands alone (i.e. no sharing of fragments of metadata):
<*xml-schema ref="http://www.gcf.org" as="org.gcf.www"> <org.gcf.www.v2:5> <*for> <*xml-schema ref="http://purl.org/Schemas" as="org.purl" /> <org.purl.description1 for="http://www.w3.org/foo" > <*subject> <lcsh> <*for reference="immediate">Color and Color Palettes </for> </lcsh></subject> <*author>Ralph Swick</> <*title>Fundamentals of Foos</> </description1> </for> <*by>Jim Miller</> <*suds>1.0</> <*density>0.5</> <*color hue="0" /> </v2:5> <org.gcf.www.v2:5> <*for> <*xml-schema ref="http://purl.org/Schemas" as="org.purl" /> <org.purl.description1 for="http://www.w3.org/bar" > <*subject> <lcsh> <*for reference="immediate">Color and Color Palettes </for> </lcsh></subject> <*author>Ora Lassila</> <*title>Fundamentals of Bars</> </description1> </for> <*by>Jim Miller</> <*suds>1.0</> <*density>0.5</> <*color hue="0" /> </v2:5> <org.gcf.www.v2:5> <*for> <*xml-schema ref="http://purl.org/Schemas" as="org.purl" /> <org.purl.description1 for="http://www.w3.org/foobar" > <*subject> <lcsh> <*for reference="immediate">Color and Color Palettes </for> </lcsh></subject> <*author>Ora Lassila</> <*title>Foos vs. Bars</> </description1> </for> <*by>Jim Miller</> <*suds>1.0</> <*density>0.5</> <*color hue="0" /> </v2:5>
As specified, attribute order is not significant, but value order (for multiple values) is. Some syntactic approaches to multiple values may allow the same attribute to be specified multiple times (see, for example, the XML Collections proposal [Hopmann 97]). In this case the order of the same attributes is significant.
To allow for conjunctive as well as disjunctive sets of multiple values, the sequence operator "list" may in the future be replaced by the operators "and" and "or". The actual ramifications of this to the model and possible implementations are at this point unclear.
Inheritance takes place in a hierarchy of lexically enclosed labels. Propagating inherited values is simple if one "sees" the entire hierarchy. From an individual label's standpoint, however, inheritance works using unidirectional links the label has no knowledge of. This is confusing: since a label does not know of all the links pointing to it, it cannot alone determine the values it inherits (this is the reason why we do not allow inheritance over links expressed as URLs).
As currently defined in this document, the multiple schemata mechanism does not allow for the use of "mixin" schemata. For flexible means of extending metadata, a full multiple inheritance mechanism may be necessary.
Error tokens defined by PICS version 1.1 as well as the former version of the PICS-NG proposal are not included in this document. There are two ways to introduce error tokens and other similar constructs: errors could be represented by labels (referring to a special error schema, defined together with the other basic PICS schemata), or by allowing additional prefix operators (such as error) in addition to the ones defined by this document (i.e., label, list and range).
Since the PICS-NG metadata architecture is easily extensible, the old extension mechanism of PICS 1.1 is no longer needed. The multiple schemata approach can be used for "optional" extensions; a single new schema should be used for "mandatory" ones.
Some people have expressed concerns about the fact that old PICS options are now mixed with the transmit names of ratings. Technically this is not a problem because we have a way of determining which attributes are which, but from a metadata author's standpoint this can be confusing. A possible solution is to put all options into a separate label and make that label a value of a new attribute (called, say, "label-attributes"). The options-schema can be defined in the same definition file as the basic content rating schema, and referred to using the fragment identifier syntax (say, "#options"). Inheritance of individual label options becomes difficult if they are placed in a separate label.
A minimal, canonical form of the syntax used has to be defined, for purposes of signing PICS-NG labels and for mechanically producing label representations.
[Berners-Lee 94] Berners-Lee, Tim et al, 1994. Uniform Resource Locators (URL). RFC 1738, CERN (et al). Available as http://ds.internic.net/rfc/rfc1738.txt.
[Hopmann 97] Alex Hopmann et al, 1997. Web Collections using XML. Proposal (submitted to W3C), Microsoft Corporation. Available as http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html.
[Krauskopf 96] Krauskopf, Tim et al, 1996. PICS Label Distribution Label Syntax and Communication Protocols, Version 1.1. W3C Recommendation 31-October-96. Available as http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html.
[Lassila 97] Lassila, Ora, 1997. PICS-NG Label Syntax Proposal. Unpublished working paper, W3C. Available as http://www.w3.org/pub/WWW/PICS/draft-lassila-pics-ng-label-syntax.html.
[Miller 96] Miller, Jim et al, 1996. Rating Services and Rating Systems (and Their Machine Readable Descriptions), Version 1.1. W3C Recommendation 31-October-96. Available as http://www.w3.org/pub/WWW/TR/REC-PICS-services-961031.html.
In this section, we compare the above metadata object model to the model defined in "Web Collections using XML" [Hopmann 97]. The text below in the column titled "XML model" is quoted directly from section 2.2 "The Web Collection model." Commentary also includes information acquired in private discussions with Alex Hopmann.
XML model | Commentary |
Web Collections provide a hierarchical structure for storing properties that describe objects. A collection is simply an association of field names to values. The meanings of these field names are defined by the profile is specified for the given collection. | In this respect the two models are identical. The word "profile" is used with the same meaning as our term "schema", and the word "property" is used in lieu of "attribute." |
A collection is not required to contain properties correlating to each field in its profile. Similarly, a collection may contain properties that do not correspond to any field in its profile. A collection may also contain more than one property that correlates to a single field in its profile. | Unknown attributes are permitted if an application is not concerned with the semantics of a label. Lists take the place of multiple occurrences of a property. |
The order of properties in a collection can be significant in specific applications but is not necessarily significant in all applications. Likewise, applications will determine the meaning of multivalued properties, missing properties, and properties that do not correspond to fields in the profile; applications may deem a collection invalid if does not contain appropriate information. However applications MUST be able to at a minimum gracefully ignore additional properties that they do not understand. | Schemata define all semantics. See the Open Issues section regarding ordering. |
A primary collection must explicitly refer to its profile. Secondary collections usually have implied profiles (such as the profile of the collection which encapsulates them), though they may explicitly refer to a profile. | The label model has a loosely similar inheritance mechanism. The XML Collection model does not specify inheritance very clearly. |
Web Collections support aggregate profiles. This is the ability to specify that a given collection has a properties from a first profile, and furthermore additional properties from other profiles. | See the Open Issues section regarding inheritance. |
This Web Collection specification draws a sharp line between the Web Collections syntax and the semantics implied by a particular application. A computer program must be able to parse and manipulate the Web Collection data without understanding the specific application. It need not however be able to do anything with the data unless it understands that specific profile. | Without knowledge of semantics, we do not believe any useful manipulation is feasible. |
Web Collections draw a distinction between two types of URIs. This distinction is based on the needs of a syntax parser. A URI can be used to point to some other resource (behaving like a link) in which case it is just normal data in the collection (a value), or a URI might be used to include some other resource within the collection (an inline reference). A Web Collection parser might use this information to determine whether to encapsulate additional resources with the Web Collection. | Inclusion by reference will be a barrier to adoption by firewall vendors. This feature should be excluded from the model. However, the label model allows a referent to be a label. The semantics of a schema determine whether the value of an attribute is interpreted as a reference. |
Revision History:
05-May-97 [swick] Add W3C logo and doc title at top
14-May-97 [swick] Add Sections 5.2 and 6.2 from Andrew Layman. Correct usage of "for-immediate" in the examples in Section 6.1 (formerly just Section 6). This version published as "version 3.5" with a new date code.