[Archive copy mirrored from: http://www.textuality.com/mcf/MCF-tutorial.html, text only, June 22, 1997]
This document provides an introduction to the concepts behind the Meta Content Framework (MCF) and to the syntax used to store it.
The Meta Content Framework (MCF) is a tool to provide information about information. The primary goal is to make the Web (Internet or Intranet) more like a library and less like a messy heap of books on the floor. In order to understand MCF, there are three things that you'll need to learn:
MCF provides information about information by attaching properties to objects. Since this all lives in a computer, the objects are really just structures in computer memory, but they are normally used to represent things such as Web pages, companies, people, countries, and events. Properties are used to give information about these objects. For example, a Web page could have one property that gives its size, another that gives its URL, and another that identifies the person who maintains it.
To keep things clear, we distinguish between properties and property types. An example of a property type is sizeInBytes, which could apply to any Web page. When it is applied to one particular object, it becomes a property, an instance of the property type, which has a value: for example the Web page at http://www.textuality.com/ currently has a sizeInBytes property whose value is 5,676.
So far, this is simple. But there are a few things that make MCF special. First of all, property types are also objects; this means that they can have properties. For example, the property type that gives the size of a Web page might be named sizeOfPage, it might have a property that says it applies only to Web pages, another that says its value is a number, another that says that number is measured in bytes, and another that provides some explanatory text that explains this property type - for example, why its value might not be the same from one day to the next. To keep things simple, we always give property types names that begin with a lower-case letter.
An obvious question is: where do properties come from? MCF comes with a few helpful built-in property types, but new ones can be invented freely. One of the important built-in property types is typeOf, which gives types to objects. An object can have more than one typeOf; for example, an object representing a person could be typeOf Doctor and typeOf Golfer. The final key concept is that of the Category - in the previous example, "Doctor" and "Golfer" are categories. To be precise, Doctor (and anything else that is the value of a typeOf property) is an MCF object whose typeOf is Category. To keep things simple, we always give categories names that begin with an upper-case letter.
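The ideas so far (objects, properties, property types, and categories) can be modelled in a few lines of Python. This is only an illustrative sketch: the class `MCFObject` and its methods are invented for this example and are not part of MCF, and the description text is made up.

```python
# Illustrative model only; MCFObject and its methods are assumptions, not MCF API.
from dataclasses import dataclass, field


@dataclass
class MCFObject:
    """An MCF object: a set of properties keyed by property-type name."""
    name: str = ""
    properties: dict = field(default_factory=dict)

    def add(self, property_type: str, value) -> None:
        # An object may carry several properties of one type (e.g. typeOf),
        # so each property type maps to a list of values.
        self.properties.setdefault(property_type, []).append(value)

    def get(self, property_type: str) -> list:
        return self.properties.get(property_type, [])


# Categories are themselves objects whose typeOf is Category.
category = MCFObject("Category")
doctor = MCFObject("Doctor")
golfer = MCFObject("Golfer")
for c in (doctor, golfer):
    c.add("typeOf", category)

# A person who is both a Doctor and a Golfer.
person = MCFObject()
person.add("typeOf", doctor)
person.add("typeOf", golfer)

# A property type is also an object, so it can carry its own properties.
size_in_bytes = MCFObject("sizeInBytes")
size_in_bytes.add("description", "Size of a Web page; may change from day to day.")

# Applying the property type to one page gives a property with a value.
page = MCFObject()
page.add("sizeInBytes", 5676)

type_names = [c.name for c in person.get("typeOf")]
```

Here `type_names` collects the names of the person's categories, and `doctor.get("typeOf")` leads back to the `Category` object, mirroring the statement that Doctor is an object whose typeOf is Category.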
Objects, properties, and categories are not hard to understand. We need to have a syntax to store them so that humans can create them and computers can exchange and use them. The syntax is based on XML, the Extensible Markup Language. In XML, documents contain elements, which have types and are either empty or are delimited by start-tags and end-tags, and have attributes which have names and values, for example:
<p secret="false">This sentence is in the content of an |
The XML representation of MCF uses an element to represent an object; properties of the object are represented by other elements contained inside it. The type of the element is the Category of the object. If the object is in more than one category, any of them can be used for the element type. It turns out that all objects are members of a Category called "Unit", so one way or another, you can always find an element type. Inside the element are other elements representing the properties of the object; for these, the name of the property is the element type. Here is an example of a Web page and its size:
<WebPage> |
Storing facts about a Web page is not going to help at all unless we know how to get at it, so let's add a URL property:
<WebPage> |
Now let's identify the author of the page:
<WebPage> |
This looks quite a bit different. Unlike the size, which is a number, and the URL, which is just a character string, the author is a person, so we're going to need a new object to represent it. In the WebPage's author property, we've given it the label "tim", which we'll use as the unique identifier of the Person, given in an attribute whose name is id. Also, since a Person object does no good unless we can find the person, we'd better include an email address.
<Person id="tim"> |
We call the string "tim" the unique identifier of the object. Unique identifiers aren't required; note that in the example above with the reference to "tim", the WebPage doesn't have a unique identifier. This means that no other object could have a property pointing at this piece of MCF (i.e. this particular WebPage object).
Note that when a property has a simple value like a number or string, we put that in the content of the element; when the property's value is another object, we put a pointer to it in an attribute value and leave the element describing the property empty.
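The rule can be sketched as a small serializer written with Python's standard `xml.etree.ElementTree` module. It is a sketch under assumptions: the `Ref` class and `property_element` function are invented here, and the pointer attribute is assumed to be called `unit`, the name this tutorial uses later when discussing inherited properties.

```python
# Sketch only: Ref and property_element are hypothetical helpers, and the
# attribute name "unit" is an assumption taken from later in this tutorial.
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Ref:
    """A pointer to another MCF object, by its unique identifier."""
    target_id: str


def property_element(property_type, value):
    elem = ET.Element(property_type)
    if isinstance(value, Ref):
        # Value is another object: point at it and leave the element empty.
        elem.set("unit", value.target_id)
    else:
        # Simple value (number or string): put it in the element content.
        elem.text = str(value)
    return elem


page = ET.Element("WebPage")
page.append(property_element("sizeInBytes", 5676))
page.append(property_element("author", Ref("tim")))

xml_text = ET.tostring(page, encoding="unicode")
```

The `sizeInBytes` element ends up with text content and no attributes, while the `author` element is empty and carries only the pointer.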
It turns out that this page is its author's home page, so let's add another property to express this fact. The value of this property could just be a URL, or it could be a pointer to the WebPage MCF object. Either would be perfectly legal, but the second would probably be more useful. Since we now need to point back to the WebPage object, we'll have to give it a unique identifier:
<WebPage id="tbray-home"> |
The unique identifiers in the example ("tim" and "tbray-home") are probably not very well-chosen - if we were creating a large number of nodes to describe a complicated web site, it would be a lot of work to keep all the names unique. There are a couple of ways to deal with this problem. First of all, you could use the WebPage's URL and the Person's email address as the unique identifiers, which would be a great deal safer, if perhaps verbose:
<WebPage id="www.textuality.com"> |
Or if the XML was being generated by a computer program, it could just use a unique number for each; it wouldn't be that readable, but the only reason these identifiers exist is to allow us to hook properties to objects:
<WebPage id="90135-91741"> |
Now suppose some software is reading this MCF block, and it doesn't really know what we mean when we assign the author and homePage properties. We may not be able to explain the meaning of these properties in MCF, but we can give some very useful facts about them:
<PropertyType id="author"> |
Domain and range are special properties built into MCF; they tell any computer program that knows MCF that the author property maps Web pages to people, and that homePage maps people to Web pages. Similarly, we see that size maps Web pages to bytes, and email maps people to strings.
This last point highlights a problem: there are not that many built-in types, and they are mostly concerned with internal MCF housekeeping: things such as domain, range, typeOf, and Unit. The usefulness of MCF would be increased if everyone used some common vocabulary; an excellent candidate would be email as in the example above. For this reason, the initial MCF proposal defines a medium-sized common vocabulary for things that everyone is likely to want to use.
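To see what a program might do with domain and range, here is a minimal Python sketch that checks whether a property fits its declared domain and range. The table layout and the function `check_property` are assumptions made for this example; only the property and category names come from the text above.

```python
# Hypothetical in-memory tables; only the names come from the tutorial.
PROPERTY_TYPES = {
    "author":   {"domain": "WebPage", "range": "Person"},
    "homePage": {"domain": "Person",  "range": "WebPage"},
}

# Categories (typeOf values) of each object, keyed by unique identifier.
TYPE_OF = {
    "tbray-home": {"WebPage"},
    "tim": {"Person"},
}


def check_property(subject_id, property_type, value_id):
    """Return True if the subject is in the domain and the value is in the range."""
    spec = PROPERTY_TYPES[property_type]
    subject_ok = spec["domain"] in TYPE_OF.get(subject_id, set())
    value_ok = spec["range"] in TYPE_OF.get(value_id, set())
    return subject_ok and value_ok


page_has_author = check_property("tbray-home", "author", "tim")
person_has_author = check_property("tim", "author", "tbray-home")
```

The first check succeeds because author maps a WebPage to a Person; the second fails because the arguments are the wrong way round.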
MCF is built on a mathematical model. You probably don't need to understand the model in order to read or to create MCF data, but you might if you're going to be inventing your own properties, and you certainly will if you want to write a computer program to process MCF. The MCF formalism is fully defined in the document Meta Content Framework Using XML; the following is a summary.
Mathematically, MCF is a directed labeled graph (DLG). Objects are represented by nodes; properties are represented by arcs, which are arrows connecting two nodes; arcs have labels, which are the property types. The figure below is a picture of the DLG representing the MCF data given above in XML.
This should make the difference between a property and a property type clear; domain is a property type that is used heavily; each occurrence of it as a label on an arrow is a property.
This model is of great use in manipulating MCF data in a computer program; the arrows are just pointers of some sort, and the objects are just structures in memory or objects in a database table. XML documents, in general, can be quite tricky to represent in computer storage; the existence of the DLG model makes a lot of this difficulty go away.
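A minimal Python sketch of the DLG model follows. The class `LabeledGraph` and its methods are invented for illustration; the node and label names are the ones used in the examples above.

```python
# Sketch of the DLG model; LabeledGraph is a hypothetical helper class.
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose arcs carry a label (the property type)."""

    def __init__(self):
        # source node -> list of (label, target node)
        self._arcs = defaultdict(list)

    def add_arc(self, source, label, target):
        self._arcs[source].append((label, target))

    def targets(self, source, label):
        """Follow every arc with the given label out of `source`."""
        return [t for (l, t) in self._arcs.get(source, []) if l == label]

    def arcs_labeled(self, label):
        """Every property (arc) that is an instance of one property type."""
        return [(s, t) for s, arcs in self._arcs.items()
                for (l, t) in arcs if l == label]


g = LabeledGraph()
g.add_arc("tbray-home", "author", "tim")
g.add_arc("tim", "homePage", "tbray-home")
g.add_arc("author", "domain", "WebPage")
g.add_arc("author", "range", "Person")
g.add_arc("homePage", "domain", "Person")
g.add_arc("homePage", "range", "WebPage")

# "domain" is one property type; each arc labeled "domain" is a property.
domain_arcs = g.arcs_labeled("domain")
```

Note that `author` appears both as an arc label and as a node with arcs of its own, which is what it means for property types to be objects.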
MCF comes with a set of predefined property types and categories that are used to build the rest of the system.
This is used to link objects to their categories:
<Person id="Bill Clinton"> |
MCF comes with some simple types built in. It knows about all the "primitive" data types provided in the Java programming language, using the names of the Java "class wrappers" for those types. Examples are Integer, Float, Boolean, and Char. Along with these, MCF has a built-in data type named Date that stores a time-stamp including a date and a time of day.
Any object can have one of these primitive types (for example, the object which is the value of a pageSizeInBytes property is just an Integer). Alternatively, a node can be a real object that can have its own properties; if so, it is called a Unit. Unit is a category, and thus any object that is not a primitive data type is typeOf Unit.
Most useful property types are quite limited in what they can apply to and what values they can take. For example, sizeOfPage applies only to objects which have typeOf WebPage and has values which are simply numbers. The domain and range properties respectively govern the typeOf nodes that property types can apply to, and the Categories of their values. The range of both domain and range is Category.
This expresses a relationship between two Categories, saying that one is a more general form of the same thing:
<Category id="WebPage"> |
All property types (sizeInBytes, for example) are, not surprisingly, typeOf PropertyType. Some properties are generalizations of others; these are called superProperties:
<PropertyType id="parent"> |
As shown above, an object can have any number of typeOf properties. In some cases, though, you want only one property of a given type to apply to an object:
<FunctionalPropertyType id="department-Number"> |
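A program that stores MCF data could enforce this single-value rule along the following lines. This is a hedged Python sketch: the registry, the function `add_property`, and the exception class are all invented here, and treating a repeated identical value as harmless is an assumption, not something the tutorial states.

```python
# Sketch only: names below are hypothetical, not part of MCF.
class FunctionalPropertyError(ValueError):
    """Raised when a second, different value is given for a functional property."""


# Property types declared functional (at most one value per object).
FUNCTIONAL_PROPERTY_TYPES = {"department-Number"}


def add_property(obj, property_type, value):
    values = obj.setdefault(property_type, [])
    if value in values:
        return  # assumption: restating the same fact is harmless
    if property_type in FUNCTIONAL_PROPERTY_TYPES and values:
        raise FunctionalPropertyError(
            f"{property_type} already has value {values[0]!r}")
    values.append(value)


employee = {}
add_property(employee, "typeOf", "Doctor")
add_property(employee, "typeOf", "Golfer")      # fine: typeOf is not functional
add_property(employee, "department-Number", 42)

try:
    add_property(employee, "department-Number", 43)
    rejected = False
except FunctionalPropertyError:
    rejected = True
```

The second department number is rejected, while the two typeOf properties coexist.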
Whereas an object can have many Categories (via typeOf properties), some Categories are incompatible with each other:
<Category id="Public-Company"> |
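A checker for incompatible categories might look like the Python sketch below. Everything except the name Public-Company is an assumption made for illustration: the `DISJOINT` table, the function `conflicting_categories`, and the category Private-Company are hypothetical.

```python
# Sketch only: DISJOINT and Private-Company are hypothetical illustrations.
DISJOINT = {
    frozenset({"Public-Company", "Private-Company"}),
}


def conflicting_categories(categories):
    """Return every pair of the given categories that is declared incompatible."""
    cats = sorted(set(categories))
    return [(a, b)
            for i, a in enumerate(cats)
            for b in cats[i + 1:]
            if frozenset({a, b}) in DISJOINT]


clash = conflicting_categories(["Public-Company", "Private-Company"])
no_clash = conflicting_categories(["Doctor", "Golfer"])
```

An object that is typeOf both company categories yields one conflicting pair; Doctor and Golfer remain compatible.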
Not every object has to have a name; objects which need not have names include web pages and those with primitive data types such as the number 37. However, we require every Property and Category to have a name, and we require the name to be a legal XML name, so that we can use them for element types when the MCF is stored. When we say that "domain" is a property type, we really mean:
<PropertyType> |
The description is designed to hold human-readable descriptive text:
<Category id="frobnosticator"> |
Whereas the XML techniques we've described up to this point could be used to represent all that MCF can do, there are some common cases that would be tedious or verbose. There are a few XML shortcuts built in to allow efficient representation of these special situations.
Parent is a very general Property type commonly observed between objects. It is so common that MCF allows a special shorthand in its XML expression:
<Chapter id="c1"> |
is the same as
<Chapter id="c1"> |
In some cases, several properties of a single type may need to be sequenced. This is done using the special Category Sequence and the associated property type ord (for ordinal). The Sequence appears as the child of the property element describing the type of property to be sequenced, and the ord elements appear as children of the Sequence. The order in which the ord properties appear is significant (this is not the case with other property elements) and reflects the desired sequence.
<Mime-Type id="text/xml"> |
It is often the case that a single property applies to a great number of objects. We can represent this very compactly in XML with the help of inheritance through a Category. For example, all the pages in a particular web site are likely to have the same copyright and author organization, and be described by the same site table of contents. If we assign all of these objects to a category, we can say that all members of that category inherit these properties. Suppose, for example, that we wish to do this for all the Web pages at the Textuality site (for this example, two will be enough):
<WebPage id="w0001"> |
Note that the value of the property can be given either in the unit attribute or in the element content, just as with ordinary properties.
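How a program might resolve an inherited property can be sketched in Python as follows. The tables, the category name TextualityPage, the identifier w0002, and the property values are all invented for this example, and looking at an object's own properties before its categories is an assumption rather than a rule stated in this tutorial.

```python
# Sketch only: table contents and lookup order are assumptions.
CATEGORY_PROPERTIES = {
    # Properties inherited by every member of the category.
    "TextualityPage": {"copyright": "Textuality"},
}

OBJECTS = {
    "w0001": {"typeOf": ["WebPage", "TextualityPage"], "sizeInBytes": 5676},
    "w0002": {"typeOf": ["WebPage", "TextualityPage"]},
}


def lookup(object_id, property_type):
    """Return the object's own value, else a value inherited via a category."""
    obj = OBJECTS[object_id]
    if property_type in obj:
        return obj[property_type]
    for category in obj.get("typeOf", []):
        inherited = CATEGORY_PROPERTIES.get(category, {})
        if property_type in inherited:
            return inherited[property_type]
    return None


copyrights = [lookup(page_id, "copyright") for page_id in ("w0001", "w0002")]
```

Both pages report the same copyright even though neither stores it directly, which is the compactness the shortcut is meant to provide.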
An address would be a common enough property type. The whole address could be considered a property, but the street number, city, country, and so on could also be useful properties. This situation, where properties include each other, can be expressed straightforwardly just by inclusion:
<Company id="Textuality"> |
The above is exactly the same as (but much less efficient than):
<Company id="Textuality"> |
The normal packaging of MCF in XML groups a bunch of properties with the object they apply to. On occasion, it may be convenient to group some properties with the object that is their value; this can be done with the INVERSE attribute:
<PropertyType id="parent"> |
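In terms of the graph model, an inverted property is the same arc written down at its other end. The Python sketch below shows one way a reader might normalize both forms; the function `to_arc` and the identifier b1 are hypothetical, and the exact spelling of the attribute in XML is not modelled here.

```python
# Sketch only: to_arc and the identifier "b1" are hypothetical.
def to_arc(enclosing_id, property_type, other_id, inverse=False):
    """Return the arc (source, label, target) described by a property element.

    Normally the enclosing object is the source of the arc. When the
    property is marked as inverse, the enclosing object is the value
    (target) and the other object is the source.
    """
    if inverse:
        return (other_id, property_type, enclosing_id)
    return (enclosing_id, property_type, other_id)


# Written inside chapter c1: "my parent is b1".
normal = to_arc("c1", "parent", "b1")

# Written inside b1 with the inverse marker: "I am the parent of c1".
inverted = to_arc("b1", "parent", "c1", inverse=True)
```

Both calls produce the same arc, so the choice of packaging does not change the underlying data.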
This document has benefited greatly from Lauren Wood's input.