RDF/Topic Maps and reification
Date: Thu, 27 Sep 2001 17:56:33 -0500
From: "Steven R. Newcomb" <firstname.lastname@example.org>
Cc: email@example.com
Subject: RDF/Topic Maps: late/lazy reification vs. early/preemptive reification
For me, at least, the shortest, most compelling and cogent demonstration of a certain critical difference between Topic Maps and RDF was Michael Sperberg-McQueen's wrap-up keynote at the Extreme Markup Languages Conference (www.extrememarkup.com) last August.
N.B.: This note is about *what I learned* from Michael's presentation, and it does not necessarily reflect Michael's views, or even constitute an accurate account of Michael's presentation. It's merely what I remember about it. (I love Michael's wrap-ups at the Extreme conferences. It's a good thing he traditionally speaks last, because he's a hard act to follow.)
Michael brought colored ribbons and other paraphernalia to the podium, in order to illustrate his words.
"Tom buttered the bread," was the statement Michael wanted to represent. There being no volunteers in the audience named "Tom", Michael appointed our Conference Chair, Tommie Usdin, to represent the "Tom" node. Syd Bauman, as I recall, was appointed to represent the "bread" node. A blue ribbon between Tommie and Syd represented the arc representing the statement that Tom buttered the bread.
[Use a monospace font, such as courier, if you want to see the ASCII art as it was intended to be seen.]
      Tommie -----------> Syd
      ("Tom")   (blue     ("the bread")
                ribbon)
So far, so good. "Now," Michael said, "What if I want to say that Tom buttered the bread with a knife? In order to attach the knife to this statement, I need a node for the knife, and I also need a node to represent the buttering itself. (There must be some sort of a 'buttering event' going on here.)" After everyone had finished laughing over our internal visualizations of "a buttering event", Kate Hamilton was appointed to be the node that represented the "buttering event". A differently colored ribbon was used to connect Tommie to Kate, and Kate to Syd. Now there was a triangle of ribbons, because Tommie was still *also* buttering Syd by virtue of the original blue ribbon.
                Kate
         ("the Buttering")
                 /\
                /  \
               /    \
              /      \
             /        \
            /          \
           /            \
      Tommie -----------> Syd
      ("Tom")   (blue     ("the bread")
                ribbon)
Now, with Kate in existence, it was possible to use yet another ribbon color to connect the knife to Kate (the "buttering event").
knife --------- Kate
         ("the Buttering")
                 /\
                /  \
               /    \
              /      \
             /        \
            /          \
           /            \
      Tommie -----------> Syd
      ("Tom")   (blue     ("the bread")
                ribbon)
So now Kate was holding one end of each of three ribbons: one to Tom, one to the bread, and one to the knife. Michael then proposed to further modify the statement: "Tom buttered the bread with a knife *on Friday*". Yet another volunteer became "Friday", and yet another ribbon was given to Kate, the other end of which was "Friday".
knife --------- Kate
         ("the Buttering")
              /  /\
Friday ------+  /  \
               /    \
              /      \
             /        \
            /          \
           /            \
      Tommie -----------> Syd
      ("Tom")   (blue     ("the bread")
                ribbon)
It was clear that Michael could have gone on to attach any number of things to Kate; the "buttering event" had a limitless capacity to be related to other things. Indeed, by the end of Michael's wrap-up keynote, Kate was already holding one end of several ribbons, including the two ribbons needed to connect Tommie (Tom) to Syd (the bread).
It was also clear that, once "the buttering event" existed as a distinct node, it was no trouble at all to say anything about that event. However, *before* Kate was appointed to be that node, there was no way to say anything about the buttering event.
After the "buttering event" node represented by Kate was brought into existence, the combination of itself with its arcs to Tommie (Tom) and to Syd (the bread) was sufficient to represent the fact that "Tom buttered the bread". Therefore, once the "buttering event" existed, there was no further need for the original blue ribbon connecting Tommie (Tom) and Syd (the bread). The blue ribbon was redundant, and it unnecessarily complicated the graph of ribbons and nodes. The blue ribbon should go away, right?
knife --------- Kate
         ("the Buttering")
              /  /\
Friday ------+  /  \
               /    \
              /      \
             /        \
            /          \
           /            \
      Tommie              Syd
      ("Tom")             ("the bread")
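For readers who prefer code to ribbons, here is a minimal sketch of the two graphs in Python. It is only an illustration: the arc labels ("agent", "patient", "instrument", "when") and the identifier "buttering-1" are my own assumptions, not anything Michael used. It shows that, once the event node exists, the direct arc carries no information that cannot be recovered from the event node's own arcs.

```python
# Each ribbon is an arc: (source, label, target). All names are illustrative.
direct = {("Tom", "buttered", "bread")}

# A direct arc is not a node, so nothing can point at it. To add the knife
# and the day, introduce a node for the event and hang arcs on that node.
event = "buttering-1"  # hypothetical identifier; Kate's role in the demo
reified = {
    (event, "agent", "Tom"),
    (event, "patient", "bread"),
    (event, "instrument", "knife"),
    (event, "when", "Friday"),
}

def who_buttered_what(arcs):
    """Recover (agent, patient) pairs from event nodes alone."""
    agents = {s: t for (s, label, t) in arcs if label == "agent"}
    patients = {s: t for (s, label, t) in arcs if label == "patient"}
    return {(agents[e], patients[e]) for e in agents if e in patients}

print(who_buttered_what(reified))  # {('Tom', 'bread')}
```

The blue ribbon does not appear in `reified` at all, yet "Tom buttered the bread" is still recoverable from it.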
In Topic Maps, there is no way to say "Tom buttered the bread" without creating an explicit "buttering event" -- a "buttering association" between Tom and the bread. Instead of making a direct connection between Tom and the bread, Topic Maps forces us to create a "buttering event" node, and to connect "Tom" and "the bread" to that node. The advantage here is that we can always say something new about anything that already exists, because even the "verbs" in Topic Maps (such as "to butter") are necessarily already "noun-ified" (such as "the buttering") and are ready to be addressed as the ends of additional arcs. This has significant advantages: it simplifies the process of amalgamating facts and opinions when you can't know in advance which things anyone will want to express a new fact or opinion about. If someone wants to say something about "Tom"'s buttering of "the bread", there is guaranteed to be something to which those remarks can be attached.
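A rough sketch of that "early reification" discipline, in Python, might look like the following. This is not a Topic Maps API; the `Association` class, the `associate` helper, and all role names are hypothetical. The point is only that the sole way to relate two things is to create an addressable association, so later remarks are pure additions.

```python
from dataclasses import dataclass, field

@dataclass
class Association:
    """A relationship that has its own identity, so it can be pointed at."""
    id: str
    kind: str
    roles: dict = field(default_factory=dict)  # role name -> player id

topic_map = {}  # association id -> Association

def associate(assoc_id, kind, **roles):
    """The only way to relate things: every relationship becomes a node."""
    topic_map[assoc_id] = Association(assoc_id, kind, dict(roles))
    return topic_map[assoc_id]

buttering = associate("buttering-1", "buttering", butterer="Tom", buttered="bread")
snapshot = dict(buttering.roles)

# New remarks are new associations that name the old one as a player.
associate("tooling-1", "done-with", event="buttering-1", tool="knife")
associate("timing-1", "occurred-on", event="buttering-1", day="Friday")

print(buttering.roles == snapshot)  # True
print(len(topic_map))               # 3
```

Nothing about the original buttering association had to change when the knife and Friday arrived.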
In RDF, we are not forced to create a "buttering event" node in order to say "Tom buttered the bread". We can simply connect "Tom" to "the bread" directly. This has significant advantages if it can be accurately assumed that nobody will need to say something about the buttering:
- There are many fewer nodes and arcs to worry about.
- Perhaps more significantly, verbs remain verbs. Many people, especially computer jockeys who have not been steeped in the traditions of markup languages, application-independent information interchange and self-describing documents, are more comfortable with verbs (processes) than with nouns. This is not a bad thing. It is only the simple truth that, if you're focusing on implementing the application of butter to bread, it would only be distracting and annoying to try to provide for unanticipatable commentaries and constraints on specific "butterings".
RDF provides a process, called "reification", whereby an arc can be alternatively represented as a node when it is discovered that someone wants to say something about it. ("Reification" literally means "thing-ification" or "noun-ification" -- transformation into a thing. The term "reification" is derived from the Latin noun "res" (pronounced like "race"), which means "thing".) When Michael used Kate Hamilton (the "buttering event") to be the surrogate of the arc represented by the blue ribbon, he was reifying the blue ribbon. The arc became a node (and two new arcs).
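As a hedged illustration, the sketch below uses plain Python tuples as stand-ins for RDF triples (no RDF library is involved). The names rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard RDF reification vocabulary; everything prefixed "ex:" is made up for this example. Strictly speaking, that vocabulary describes the statement rather than an event, but the move is the same one Michael made with Kate: the arc gets a node of its own, and further arcs can then be attached to that node.

```python
def reify(triple, statement_id):
    """Describe `triple` as a node using the RDF reification vocabulary."""
    s, p, o = triple
    return {
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    }

# The blue ribbon: one direct arc.
graph = {("ex:Tom", "ex:buttered", "ex:bread")}

# Someone now wants to talk about the buttering, so the arc is reified.
stmt = "ex:buttering1"  # hypothetical identifier for the new node
graph |= reify(("ex:Tom", "ex:buttered", "ex:bread"), stmt)

# With a node in hand, the knife and Friday have something to attach to.
graph.add((stmt, "ex:instrument", "ex:knife"))
graph.add((stmt, "ex:day", "ex:Friday"))

print(len(graph))  # 7
```

Note that this sketch keeps the original direct arc alongside the reified description; whether to keep or drop it is a separate decision.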
In RDF, reification involves changing the graph that results from processing interchangeable RDF statements. In Topic Maps, however, everything is already reified. No existing arcs need be changed when new information comes along. New arcs and nodes are added, and these additions are the only changes that are required. This comparative changelessness can be extremely important. If you find something in a graph, and you make a record of the arcs you traversed in order to find it, you may want to be able to use that same set of arcs to find the same thing at some future date. If some of those arcs disappear, you may not be able to retrace your steps. If, on the other hand, the process of reification does *not* cause the arcs whose functions have been duplicated to disappear, then we have a situation in which a considerable amount of redundant information is contributing to our infoglut problem. Either way, a policy of "late reification" (or maybe we should call it "lazy reification") causes problems for the usefulness of continuously-amalgamated knowledge.
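The dilemma can be made concrete with a small sketch, again in Python and again with invented labels ("agent-of", "patient", "buttering-1"). A "bookmark" is just the list of arc labels someone followed to get from Tom to the bread. If late reification replaces the direct arc, the bookmark stops working; if it keeps the direct arc, the bookmark works but the same fact is now stored twice.

```python
def follow(arcs, start, labels):
    """Walk from `start` along arcs with the given labels; None if a step is missing."""
    node = start
    for label in labels:
        targets = [t for (s, lab, t) in arcs if s == node and lab == label]
        if not targets:
            return None
        node = targets[0]
    return node

before = {("Tom", "buttered", "bread")}
bookmark = ["buttered"]  # a recorded route from Tom to the bread

event_arcs = {
    ("Tom", "agent-of", "buttering-1"),
    ("buttering-1", "patient", "bread"),
    ("buttering-1", "instrument", "knife"),
}

# Late reification, option 1: replace the direct arc. The bookmark breaks.
replaced = event_arcs
# Option 2: keep the direct arc. The bookmark works, but the fact is stored twice.
kept = before | event_arcs

print(follow(before, "Tom", bookmark))               # bread
print(follow(replaced, "Tom", bookmark))             # None
print(follow(kept, "Tom", bookmark))                 # bread
print(follow(kept, "Tom", ["agent-of", "patient"]))  # bread
```

With early reification there is only ever the second kind of route, so a recorded path stays valid as new arcs are added.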
Does this mean that I'm pro-Topic Maps and anti-RDF? No, not at all! These two paradigms have great need for each other.
RDF needs Topic Maps in order to make scalable management of knowledge emanating from disparate sources simple, practical and predictable. Enlightened self-interest dictates that the RDF camp consider Topic Maps as an important and basic RDF application.
Topic Maps needs RDF in order to have a popular, widely-accepted basis upon which to describe exactly what a topic map means, in a fashion that will be immediately processable by a significant number of existing and well-funded tools. The PMTM4 model is an example of a model of the meaning of Topic Maps that can easily be translated into RDF -- once, and for all topic maps.
If the PMTM4 model is adopted for this purpose, the corresponding RDF arcs will never need to be reified, even the very first time someone needs to make an assertion about a "buttering".
In the past, I myself have considered RDF as the competitor of Topic Maps. Happily, I was wrong -- at least in fundamental technical terms. Indeed, I now believe that if there were no RDF, the Topic Maps camp would have to invent something like it in order to make the Topic Maps paradigm predictably comprehensible by the programmers who are pioneering the development of the Internet.
There are other interesting comparisons to be made between RDF and Topic Maps, but ever since Michael's demonstration of the difference between early vs. late (preemptive vs. lazy) reification, I have been meaning to document both the difference and the demonstration. Thanks for reading it.
Steven R. Newcomb, Consultant
firstname.lastname@example.org
voice: +1 972 359 8160
fax: +1 972 359 0270
1527 Northaven Drive
Allen, Texas 75002-1648 USA