Steven R. Newcomb on the Grove Paradigm (1999-09-08)

Date:      Wed, 8 Sep 1999 18:41:08 -0500
From:      "Steven R. Newcomb" <srn@techno.com>
To:        xml-dev@ic.ac.uk
Subject:   Re: ANN: XML and Databases article

[Ron Bourret:]

> I've read Paul's tutorial and the GroveMinder summary on the Web, so
> let's see if I've got this straight.  A grove is basically a
> property set, broken down into classes, each of which has
> properties. There are probably relationships between those
> classes. For example, a grove for XML could have classes for
> elements, attributes, entities, and so on, where the element class
> points to the attribute class. A grove for a relational database
> would have classes for tables, columns, etc., where the table class
> points to the column class.

Pretty close, but not quite on the money.  First of all, a
terminological problem: A grove is the set of objects that results
from understanding (parsing and processing) some particular logical
resource.  No grove is made from more than one logical resource (I say
"logical" resource because some single resources are distributed in
multiple physical containers).  However, more than one grove can be
made from a single resource.  This is because resources have multiple
layers.  For example, in the case of XML documents, there is always
the XML syntax layer of "understanding".  The property set (schema)
for this layer is probably strongly reminiscent of the DOM.  However,
there are one or more vocabularies used in every XML document (there's
always at least one because the element types have names, even if
there's no DTD).  The semantics of these vocabularies may imply
"emergent properties" of the information contained in the resource,
and there can be a property set for each vocabulary's emergent
properties.  So preparing a single resource for application-internal
exploitation may involve creating groves for each vocabulary.  By
giving names to the emergent properties of vocabularies, such property
sets can be, in effect, APIs to the semantics of each vocabulary, thus
opening the way for vocabulary-specific software engines, and for far
more reliable cross-application information interchange than the Web
has ever seen.

So, instead of saying, 

> A grove for XML could have classes for elements, attributes,
> entities, and so on, where the element class points to the attribute
> class.

 ... you might better have said any one of the following (this has to
 be said with extreme precision, so look closely):

| A property set for the XML language could have classes for elements,
| attributes, entity references, and so on, where the element class
| has, as one of its nodal properties, an "attribute specification
| list" property, whose value is a list of "attribute value
| specification" nodes.

 or:

| The primary grove form of an XML resource could have nodes
| conforming to the "element", "attribute specification", and "entity"
| classes, and so on, where the "element" class has, as one of its
| properties, an "attribute specification list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| value specification".

 or, in view of the fact that the DTD of an XML resource is part of
 its grove (when it appears or is referenced by the DOCTYPE
 declaration in an XML resource):

| The primary grove form of an XML resource could have element type
| definitions, attribute list definitions, entity declarations, and so
| on, where the element type definition class has, as one of its nodal
| properties, an "attribute definition list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| definition".


The second problem with your summary statement is that "points to" is
actually an implementation detail.  The standard only says that nodes
(objects) in groves have properties, and that some properties can be
"nodal" -- that is, the values of such properties can be other nodes
(in the same grove and/or in other groves).  The manner in which a
node is represented to be a property value in any given implementation
is almost certainly going to be via pointing (at least in a von
Neumann architecture machine), but it's important to realize that that
is an implementation decision, and it's inaccurate to say that
"pointing" has anything to do with the grove paradigm.  A property set
can only say that the value of a property is nodal, and
implementations of the grove paradigm must make it appear that the
value of such a property is indeed one or more nodes, but how that is
made to happen is not part of the standard (nor should it be).
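
To make the distinction concrete, here is a minimal Python sketch. It is
an illustration only: the names PointerNode, TableNode and "children" are
assumptions made for this example and come neither from the standard nor
from any product. Two implementations store a nodal property differently,
yet client code sees the same thing in both cases: a property whose value
is a list of nodes.

```python
class PointerNode:
    """Stores the nodal property as direct object references."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    @property
    def children(self):
        return list(self._children)


class TableNode:
    """Stores the nodal property as integer keys into a shared table."""

    def __init__(self, store, key):
        self._store = store  # dict: key -> (name, [child keys])
        self._key = key

    @property
    def name(self):
        return self._store[self._key][0]

    @property
    def children(self):
        # Nodes are materialized on demand; no object references are stored.
        return [TableNode(self._store, k) for k in self._store[self._key][1]]


def child_names(node):
    """Client code relies only on 'a property whose value is a list of nodes'."""
    return [child.name for child in node.children]


by_pointer = PointerNode("table", [PointerNode("id"), PointerNode("title")])
store = {0: ("table", [1, 2]), 1: ("id", []), 2: ("title", [])}
by_lookup = TableNode(store, 0)
```

Calling child_names on either by_pointer or by_lookup yields the same
list of names, which is the only behavior a property set can require.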

So, instead of saying:

> "where the table class points to the column class"

 ... it would be much more accurate to say:

| where the "table" class has a property named "columns" whose value
| is a list of "column"-class nodes.

> In this sense, the XML information set has much in common with
> groves, as it is a property set.

Yes, except that it's not yet clear that the XML info set will be
expressed using the ISO Property Set DTD -- but this is merely a
syntax issue.  I agree with David Megginson: I expect it to be readily
convertible.

> Similarly, the DOM could be viewed as an API for a grove.

Yes, to a single kind of grove, specifically an XML syntactic grove.
(A grove governed by the properties of XML's syntax.)

(Aside: I hope we're not facing a future in which the semantics of
certain chosen vocabularies will be directly supported by future
versions of the DOM.  Such support should "plug into" (and be
unpluggable from) the DOM.  No vocabulary-specific support should
become a required feature of all DOM implementations.  For example,
making XLink a vocabulary is fine; making the DOM able to support
XLink but no other linking vocabularies would be the start of a long
nightmare with a bad ending.  To do that would significantly reduce
the freedom of industries to design their own information
architectures, and to evolve them according to their own perceived
needs.  It would also destroy the DOM, which must stay simple in order
to survive.  No API can do everything for everybody, and once you
start putting support for DTD-specific (or namespace-specific)
semantics into the DOM, where do you stop?  I've watched a couple of
systems bloat uncontrollably and meet their demise in similar ways,
and the stage is perfectly set for the same thing to happen to the
DOM.)

> The XML information set is not a grove because ... it is not
> ... expressed in grove notation.

If you replace the word "grove" with "property set" (twice) in the
above sentence, you are exactly correct.  (There is no such thing as
"grove notation".  "Grove" is an abstract concept that, when sensibly
implemented, makes a grove exactly as human readable as a hex dump of
RAM in which there are C structs in no particular order.)

> The DOM is not an API for a grove because it's a bit wishy-washy in
> places -- for example, four characters of PCDATA could be one node
> or four, so it's not built on a rigid enough data model.)

Close enough. I would put the same thought differently: The DOM
doesn't have a formalized underlying data model, so the DOM doesn't
answer the need for a solid basis on which to express the addresses of
the components of XML resources.  I'm hoping and believing that after
the XML infoset is done we'll have a basis for implementing a powerful
version of XPath (or XPointers or whatever the idea of generalized
addressing of components of XML resources is being called at that
time).

> The nice thing about groves is that all groves, regardless of what
> they are built on, have certain commonalities, such as
> addressability, so you can perform certain common functions with
> them.

Right.  All nodes in groves have the same "object model" (I'm using
this term in a more formal, scientific sense than the term is used in
the phrase "Document Object Model (DOM)".)  The grove object model is:
Groves have nodes, nodes conform to classes, and classes have named
properties with value constraints.  Nodes have named properties, and
values for those properties.  That's about it; the rest is detail.
(It's pretty interesting detail.)
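
As a rough illustration of that object model, here is a small Python
sketch. Everything in it is an assumption made for the example: the type
names (PropertyDef, NodeClass, Node), the property names ("gi",
"attributes") and the way value constraints are written as callables are
not taken from the ISO property sets or from any implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyDef:
    name: str
    check: object  # callable: value -> True if the value is allowed


@dataclass(frozen=True)
class NodeClass:
    name: str
    properties: tuple  # of PropertyDef


@dataclass
class Node:
    node_class: NodeClass
    values: dict = field(default_factory=dict)

    def set_property(self, prop_name, value):
        """Assign a value, enforcing the class's constraint for that property."""
        for prop in self.node_class.properties:
            if prop.name == prop_name:
                if not prop.check(value):
                    raise ValueError(f"bad value for {prop_name!r}")
                self.values[prop_name] = value
                return
        raise KeyError(f"{self.node_class.name} has no property {prop_name!r}")


def list_of(class_name):
    """Value constraint: a list of nodes that all belong to one class."""
    return lambda v: isinstance(v, list) and all(
        isinstance(n, Node) and n.node_class.name == class_name for n in v
    )


def is_string(v):
    return isinstance(v, str)


ATTRIBUTE = NodeClass("attribute", (PropertyDef("name", is_string),))
ELEMENT = NodeClass(
    "element",
    (PropertyDef("gi", is_string), PropertyDef("attributes", list_of("attribute"))),
)

attr = Node(ATTRIBUTE)
attr.set_property("name", "id")
elem = Node(ELEMENT)
elem.set_property("gi", "para")
elem.set_property("attributes", [attr])  # a nodal property
```

Assigning a list containing an "element" node to "attributes" raises
ValueError in this sketch, because the constraint admits only
"attribute"-class nodes.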

> GroveMinder is generic grove middleware. It has plug-ins, called
> Minders (I think of them as drivers),

Hooray, thank you!  I have sometimes called them "notation drivers"
only to get the blankest stares imaginable.  (I then have asked
something lame, like, "Do you know what a device driver is, and why we
have them?")  But you obviously get the point of Minders: Minders
represent plug and play support for individual notations, in a system
that makes all content look alike (i.e., conform to the grove object
model).

> that can build groves over different property sets. For example,
> there is one Minder for SGML/XML documents and a different Minder
> for relational databases.

Well, actually, there's probably a one-to-one correspondence between
property sets and database schemas.  In order to address information
in terms of its structure, you have to know the structure.  In
grove-land, the structure is defined by a property set.  Different
databases have different structures, normally expressed as database
schemas.  Making a database look like a grove is very straightforward.
The bulk of the work is translating the schema into a property set
(which is, after all, a kind of schema).  There's a bit of coding
involved, too, but the GroveMinder developer kit has tools that make
this amazingly easy.  (At least the Lockheed-Martin people were
amazed, and they said so publicly at XML '98.)
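
The schema-translation step can be sketched in a few lines of Python.
This is not the GroveMinder developer kit, whose tools are not shown in
this message; it is a toy that uses the standard sqlite3 module and an
assumed representation of a property set as a dict from class name to
property names, with one class per table and one property per column.

```python
import sqlite3


def property_set_from_schema(conn):
    """Derive a toy property set (class name -> property names) from a schema."""
    classes = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        classes[table] = [col[1] for col in columns]
    return classes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
)
property_set = property_set_from_schema(conn)
```

A fuller translation would also record value constraints (column types,
keys) and nodal properties such as a table's list of rows.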

The grove paradigm breaks down the distinction between documents
(resources) and databases.  Everything, in its addressable form, is a
grove, and a grove is a database.  But a grove is convertible into an
interchangeable resource (that is, if the property set is a
comprehensive expression of the syntactic features of the notation of
an interchangeable resource).  Obviously, a resource is also
convertible into a grove, given a property set for its notation.
Property sets are the bridge between the world of information
interchange, and the world in which interchanged information is
immediately useful (i.e., the world in which information exists after
parsing and common semantic processing of interchangeable resources
has been done).  If the resource is *already* a database, there's
probably no parsing or processing involved.  All that needs to be done
is to put a translating layer over it that makes the database look
like a grove.  Then, the database and all its contents are fully able
to participate in the wider world of interchangeable information
resources: they can be linked, re-used by reference, have any kind of
metadata associated with them, etc. etc.

> (There can actually be different property sets for a "type" of
> data. For example, one property set for XML might include entities
> and another might not, specifying that each entity is replaced by
> its value. A different Minder is needed for each property set.)

Strictly speaking, you're correct: people can disagree about the
properties of, say, PostScript as a notation, or they might agree
about the properties but not about what the names of the properties
should be.  Nothing prevents people from writing their own property
sets.  In fact, however, the situation is not as chaotic as your
example might lead one to believe, because of "grove plans".  A "grove
plan" is a way of selectively deleting properties from classes, and of
deleting classes altogether, as a way of avoiding the overhead of
storing and/or processing those properties and classes.  For example,
the property set for SGML is comprehensive, but an application may not
need, for example, to store nonsignificant white spaces found in the
start tags of SGML elements.  The application may therefore use a
"grove plan" to delete the properties whose values would be those
white space characters.
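
A minimal Python sketch of the idea follows. It mirrors the description
above (a plan as a list of deletions) rather than the notation the
standard uses for grove plans, and every name in it, including the
"start-tag-whitespace" property, is invented for the example.

```python
def apply_grove_plan(property_set, plan):
    """Return a reduced copy of property_set.

    property_set: dict mapping class name -> list of property names.
    plan: dict with optional keys "omit_classes" (set of class names) and
          "omit_properties" (set of (class name, property name) pairs).
    """
    omit_classes = plan.get("omit_classes", set())
    omit_properties = plan.get("omit_properties", set())
    return {
        cls: [p for p in props if (cls, p) not in omit_properties]
        for cls, props in property_set.items()
        if cls not in omit_classes
    }


# Hypothetical comprehensive property set and a plan that trims it.
FULL = {
    "element": ["gi", "attributes", "content", "start-tag-whitespace"],
    "comment": ["text"],
}
PLAN = {
    "omit_classes": {"comment"},
    "omit_properties": {("element", "start-tag-whitespace")},
}
reduced = apply_grove_plan(FULL, PLAN)
```

Here reduced keeps only the "element" class, without the white space
property; FULL itself is left unchanged.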

The addresses of nodes in groves are always expressed with respect to
a property set and a grove plan.  If it were not so, you wouldn't know
whether to count a certain node type or not, when counting nodes to
get to a particular node.  And it's true that, for example, some
people want to count the text that was inserted via an entity
reference as a distinct node, while other people don't; this kind of
flexibility is needed in order to keep peace in the family, and allow
people to do addressing in the way they want to do it.
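
The counting problem is easy to show with a small Python sketch. The
(class name, data) tuples and the two plans below are assumptions made
for the illustration, with comment nodes standing in for whatever node
type one party counts and another does not.

```python
def nth_child(children, n, visible_classes):
    """Return the n-th child (1-based), counting only classes the plan keeps."""
    visible = [c for c in children if c[0] in visible_classes]
    return visible[n - 1]


# Children of one node as (class name, data) pairs, in order.
children = [
    ("element", "title"),
    ("comment", "draft"),
    ("element", "para"),
    ("element", "note"),
]

full_plan = {"element", "comment"}  # counts comment nodes
lean_plan = {"element"}             # does not count them

second_under_full = nth_child(children, 2, full_plan)
second_under_lean = nth_child(children, 2, lean_plan)
```

The address "second child" resolves to the comment under full_plan and to
the "para" element under lean_plan, so an address is meaningless unless
the plan it was written against is known. The same sketch shows how one
comprehensive representation can serve addresses written against a
lesser plan: the filtering happens at resolution time.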

Property sets are modularizable, so that it's relatively easy to
express commonplace grove plans, to establish conformance levels for
processing systems, and to understand the rules for interpreting
address expressions.

A Minder that implements a property set comprehensively can optionally
view groves less comprehensively, so as to be able to resolve
addresses that were expressed according to lesser grove plans.  There
doesn't have to be a different Minder for each different grove plan.
(And that's where your example might be misleading.)

> One thing GroveMinder can do is store a grove in its own
> database. (Note that this is separate from the database addressed by
> the relational database Minder -- it has a structure designed to
> store groves.) Thus, GroveMinder can store an XML document in a
> database as a grove and is what I, in my article, called a content
> management system. That is, it can store and retrieve an XML
> document as a document.

Sounds right to me.  ("...its own database" sounds a bit odd because
GroveMinder can use any ODBMS for grove storage.)

> Some questions:

> 1) Is it possible to combine groves of different types? For example,
> can I take a grove representing a table in a relational database and
> stuff it into a grove for an XML document, for example as the
> content of an element?

I'm afraid I don't grasp the intent of this question.  When such an
XML document is exported from its grove as an XML document, what
should the document look like?

There's no need (and no way) to stuff something into something else.
It is only necessary that the "content" property of the element have,
as its value, the node in the database grove that represents the
table.  The ISO standard SGML Property Set does not allow this; only
certain classes of nodes within the same grove are allowed as the
value of the "content" property of "element" nodes.  However, if you
want to change your operative SGML Property Set so that this will be
permitted, nothing (other than good sense) prevents you from doing it;
the grove paradigm will readily support you in your madness.

I don't know why it would be sensible to regard an RDBMS table as the
content of an SGML or XML element.  The normal meaning of "content" is
elements, character data, and/or other SGML constructs, right there,
inside the element.  There is no way to write a general purpose
grove-to-SGML converter unless the classes of the nodes that can
appear in element content are limited and known.  (We certainly don't
want to dump arbitrary data into the content of an element; this would
invite a situation in which the document that is ultimately exported
is unparsable.)

>  If so, does the table grove retain its table-ness, or is it
> converted to one or more XML elements?  Both cases seem reasonable,
> although the latter would presumably require a special converter. If
> the latter case is true, then GroveMinder might also fit what I call
> data transfer middleware, depending on how the conversion is done.

I would suggest that an efficient way to handle this would be to
convert the table into node classes that *are* permitted to appear in
element content, and then make *those* nodes the value of the content
property.  If you do it this way, you're necessarily making the
decisions that must be made about how the XML document, when exported,
will reflect the table data.

You're right that one application of GroveMinder is data transfer
middleware.  The conversion program is comparatively easy to write,
since everything already conforms to the same object model.
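
As an illustration of such a conversion, here is a Python sketch that
uses the standard xml.etree.ElementTree module as a stand-in for an XML
grove (it is not a grove implementation). The function name and the
choice of a "row" element with one child element per column are
assumptions; they are exactly the kind of decision the conversion forces
you to make.

```python
import xml.etree.ElementTree as ET


def table_to_elements(table_name, column_names, rows):
    """Turn tabular data into an element subtree; the naming is one choice of many."""
    table_el = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(table_el, "row")
        for col, value in zip(column_names, row):
            cell = ET.SubElement(row_el, col)
            cell.text = str(value)
    return table_el


section = ET.Element("section")
section.append(
    table_to_elements("book", ["id", "title"], [(1, "Groves"), (2, "HyTime")])
)
xml_text = ET.tostring(section, encoding="unicode")
```

Because the table has been re-expressed as ordinary elements and
character data, the exported document stays parsable.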

> 2) Are groves themselves relevant at a high level in a discussion of
> XML and databases? It strikes me that, like SAX and the DOM, they
> are a useful tool in implementing software that stores/retrieves XML
> documents (or data from those documents) in a database but are not
> directly relevant to the discussion itself. Instead, they are most
> relevant to the user in that they are likely to weigh heavily in the
> feature set exposed by a content management system or (possibly)
> data transfer system.

Good question.  I guess that's for the person who's doing the
discussing to decide.  Since groves can be persistent (e.g., stored in
databases), and since XML resources can become groves, it seems to me
that groves are relevant.  You're right, the real reason they're
interesting is their impact on feature sets.  But aren't feature sets
(and especially tradeoffs between feature sets) what technical
discussions are all about?

> 3) This isn't directly related to XML/databases, but what other
> common functionality do all groves have? For example, can I write an
> application that navigates groves, regardless of their source (I
> think the answer is yes)?

Yes.  We have a demonstration of that.

> Can I combine groves of different types or convert painlessly --
> that is, without writing any additional code -- from one type to
> another (I think the answer is no -- additional code is needed)?

Probably no, but it really depends on what you mean by "code."  You
have to decide how instances of nodes of particular classes and in
particular contexts will be mapped onto instances of nodes of
particular classes in the new context, and you have to express your
decisions in a formal, machine processable fashion.  Right now, using
GroveMinder, you can do that with a Python script, which seems about
as quick, intuitive, and flexible a way to do it as any.  I don't know
of any transformation specification language with which a similar feat
(transforming one kind of grove into another kind of grove) can be
done, except possibly DSSSL (which relies on (and was written in terms
of) the grove paradigm, by the way).  We haven't implemented DSSSL,
but it shouldn't be too hard to do that on top of GroveMinder.  Would
you call a DSSSL transformation specification "code"?  (I guess I
would.)
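
To give a feel for what such a script has to express, here is a Python
sketch of class-and-context mapping rules. It does not use or imitate the
GroveMinder API, which is not shown in this message; nodes are plain
dicts, and the rule table and all class names are hypothetical.

```python
def transform(node, rules, parent_class=None):
    """Map a source node to a target node using (class, context) rules.

    Nodes are dicts: {"class": str, "data": str, "children": [nodes]}.
    rules maps (source class, parent class) -> target class; a parent
    class of None marks the default rule for any context.
    """
    key = (node["class"], parent_class)
    target_class = rules.get(key) or rules[(node["class"], None)]
    return {
        "class": target_class,
        "data": node.get("data", ""),
        "children": [
            transform(child, rules, node["class"])
            for child in node.get("children", [])
        ],
    }


RULES = {
    ("table", None): "element:table",
    ("header-row", None): "element:tr",
    ("row", None): "element:tr",
    ("cell", None): "element:td",
    ("cell", "header-row"): "element:th",  # same class, different context
}
source = {
    "class": "table",
    "children": [
        {"class": "header-row", "children": [{"class": "cell", "data": "Title"}]},
        {"class": "row", "children": [{"class": "cell", "data": "Groves"}]},
    ],
}
result = transform(source, RULES)
```

The rule table is the "formal, machine processable" statement of the
decisions; the recursive function is only the engine that applies it.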

> Can I hyperlink from one grove to another (I think the answer is
> yes)?

Yes.  The interesting thing here is that traversal can be initiated
from any node in any grove, on account of a link in any grove, and
traversal can be made to any node in any grove.  Neither the traversal
initiation point, nor the traversal target, has to be a linking
construct.  Neither has to "know" anything about the fact that they
are actually anchors.
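
One way to picture this is a link index kept apart from the nodes it
connects. The Python sketch below is an assumption-laden toy: the
LinkIndex class, the (grove id, node id) address tuples and the sample
identifiers are invented for the example and are not HyTime syntax.

```python
from collections import defaultdict


class LinkIndex:
    """Links kept apart from the nodes they connect (a standoff arrangement)."""

    def __init__(self):
        self._by_anchor = defaultdict(list)

    def add_link(self, *anchors):
        """Register a link whose anchors are (grove id, node id) addresses."""
        for anchor in anchors:
            others = [a for a in anchors if a != anchor]
            self._by_anchor[anchor].extend(others)

    def traverse(self, anchor):
        """Return every address reachable from this anchor via some link."""
        return list(self._by_anchor.get(anchor, []))


links = LinkIndex()
# The link lives in the index, not in either of the two groves it connects.
links.add_link(("xml-doc", "para-3"), ("sales-db", "book/row-17"))

from_doc = links.traverse(("xml-doc", "para-3"))
from_db = links.traverse(("sales-db", "book/row-17"))
```

Neither the paragraph nor the database row stores anything about the
link, yet traversal works in both directions.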

> And so on.

I'll provide you with a copy of the GroveMinder demo, if you like.
There are lots of playful possibilities.  Some people have even
written their own HyTime documents to use with the demo software.
It's a challenge for puzzle lovers, because the demo does not report
errors in documents.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 231 4098
fax    +1 972 994 0087
pager (150 characters max): srn-page@techno.com

3615 Tanner Lane
Richardson, Texas 75082-2618 USA

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1

=======================================================================

From srn@techno.com Wed Sep  8 21:13:34 1999
Date: Wed, 8 Sep 1999 15:20:34 -0500
From: "Steven R. Newcomb" <srn@techno.com>
To: david@megginson.com
Cc: xml-dev@ic.ac.uk
Subject: Re: ANN: XML and Databases article

[David Megginson:]

> There aren't many people alive who actually know Groves (we couldn't
> all fit in a Cessna, but we probably could squeeze into a Dash-8
> with a few empty seats), so it had no real familiarity advantage.

<rant type=grove-paradigm-promoting>

Groves are going to turn out to be like Linux, which began with a very
few people who had a vision that turned out to work.  As was the case
with Linux in those early days, there is nobody doing big media
advertising about it, and even the trade press, whose income is
derived from such advertising, hasn't heard of groves very much.  That
will change.  Linux has risen on the strength of the idea that people
can and should be in direct control of their operating system, and
that the result of such control will be increased human productivity.
Similarly, groves will rise on the strength of the idea that people
should be in direct control of their information.  The
product-differentiation barriers that vendors have set up around their
customers' data must come down.  There is no information that
civilization can afford to leave out of the mainstream of information
processing.  XML is a step toward this goal, but it requires that the
data be converted into XML; it will never happen that all data will be
stored (or even interchanged) as XML.  The grove paradigm brings the
barriers down without necessitating data conversion.  The grove
paradigm lets the markup be elsewhere than inside the data.

Even though groves are the technical foundation of the SGML, DSSSL,
and HyTime international standards (respectively the proud,
heavier-duty forerunners of XML, XSL, and XLink, among other W3C
Recommendations), there is no money for groves precisely because
system vendors have *less than no reason* to popularize this dangerous
idea.  As with Linux, however, that is the very reason why the grove
paradigm will become commonplace: it will wring massive inefficiencies
out of the software systems marketplace, and out of software systems.

As everybody who attended Metastructures in Montreal last month knows,
people who are into solving tough real-world information management
problems, like DataChannel / ISOGEN, are selling and developing the
grove paradigm as a core strategy, because they know that there is
nothing else out there that compares to the power it brings to solving
tough business problems, both technically and politically.  Other
system vendors cannot ignore this situation forever.  It won't be too
long before groves are a mass-market phenomenon (even if they're not
called "groves" by then).  The opportunities are almost unbelievably
large.

</rant>

But don't worry, David: if you don't provide a property set for XML as
part of the work of the XML infoset group, we'll take what you do
produce and turn it into a property set.  That way, it'll be machine
processable as just another notation, by engine software that is just
another plug-in to the wider world that includes all other notations,
and all other database schemas.  That will be a good thing, and we
can't let a little matter of syntax stand in the way of progress.

-Steve



======================================================================

From martind@netfolder.com Wed Sep  8 21:58:43 1999
Date: Wed, 8 Sep 1999 22:10:19 -0400
From: Didier PH Martin <martind@netfolder.com>
To: xml-dev@ic.ac.uk
Subject: RE: ANN: XML and Databases article

Hi Steven,

Steven said:
--------------------------------------------------
Probably no, but it really depends on what you mean by "code."  You
have to decide how instances of nodes of particular classes and in
particular contexts will be mapped onto instances of nodes of
particular classes in the new context, and you have to express your
decisions in a formal, machine processable fashion.  Right now, using
GroveMinder, you can do that with a Python script, which seems about
as quick, intuitive, and flexible a way to do it as any.  I don't know
of any transformation specification language with which a similar feat
(transforming one kind of grove into another kind of grove) can be
done, except possibly DSSSL (which relies on (and was written in terms
of) the grove paradigm, by the way).  We haven't implemented DSSSL,
but it shouldn't be too hard to do that on top of GroveMinder.  Would
you call a DSSSL transformation specification "code"?  (I guess I
would.)

Didier says:
--------------------------------------------------
You're absolutely right, Steven: yes, DSSSL could be made interoperable with
grove engines quite easily. In fact, we are working on an interface for
grove engines in the OpenJade project. Actually, OpenJade includes a grove
based on the SGML property set, and this grove is "in memory" only (i.e.
resident on the heap). This limitation could be removed by allowing DSSSL to
process groves supplied by other grove engines.

I would also call a DSSSL transformation specification "code", whether it
goes from a grove to a modified grove or from a grove into a flow object tree.

How can we bridge the vision and the reality? Simply by sitting around the
table and defining the API between grove engines and transformation engines.
The DOM only reflects a particular interface to a particular property set
(if I can express myself that way). Obviously a grove is more than that
(anyway, you know that). So, why not work on a grove API, publish it, and
then submit it to our colleagues, like those present on this list?

Call to action:
--------------
If anyone is interested in the task of defining an API between grove engines
and transformation engines (XSL or DSSSL, for instance), please send me an
email and we'll set up a discussion group with the OpenJade team and you, so
that together we can define the Linux of markup technologies ;-). Then we
will submit the document to other colleagues for further discussion and
feedback.

regards
Didier PH Martin
mailto:martind@netfolder.com
http://www.netfolder.com


Prepared by Robin Cover for The SGML/XML Web Page archive.