SGML: Groves as trees, graphs

David Durand asserted that a grove" is just a new name for what everyone else in the world calls "parse tree", and others (Eliot Kimber, James Clark, Joe English) sought to nuance that characterization. . .
From w3c-sgml-wg-request@www10.w3.org Sat Dec 28 23:23:14 1996
Date: Sun, 29 Dec 1996 12:16:52 +0700
To: dgd@cs.bu.edu (David G. Durand)
From: James Clark <jjc@jclark.com>
Subject: Re: clink/ilink direction (Was: anchor awareness)
Cc: w3c-sgml-wg@www10.w3.org

At 16:02 28/12/96 -0500, David G. Durand wrote:
>At 4:11 PM 12/27/96, W. Eliot Kimber wrote:
>>At 05:04 PM 12/27/96 -0500, David G. Durand wrote:
>>>    A "grove" is just a new name for what everyone else in the world calls
>>>a "parse tree".
>>
>>Not exactly correct: groves are a new name for what everyone else in the
>>world calls a direct graph data structure.  It happens that one obvious
>>application of directed graphs is the representation of parse trees, but
>>that's not the only thing groves are used for.  Groves are useful because
>>the provide the following features on top of the basic directed graph of
>>nodes concept:
>
>trees are directed graphs. Are you saying that groves can include cyles of
>nodes (ie node that are their own ancestors), or multiple inheritance
>(directed _acyclic graphs_)? If not, groves are part of that subset of
>directed graphs known as trees, and you are spreading more smoke into the
>air, instead of shedding light on the issues.

Groves are directed graphs (not acyclic), but they can also be considered as
trees.  More precisely:

- A subset of the edges are distinguished as representing
"origin-to-subnode" relationships.  If you delete all edges other than
these, then the graph becomes a single tree.

- A subset of the "origin-to-subnode" edges are distinguished as
representing parent-child relationships.  If you delete all edges other than
these, then the graph becomes a set of disjoint trees: one of these trees is
the normal ESIS-like element structure tree; each attribute assignment (name
plus value) is also its own tree.

James

              ----------------------------------------------

From w3c-sgml-wg-request@www10.w3.org Sun Dec 29 21:25:35 1996
To: w3c-sgml-wg@www10.w3.org
Subject: Re: clink/ilink direction (Was: anchor awareness) 
Date: Sun, 29 Dec 1996 19:13:39 -0800
From: Joe English <jenglish@crl.com>

dgd@cs.bu.edu (David G. Durand) wrote:
> W. Eliot Kimber:
> >David G. Durand:
> >>    A "grove" is just a new name for what everyone else in the world calls
> >>a "parse tree".
> >
> >Not exactly correct: groves are a new name for what everyone else in the
> >world calls a direct graph data structure. [...]
>
> trees are directed graphs. Are you saying that groves can include cyles of
> nodes (ie node that are their own ancestors), or multiple inheritance
> (directed _acyclic graphs_)?

Sort of.

Groves are like trees in that every node with the exception
of a unique root node has a single parent.  However, nodes
in a grove also have a collection of named properties,
whose values can be either "plain data" or a list of
references to other nodes.  Nodal properties (those whose
value is a list of nodes) are classified as either "subnode
properties" or "reference node properties"; every node
except for the root is a member of a subnode property of
exactly one other node in the grove (i.e., its parent), but
there are no restrictions on reference node properties.  If
you consider only the subnode properties, you have a tree;
if you consider the reference node properties as well, you
have a directed (possibly cyclic) graph.

Further, non-leaf nodes have a single distinguished subnode
property designated as the "content property"; if you
consider only the content properties you get the
"principal tree" of the grove.  For example in the
SGML grove plan, attribute nodes are subnodes of element
nodes, but they are not part of the principal tree.

> >3. Provides a standardized abstraction mechanism divorced from the
> >   implementation issues.
> 
> Ah, kind of like a parse tree. In fact, exactly like.

A grove is nothing like a parse tree in the sense
of the term with which I am familiar (a representation
of a derivation of a string of terminals with respect to 
a grammar); they're more like an abstract syntax tree 
(i.e., a representation of the information derived from 
the parse tree) augmented with cross-references between nodes.

> My inclination agaist groves stems from:
>    1. The complexity of groves (reflecting the complexity of full SGML parsin
> g).

The grove data structure taken by itself is really quite
simple and elegant.  The SGML property set is horrendously
complex, but if you only consider the "important" parts
(e.g., ESIS plus a few extra properties that got left out)
groves are very useful.  (Cost 2 uses a grove-like data
structure for its internal representation, so I can attest
to their utility.)


--Joe English

  jenglish@crl.com