The Purpose of Groves
by Steven R. Newcomb


Subject:      Re: Purpose of groves
From:         "Steven R. Newcomb" <srn@techno.com>
Date:         1998/04/16
Newsgroups:   comp.text.sgml

-------------------------------------------

> So, what is the purpose of groves?

I can't resist this one.

The grove representation of an information resource is designed to
make the explicit and implicit components and properties of that
resource addressable.  The reasons for addressing such things defy my
attempts to enumerate them, but here's a list anyway:

* rule-based formatting and transformation applications,

* expression of relationships between components,

* expression of *traversable* relationships between components,

* specification of components to be re-used in new contexts,

* expression of new resources that impose alternative
  structures on existing resources (i.e., views), and

* association of metadata with components of read-only resources.

This universal addressability becomes possible because information
resources are represented in groves as sets of objects conforming to
sets of named classes, with each class having a set of named
properties (hence the name for the schemas of groves: "property
sets").  In effect, the property set that governs a grove gives
everything in the grove names, and makes everything countable, so, for
example, nodes that have the same class name are still addressable by
their ordinal and hierarchical positions in the grove.
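To make "everything countable" concrete, here is a minimal Python sketch, not taken from any grove implementation. Nodes carry a class name and named property values, and a node is addressed purely by its ordinal position at each level of the hierarchy. The class names "element" and "datachar" and the property names "gi" and "char" are loosely borrowed from the SGML property set; the plain `children` list is a simplifying assumption standing in for a real content property.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    class_name: str                               # name of the node's class
    props: dict = field(default_factory=dict)     # property name -> value
    children: list = field(default_factory=list)  # simplified "content" property

def address(root, path):
    """Follow a path of 1-based ordinal positions down the children lists."""
    node = root
    for ordinal in path:
        node = node.children[ordinal - 1]
    return node

doc = Node("element", {"gi": "doc"}, [
    Node("element", {"gi": "para"}, [Node("datachar", {"char": "A"})]),
    Node("element", {"gi": "para"}, [Node("datachar", {"char": "B"})]),
])

# Both paragraphs have class "element" and gi "para", yet each one is
# still reachable by its position in the hierarchy.
print(address(doc, [2]).props["gi"], address(doc, [2, 1]).props["char"])
```

Two nodes that are indistinguishable by class name and property values can still be told apart, because their positions differ.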

SDQL (Standard Document Query Language) is a notation for expressing
addresses of things in groves; it does so in terms of the class names
and property names in the governing property set, and, of course,
in terms of the values of those properties.
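The idea of addressing "in terms of the class names and property names ... and the values of those properties" can be illustrated with a small Python sketch. This is not SDQL notation. It only mimics the principle: select every node of a named class whose named properties have given values. The `Node` shape, `select` helper and sample property names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    class_name: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)

def select(root, class_name, **prop_values):
    """Return nodes of `class_name` whose named properties equal the given values."""
    return [
        n for n in walk(root)
        if n.class_name == class_name
        and all(n.props.get(k) == v for k, v in prop_values.items())
    ]

doc = Node("element", {"gi": "doc"}, [
    Node("element", {"gi": "title"}),
    Node("element", {"gi": "para", "id": "p1"}),
    Node("element", {"gi": "para", "id": "p2"}),
])

print([p.props["id"] for p in select(doc, "element", gi="para")])  # ['p1', 'p2']
```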

-Steve

--

Steven R. Newcomb, President, TechnoTeacher, Inc.
Email:   srn@techno.com
WWW:     http://www.techno.com
FTP:     ftp://ftp.techno.com

voice:   +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax:     +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

3615 Tanner Lane
Richardson, Texas 75082-2618 USA

 -----------------------  next message follows ---------------

Subject:      Re: Purpose of groves
From:         "Steven R. Newcomb" <srn@techno.com>
Date:         1998/04/17
Newsgroups:   comp.text.sgml


> Is it possible to give an example of groves (and/or their use) that
> will be intuitively helpful for someone who can't figure out how to
> operationalize the abstract explanation?

> Is "groves" another name for something that might be familiar in a
> different guise/context?  Etc.?

I'm not sure how to be most helpful here, but I'll give it a whirl.
(I'd be interested to know if this explanation is helpful.)

First, let's leave SGML out of the picture.  Let's consider a
programming language, because it's a much simpler case.  (BASIC?  C?
Awk?  Doesn't matter.  Anything you're familiar with.  I'm going to
use awk as my example, because we document-processing types are all
pretty much guaranteed to have run into awk at one time or another.)
If we parse a program using a parser for the awk language, the output
of that parser is some sort of in-memory representation that can be
executed more or less directly, without further parsing or
interpretation.  That in-memory representation has, for any given awk
program, some way of representing, for every "if",

(1) the "if" conditional expression to be evaluated as
    true or false,

(2) the code to be executed if the expression
    evaluates as true, and

(3) the "else" code, if any, to be executed if the expression
    evaluates as false.

These are three of the properties of every "if" in every awk
program.  The fact that we have written down these three properties
somewhere does not require programs that interpret awk programs to
create in-memory classes that correspond exactly to the numbered
properties above, although they could.  If, however, I wanted to
offer a standardized way of accessing the in-memory structures that
result from parsing awk programs, I might create a property set for
the awk programming language, so that those properties have names
that everyone knows and can ask for.
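As an illustration of the private, unstandardized kind of in-memory form an interpreter might build for an "if", here is a minimal Python sketch holding the three parts listed above: the condition, the "true" code and the optional "else" code. The names `If`, `condition`, `then_code` and `else_code` are hypothetical; no actual awk implementation is being described.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Env = dict                       # variable name -> value
Action = Callable[[Env], None]   # a piece of executable code

@dataclass
class If:
    condition: Callable[[Env], bool]    # (1) evaluated as true or false
    then_code: Action                   # (2) run when the condition is true
    else_code: Optional[Action] = None  # (3) "else" code, if any

    def execute(self, env: Env) -> None:
        if self.condition(env):
            self.then_code(env)
        elif self.else_code is not None:
            self.else_code(env)

log = []
stmt = If(
    condition=lambda env: env["x"] > 1,
    then_code=lambda env: log.append("then"),
    else_code=lambda env: log.append("else"),
)
stmt.execute({"x": 5})
stmt.execute({"x": 0})
print(log)  # ['then', 'else']
```

Nothing here gives the three parts publicly agreed names; that is the gap a property set fills.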

A property set declares the classes and properties of classes of the
information contained in a notation *after* it has been parsed.  The
syntax of the notation is dispensed with by the parsing process,
leaving only "nodes" conforming to certain classes of information.  (A
lot of people find the term "parse tree" more familiar than "grove".
That's OK with me; the term "grove" just adds certain ISO-defined
connotations and expectations to the notion of "parse tree".)  Each
class of information defined in a property set is defined to have
certain properties.  Each node in a particular parsed document (our
awk program is a document) conforms to a particular class, and it has
the properties defined in the property set for that class.  The node
*also* has *values* for those properties, because it is a real piece
of information, not just an abstract model of a class of things.  The
value of a property can be a node, a list of nodes, an indexed list of
nodes (a "named node list"), or just data.
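The distinction drawn above, between a class declared in a property set and a node that actually *has values*, can be sketched in Python as follows. This is an illustrative model under simplifying assumptions, not the ISO-defined grove model. The four value kinds mirror the list in the text, and the class "variable-reference" with its "variable-name" property is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class ValueType(Enum):
    """The kinds of property values mentioned in the text."""
    NODE = "node"
    NODE_LIST = "node list"
    NAMED_NODE_LIST = "named node list"
    DATA = "data"

@dataclass
class ClassDef:
    """A class declared in a property set: a name plus named, typed properties."""
    name: str
    properties: dict  # property name -> ValueType

@dataclass
class Node:
    """A node in a grove: it conforms to a class and has real property values."""
    class_def: ClassDef
    values: dict

    def prop(self, name):
        # Only names declared by the node's class may be asked for.
        if name not in self.class_def.properties:
            raise KeyError(f"{self.class_def.name} has no property {name!r}")
        return self.values.get(name)

var_ref = ClassDef("variable-reference", {"variable-name": ValueType.DATA})
x = Node(var_ref, {"variable-name": "x"})
print(x.prop("variable-name"))  # x
```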

If we were to create a property set for the awk programming language,
we might choose to declare that there is a class of nodes called
"if-thingy".  (What I just did, to give a name to a class of nodes,
is the very essence of what it is to define a property set.)
I can then go on to declare that Class "if-thingy" has three
properties:

(1) conditional-expression

(2) execute-if-true

(3) execute-if-false

(What I just did, to give names to the properties of a class,
is again the very essence of what it is to define a property set.)
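Reduced to its bare bones, the declaration just made might look like the Python sketch below: a mapping from the class name "if-thingy" to its three property names, plus a check that a node's values use exactly those names. Real property sets are written in their own declaration syntax; `AWK_PROPERTY_SET` and `conforms` are hypothetical names used only for illustration.

```python
# Hypothetical representation: class name -> tuple of declared property names.
AWK_PROPERTY_SET = {
    "if-thingy": (
        "conditional-expression",
        "execute-if-true",
        "execute-if-false",
    ),
}

def conforms(class_name, values, property_set=AWK_PROPERTY_SET):
    """True if `values` supplies exactly the properties declared for `class_name`."""
    declared = property_set.get(class_name)
    return declared is not None and set(values) == set(declared)

node_values = {
    "conditional-expression": "x > 1",
    "execute-if-true": "milk the cow",
    "execute-if-false": None,  # an "if" with no "else" still has the property
}
print(conforms("if-thingy", node_values))                          # True
print(conforms("if-thingy", {"conditional-expression": "x > 1"}))  # False
print(conforms("while-thingy", node_values))                       # False
```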

If I build software for interpreting awk programs that builds groves
(sets of nodes) from awk programs, and that software conforms to my
property set, and the program contains the following code:

if ( x > 1) {
  milk the cow           # yeah, I know this isn't an awk expression
} else {
  eat the petunia        # and neither is this.  Ignore these comments.
}

  ... then there will be a node in the grove created from
the awk program, of class "if-thingy", three of whose properties
will have the following values:

* The "conditional-expression" property will have a value whose
  meaning will be "true if and only if the value of the variable whose
  name is x is greater than 1".

* The "execute-if-true" property will have a value whose meaning
  will, if executed, cause the computer to milk the cow.

* The "execute-if-false" property will have a value whose meaning
  will, if executed, cause the computer to eat the petunia.
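The "if-thingy" node for the example above can be sketched in Python, with each property value itself being a node. The classes "comparison", "variable-reference", "constant" and "action" are hypothetical stand-ins for whatever a real awk property set would declare, and `evaluate` handles only the tiny expression subset this example needs.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    class_name: str
    props: dict = field(default_factory=dict)

cond = Node("comparison", {
    "operator": ">",
    "left": Node("variable-reference", {"variable-name": "x"}),
    "right": Node("constant", {"value": 1}),
})
if_node = Node("if-thingy", {
    "conditional-expression": cond,
    "execute-if-true": Node("action", {"description": "milk the cow"}),
    "execute-if-false": Node("action", {"description": "eat the petunia"}),
})

def evaluate(expr, env):
    """Evaluate the small expression subset used in this example."""
    if expr.class_name == "constant":
        return expr.props["value"]
    if expr.class_name == "variable-reference":
        return env[expr.props["variable-name"]]
    if expr.class_name == "comparison" and expr.props["operator"] == ">":
        return evaluate(expr.props["left"], env) > evaluate(expr.props["right"], env)
    raise ValueError(f"unsupported expression class {expr.class_name!r}")

def run_if(node, env):
    """Pick the branch an interpreter would execute and return its description."""
    true = evaluate(node.props["conditional-expression"], env)
    branch = "execute-if-true" if true else "execute-if-false"
    return node.props[branch].props["description"]

print(run_if(if_node, {"x": 2}))  # milk the cow
print(run_if(if_node, {"x": 0}))  # eat the petunia
```

An application that knows only the names "if-thingy", "conditional-expression", "execute-if-true" and "execute-if-false" can walk this structure without knowing anything about awk syntax.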

************************************************************************

OK, now that you know exactly what a property set is, let's consider
the far more complex case of SGML or XML.

First of all, SGML is a notation like any other, and it has things in
it that are very fundamental.  One of the classes defined by the SGML
property set is "element", for obvious reasons.  In that sense, the
SGML property set is just the same as any other notation's property
set, although it's a pretty complex property set simply because of the
complexity of SGML's notation.

SGML is, among other things, a notation for marking up actual
information: element start-tags, element end-tags, and lots of other,
stranger features which are not especially relevant to this
discussion, but which are all reflected in the SGML property set, just
as anyone with good sense might expect.

SGML is funny stuff, though, because it *also* is a notation for
defining SGML-based notations: classes of SGML documents.  As you
know, SGML's DTD notation does this.  There is an unbounded set of
such notations.  Each such notation can be a set of classes with
properties.  In the straightforward case, each class can be an element
type whose semantic properties correspond exactly to its content and
attributes.  So, in such a straightforward case, anyway, there is no
essential difference between a DTD and a property set for the notation
defined by that DTD, except that SGML's DTD syntax was used to express
the classes and properties, instead of the usual syntax for property
sets.
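In the straightforward case just described, turning an element type into a property-set class is almost mechanical: its content and attributes become the class's properties. The Python sketch below assumes a simplified, already-parsed element type declaration (it does not parse DTD syntax), and `ElementTypeDecl` and `to_class` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementTypeDecl:
    """Simplified stand-in for an element type declaration plus its attribute list."""
    name: str
    content: tuple      # names of subelement types, in content-model order
    attributes: tuple   # attribute names

def to_class(decl):
    """Map an element type to a class whose properties are its content and attributes."""
    return {"class": decl.name, "properties": list(decl.content) + list(decl.attributes)}

# Roughly: <!ELEMENT memo (to, body)>  <!ATTLIST memo date CDATA #REQUIRED>
memo = ElementTypeDecl("memo", content=("to", "body"), attributes=("date",))
print(to_class(memo))  # {'class': 'memo', 'properties': ['to', 'body', 'date']}
```

The next paragraph explains why this one-to-one mapping breaks down for DTDs such as HyTime's.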

Here's the kicker, though: the information *expressed* by an SGML
document may have node classes and properties that do *not* correspond
to any element types or attributes used to interchange the information
using SGML.  The HyTime DTD (or, if you prefer, the XLink
architecture) is an example of such a DTD.  For example, the fact that
a piece of data somewhere in the content of an element is the anchor
of a hyperlink that appears elsewhere in the document (or in some
other document!) is in some sense a property of that piece of data,
even though that piece of data has no attribute or anything that will
tell you that.++ So, in order to provide access to the *meaning* of
HyTime documents, in addition to the HyTime DTD, there is also a
HyTime Property Set that describes additional node types and
properties that are not found in the HyTime DTD, and that result from
the system's *realization of the meaning* of HyTime constructs.  Of
course, the HyTime DTD is just an example of a DTD for information
that has implicit meaning beyond what is made explicit in the DTD;
there is an unbounded set of other possible DTDs that have this same
characteristic.

The purpose of a DTD is primarily to define a structure for
interchange of a given class of information.  The purpose of a
property set is primarily to provide a living API (Application
Programming Interface), with standardized, knowable, formally-declared
names for all properties, to that same information set, after all
standard processing has already been done and it is truly "ready for
action" in the context of a real, operating application.

************************************************************************

++ Note: this is a great example of what I'm trying to explain, but it
  doesn't happen to be the way the HyTime property set actually works.
  The HyTime property set provides for the creation of "named node
  list" nodes (that don't appear in any HyTime document explicitly)
  when HyTime processing is done by a HyTime application or engine.
  These are nodes that provide indexing/lookup services to allow an
  application to determine whether any given piece of information is
  (or is part of) an anchor of a hyperlink, etc.  So the effect is the
  same as pasting a new property onto the data that is addressed as an
  anchor, and the fact that the grove contains nodes that weren't in
  the document is also the same, but the explanation I gave you above
  would lead you to think that HyTime works in a different way than it
  actually works (and I wouldn't want to mislead you about that!).
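The footnote's point, that processing adds index nodes which no document contains explicitly, can be sketched in Python. This is not the HyTime property set: the "anchors" property and the `build_anchor_index` helper are hypothetical. A post-processing pass builds a lookup table from anchor IDs to the links that reference them, so an application can ask whether a given piece of information is an anchor.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    class_name: str
    props: dict = field(default_factory=dict)

def build_anchor_index(nodes):
    """Index every link node by the ID of each anchor it points at.

    The returned dict plays the role of a named node list that exists
    only after processing, not in any document.
    """
    index = defaultdict(list)
    for n in nodes:
        for target in n.props.get("anchors", ()):
            index[target].append(n)
    return dict(index)

para = Node("element", {"gi": "para", "id": "p1"})
note = Node("element", {"gi": "note", "id": "n1"})
link = Node("element", {"gi": "xref", "anchors": ["p1"]})

anchor_index = build_anchor_index([para, note, link])
print("p1" in anchor_index, "n1" in anchor_index)  # True False
```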

> What does one *do* with groves?

One does with groves whatever one does with information resources
after they have been parsed and their explicit and implicit semantics
have been made explicit in some sort of data structure that
looks like a grove.

I say "looks like a grove" because it is not necessary that a grove
implementation slavishly conform to its governing property set, or
even to the grove object model.  It is only necessary that the
implementation provide an interface to applications that allows
applications to address the things in the grove in terms of the names
of the classes and properties declared in the governing property set.
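Here is a minimal Python sketch of this "looks like a grove" idea. Application code depends only on an interface that answers questions by class and property name, while two quite different internal representations satisfy it. The `GroveNode` protocol, its method names and both implementations are hypothetical illustrations, not a standardized grove API.

```python
from typing import Any, Protocol

class GroveNode(Protocol):
    """What an application relies on: class names and property names only."""
    def class_name(self) -> str: ...
    def get_property(self, name: str) -> Any: ...

class DictNode:
    """One implementation: stores property values in a dict."""
    def __init__(self, class_name, values):
        self._class_name = class_name
        self._values = values
    def class_name(self):
        return self._class_name
    def get_property(self, name):
        return self._values[name]

class TupleIfNode:
    """Another implementation: keeps an 'if' as a bare tuple and maps
    property-set names onto tuple positions only when asked."""
    _POSITIONS = {"conditional-expression": 0, "execute-if-true": 1, "execute-if-false": 2}
    def __init__(self, cond, then_code, else_code):
        self._data = (cond, then_code, else_code)
    def class_name(self):
        return "if-thingy"
    def get_property(self, name):
        return self._data[self._POSITIONS[name]]

def describe(node: GroveNode) -> str:
    """Application code: uses only names from the property set."""
    return f"{node.class_name()}: else = {node.get_property('execute-if-false')}"

a = DictNode("if-thingy", {"conditional-expression": "x > 1",
                           "execute-if-true": "milk the cow",
                           "execute-if-false": "eat the petunia"})
b = TupleIfNode("x > 1", "milk the cow", "eat the petunia")
print(describe(a) == describe(b))  # True
```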

-Steve
