Steven R. Newcomb on the Grove Paradigm (1999-09-08)
Date: Wed, 8 Sep 1999 18:41:08 -0500
From: "Steven R. Newcomb" <srn@techno.com>
To: xml-dev@ic.ac.uk
Subject: Re: ANN: XML and Databases article

[Ron Bourret:]

> I've read Paul's tutorial and the GroveMinder summary on the Web, so
> let's see if I've got this straight. A grove is basically a
> property set, broken down into classes, each of which has
> properties. There are probably relationships between those
> classes. For example, a grove for XML could have classes for
> elements, attributes, entities, and so on, where the element class
> points to the attribute class. A grove for a relational database
> would have classes for tables, columns, etc., where the table class
> points to the column class.

Pretty close, but not quite on the money.

First of all, a terminological problem: A grove is the set of objects that results from understanding (parsing and processing) some particular logical resource. No grove is made from more than one logical resource (I say "logical" resource because some single resources are distributed in multiple physical containers). However, more than one grove can be made from a single resource. This is because resources have multiple layers.

For example, in the case of XML documents, there is always the XML syntax layer of "understanding". The property set (schema) for this layer is probably strongly reminiscent of the DOM. However, there are one or more vocabularies used in every XML document (there's always at least one because the element types have names, even if there's no DTD). The semantics of these vocabularies may imply "emergent properties" of the information contained in the resource, and there can be a property set for each vocabulary's emergent properties. So preparing a single resource for application-internal exploitation may involve creating groves for each vocabulary.
By giving names to the emergent properties of vocabularies, such property sets can be, in effect, APIs to the semantics of each vocabulary, thus opening the way for vocabulary-specific software engines, and for far more reliable cross-application information interchange than the Web has ever seen.

So, instead of saying,

> A grove for XML could have classes for elements, attributes,
> entities, and so on, where the element class points to the attribute
> class.

... you might better have said any one of the following (this has to be said with extreme precision, so look closely):

| A property set for the XML language could have classes for elements,
| attributes, entity references, and so on, where the element class
| has, as one of its nodal properties, an "attribute specification
| list" property, whose value is a list of "attribute value
| specification" nodes.

or:

| The primary grove form of an XML resource could have nodes
| conforming to the "element", "attribute specification", and "entity"
| classes, and so on, where the "element" class has, as one of its
| properties, an "attribute specification list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| value specification".

or, in view of the fact that the DTD of an XML resource is part of its grove (when it appears or is referenced by the DOCTYPE declaration in an XML resource):

| The primary grove form of an XML resource could have element type
| definitions, attribute list definitions, entity declarations, and so
| on, where the element type definition class has, as one of its nodal
| properties, an "attribute definition list" property, whose value
| consists of a list of nodes that all must be of the class "attribute
| definition".

The second problem with your summary statement is that "points to" is actually an implementation detail.
The standard only says that nodes (objects) in groves have properties, and that some properties can be "nodal" -- that is, the values of such properties can be other nodes (in the same grove and/or in other groves). The manner in which a node is represented to be a property value in any given implementation is almost certainly going to be via pointing (at least in a von Neumann architecture machine), but it's important to realize that that is an implementation decision, and it's inaccurate to say that "pointing" has anything to do with the grove paradigm. A property set can only say that the value of a property is nodal, and implementations of the grove paradigm must make it appear that the value of such a property is indeed one or more nodes, but how that is made to happen is not part of the standard (nor should it be).

So, instead of saying:

> "where the table class points to the column class"

... it would be much more accurate to say:

| where the "table" class has a property named "columns" whose value
| is a list of "column"-class nodes.

> In this sense, the XML information set has much in common with
> groves, as it is a property set.

Yes, except that it's not yet clear that the XML info set will be expressed using the ISO Property Set DTD -- but this is merely a syntax issue. I agree with David Megginson: I expect it to be readily convertible.

> Similarly, the DOM could be viewed as an API for a grove.

Yes, to a single kind of grove, specifically an XML syntactic grove. (A grove governed by the properties of XML's syntax.)

(Aside: I hope we're not facing a future in which the semantics of certain chosen vocabularies will be directly supported by future versions of the DOM. Such support should "plug into" (and be unpluggable from) the DOM. No vocabulary-specific support should become a required feature of all DOM implementations.
For example, making XLink a vocabulary is fine; making the DOM able to support XLink but no other linking vocabularies would be the start of a long nightmare with a bad ending. To do that would significantly reduce the freedom of industries to design their own information architectures, and to evolve them according to their own perceived needs. It would also destroy the DOM, which must stay simple in order to survive. No API can do everything for everybody, and once you start putting support for DTD-specific (or namespace-specific) semantics into the DOM, where do you stop? I've watched a couple of systems bloat uncontrollably and meet their demise in similar ways, and the stage is perfectly set for the same thing to happen to the DOM.)

> The XML information set is not a grove because ... it is not
> ... expressed in grove notation.

If you replace the word "grove" with "property set" (twice) in the above sentence, you are exactly correct. (There is no such thing as "grove notation". "Grove" is an abstract concept that, when sensibly implemented, makes a grove exactly as human readable as a hex dump of RAM in which there are C structs in no particular order.)

> The DOM is not an API for a grove because it's a bit wishy-washy in
> places -- for example, four characters of PCDATA could be one node
> or four, so it's not built on a rigid enough data model.)

Close enough. I would put the same thought differently: The DOM doesn't have a formalized underlying data model, so the DOM doesn't answer the need for a solid basis on which to express the addresses of the components of XML resources. I'm hoping and believing that after the XML infoset is done we'll have a basis for implementing a powerful version of XPath (or XPointers or whatever the idea of generalized addressing of components of XML resources is being called at that time).
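To make the addressing concern concrete, here is a minimal Python sketch. It is not DOM or GroveMinder code; the function names and the toy content model are assumptions invented for illustration. It shows that a positional address such as "the second child" can name different things depending on whether four characters of PCDATA are modeled as one node or as four.

```python
# Toy illustration only: none of these functions belong to the DOM,
# the grove standard, or GroveMinder. Element content is modeled as a
# plain list of (kind, value) tuples.

def children_one_text_node(text, trailing_element):
    """Model the PCDATA run as a single text node."""
    return [("text", text), ("element", trailing_element)]


def children_one_node_per_char(text, trailing_element):
    """Model every character of the PCDATA run as its own node."""
    return [("char", ch) for ch in text] + [("element", trailing_element)]


def resolve(children, position):
    """Resolve a 1-based positional address against a child list."""
    return children[position - 1]


# Content corresponding to <p>abcd<em/></p>
coarse = children_one_text_node("abcd", "em")
fine = children_one_node_per_char("abcd", "em")

# The same address, "child 2", names different things under each model.
print(resolve(coarse, 2))
print(resolve(fine, 2))
```

Under the first model the address lands on the `em` element; under the second it lands on the character `b`. An address is only meaningful once both parties agree on which model is in force.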
> The nice thing about groves is that all groves, regardless of what
> they are built on, have certain commonalities, such as
> addressability, so you can perform certain common functions with
> them.

Right. All nodes in groves have the same "object model" (I'm using this term in a more formal, scientific sense than the term is used in the phrase "Document Object Model (DOM)".) The grove object model is: Groves have nodes, nodes conform to classes, and classes have named properties with value constraints. Nodes have named properties, and values for those properties. That's about it; the rest is detail. (It's pretty interesting detail.)

> GroveMinder is generic grove middleware. It has plug-ins, called
> Minders (I think of them as drivers),

Hooray, thank you! I have sometimes called them "notation drivers" only to get the blankest stares imaginable. (I then have asked something lame, like, "Do you know what a device driver is, and why we have them?") But you obviously get the point of Minders: Minders represent plug and play support for individual notations, in a system that makes all content look alike (i.e., conform to the grove object model).

> that can build groves over different property sets. For example,
> there is one Minder for SGML/XML documents and a different Minder
> for relational databases.

Well, actually, there's probably a one-to-one correspondence between property sets and database schemas. In order to address information in terms of its structure, you have to know the structure. In grove-land, the structure is defined by a property set. Different databases have different structures, normally expressed as database schemas. Making a database look like a grove is very straightforward. The bulk of the work is translating the schema into a property set (which is, after all, a kind of schema). There's a bit of coding involved, too, but the GroveMinder developer kit has tools that make this amazingly easy.
(At least the Lockheed-Martin people were amazed, and they said so publicly at XML '98.)

The grove paradigm breaks down the distinction between documents (resources) and databases. Everything, in its addressable form, is a grove, and a grove is a database. But a grove is convertible into an interchangeable resource (that is, if the property set is a comprehensive expression of the syntactic features of the notation of an interchangeable resource). Obviously, a resource is also convertible into a grove, given a property set for its notation. Property sets are the bridge between the world of information interchange, and the world in which interchanged information is immediately useful (i.e., the world in which information exists after parsing and common semantic processing of interchangeable resources has been done).

If the resource is *already* a database, there's probably no parsing or processing involved. All that needs to be done is to put a translating layer over it that makes the database look like a grove. Then, the database and all its contents are fully able to participate in the wider world of interchangeable information resources: they can be linked, re-used by reference, have any kind of metadata associated with them, etc. etc.

> (There can actually be different property sets for a "type" of
> data. For example, one property set for XML might include entities
> and another might not, specifying that each entity is replaced by
> its value. A different Minder is needed for each property set.)

Strictly speaking, you're correct: people can disagree about the properties of, say, PostScript as a notation, or they might agree about the properties but not about what the names of the properties should be. Nothing prevents people from writing their own property sets. In fact, however, the situation is not as chaotic as your example might lead one to believe, because of "grove plans".
A "grove plan" is a way of selectively deleting properties from classes, and of deleting classes altogether, as a way of avoiding the overhead of storing and/or processing those properties and classes. For example, the property set for SGML is comprehensive, but an application may not need, for example, to store nonsignificant white spaces found in the start tags of SGML elements. The application may therefore use a "grove plan" to delete the properties whose values would be those white space characters.

The addresses of nodes in groves are always expressed with respect to a property set and a grove plan. If it were not so, you wouldn't know whether to count a certain node type or not, when counting nodes to get to a particular node. And it's true that, for example, some people want to count the text that was inserted via an entity reference as a distinct node, while other people don't; this kind of flexibility is needed in order to keep peace in the family, and allow people to do addressing in the way they want to do it.

Property sets are modularizable, so that it's relatively easy to express commonplace grove plans, to establish conformance levels for processing systems, and to understand the rules for interpreting address expressions. A Minder that implements a property set comprehensively can optionally view groves less comprehensively, so as to be able to resolve addresses that were expressed according to lesser grove plans. There doesn't have to be a different Minder for each different grove plan. (And that's where your example might be misleading.)

> One thing GroveMinder can do is store a grove in its own
> database. (Note that this is separate from the database addressed by
> the relational database Minder -- it has a structure designed to
> store groves.) Thus, GroveMinder can store an XML document in a
> database as a grove and is what I, in my article, called a content
> management system.
> That is, it can store and retrieve an XML document as a document.

Sounds right to me. ("...its own database" sounds a bit odd because GroveMinder can use any ODBMS for grove storage.)

> Some questions:
> 1) Is it possible to combine groves of different types? For example,
> can I take a grove representing a table in a relational database and
> stuff it into a grove for an XML document, for example as the
> content of an element?

I'm afraid I don't grasp the intent of this question. When such an XML document is exported from its grove as an XML document, what should the document look like?

There's no need (and no way) to stuff something into something else. It is only necessary that the "content" property of the element have, as its value, the node in the database grove that represents the table. The ISO standard SGML Property Set does not allow this; only certain classes of nodes within the same grove are allowed as the value of the "content" property of "element" nodes. However, if you want to change your operative SGML Property Set so that this will be permitted, nothing (other than good sense) prevents you from doing it; the grove paradigm will readily support you in your madness.

I don't know why it would be sensible to regard an RDBMS table as the content of an SGML or XML element. The normal meaning of "content" is elements, character data, and/or other SGML constructs, right there, inside the element. There is no way to write a general purpose grove-to-SGML converter unless the classes of the nodes that can appear in element content are limited and known. (We certainly don't want to dump arbitrary data into the content of an element; this would invite a situation in which the document that is ultimately exported is unparsable.)

> If so, does the table grove retain its table-ness, or is it
> converted to one or more XML elements? Both cases seem reasonable,
> although the latter would presumably require a special converter.
> If the latter case is true, then GroveMinder might also fit what I
> call data transfer middleware, depending on how the conversion is
> done.

I would suggest that an efficient way to handle this would be to convert the table into node classes that *are* permitted to appear in element content, and then make *those* nodes the value of the content property. If you do it this way, you're necessarily making the decisions that must be made about how the XML document, when exported, will reflect the table data.

You're right that one application of GroveMinder is data transfer middleware. The conversion program is comparatively easy to write, since everything already conforms to the same object model.

> 2) Are groves themselves relevant at a high level in a discussion of
> XML and databases? It strikes me that, like SAX and the DOM, they
> are a useful tool in implementing software that stores/retrieves XML
> documents (or data from those documents) in a database but are not
> directly relevant to the discussion itself. Instead, they are most
> relevant to the user in that they are likely to weigh heavily in the
> feature set exposed by a content management system or (possibly)
> data transfer system.

Good question. I guess that's for the person who's doing the discussing to decide. Since groves can be persistent (e.g., stored in databases), and since XML resources can become groves, it seems to me that groves are relevant. You're right, the real reason they're interesting is their impact on feature sets. But aren't feature sets (and especially tradeoffs between feature sets) what technical discussions are all about?

> 3) This isn't directly related to XML/databases, but what other
> common functionality do all groves have? For example, can I write an
> application that navigates groves, regardless of their source (I
> think the answer is yes)?

Yes. We have a demonstration of that.
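As a rough illustration of the two points above (source-independent navigation, and converting a table into node classes that are permitted in element content), here is a minimal Python sketch. Everything in it is hypothetical: `Node`, `walk`, `table_to_elements`, and the property names `gi`, `content`, `data`, `columns`, and `values` are invented for this example and are not the GroveMinder API or the ISO property set names. The row/cell mapping is just one arbitrary conversion decision among many.

```python
# Hypothetical sketch: Node, walk, and table_to_elements are invented
# for illustration and are NOT the GroveMinder API.

class Node:
    """A grove node: a class name plus named properties."""

    def __init__(self, cls, **props):
        self.cls = cls
        self.props = props


def walk(node, depth=0):
    """Yield (depth, node) for a node and every node reachable through
    its nodal properties, whatever the node classes are.
    Assumes the toy grove is a tree (no cycles)."""
    yield depth, node
    for value in node.props.values():
        members = value if isinstance(value, list) else [value]
        for member in members:
            if isinstance(member, Node):
                yield from walk(member, depth + 1)


def table_to_elements(table_node):
    """Map a "table" node onto "element" and "text" nodes, producing
    one "row" element per record and one cell element per column."""
    columns = table_node.props["columns"]
    row_count = len(columns[0].props["values"]) if columns else 0
    rows = []
    for i in range(row_count):
        cells = [
            Node("element", gi=col.props["name"],
                 content=[Node("text", data=col.props["values"][i])])
            for col in columns
        ]
        rows.append(Node("element", gi="row", content=cells))
    return Node("element", gi=table_node.props["name"], content=rows)


# A toy "database" grove: the "table" class has a "columns" property
# whose value is a list of "column"-class nodes.
table = Node("table", name="parts", columns=[
    Node("column", name="id", values=["1", "2"]),
    Node("column", name="label", values=["bolt", "nut"]),
])

# A toy "XML" grove.
doc = Node("element", gi="report", content=[
    Node("element", gi="title", content=[Node("text", data="Parts")]),
])

# The same walk() navigates both groves without knowing their source.
for grove in (table, doc):
    print([node.cls for _, node in walk(grove)])

# Convert the table into permitted node classes, then make those nodes
# part of the element's content.
doc.props["content"].append(table_to_elements(table))
```

Because `walk` only relies on "nodes have named properties, and some values are nodes", it needs no per-source code. The conversion function, by contrast, is where the decisions about how the exported document will reflect the table data have to be written down.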
> Can I combine groves of different types or convert painlessly --
> that is, without writing any additional code -- from one type to
> another (I think the answer is no -- additional code is needed)?

Probably no, but it really depends on what you mean by "code." You have to decide how instances of nodes of particular classes and in particular contexts will be mapped onto instances of nodes of particular classes in the new context, and you have to express your decisions in a formal, machine processable fashion. Right now, using GroveMinder, you can do that with a Python script, which seems about as quick, intuitive, and flexible a way to do it as any. I don't know of any transformation specification language with which a similar feat (transforming one kind of grove into another kind of grove) can be done, except possibly DSSSL (which relies on (and was written in terms of) the grove paradigm, by the way). We haven't implemented DSSSL, but it shouldn't be too hard to do that on top of GroveMinder. Would you call a DSSSL transformation specification "code"? (I guess I would.)

> Can I hyperlink from one grove to another (I think the answer is
> yes)?

Yes. The interesting thing here is that traversal can be initiated from any node in any grove, on account of a link in any grove, and traversal can be made to any node in any grove. Neither the traversal initiation point, nor the traversal target, has to be a linking construct. Neither has to "know" anything about the fact that they are actually anchors.

> And so on.

I'll provide you with a copy of the GroveMinder demo, if you like. There are lots of playful possibilities. Some people have even written their own HyTime documents to use with the demo software. It's a challenge for puzzle lovers, because the demo does not report errors in documents.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com http://www.techno.com ftp.techno.com
voice: +1 972 231 4098
fax +1 972 994 0087
pager (150 characters max): srn-page@techno.com
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1

=======================================================================

From srn@techno.com Wed Sep 8 21:13:34 1999
Date: Wed, 8 Sep 1999 15:20:34 -0500
From: "Steven R. Newcomb" <srn@techno.com>
To: david@megginson.com
Cc: xml-dev@ic.ac.uk
Subject: Re: ANN: XML and Databases article

[David Megginson:]

> There aren't many people alive who actually know Groves (we couldn't
> all fit in a Cessna, but we probably could squeeze into a Dash-8
> with a few empty seats), so it had no real familiarity advantage.

<rant type=grove-paradigm-promoting>

Groves are going to turn out to be like Linux, which began with a very few people who had a vision that turned out to work. As was the case with Linux in those early days, there is nobody doing big media advertising about it, and even the trade press, whose income is derived from such advertising, hasn't heard of groves very much. That will change.

Linux has risen on the strength of the idea that people can and should be in direct control of their operating system, and that the result of such control will be increased human productivity. Similarly, groves will rise on the strength of the idea that people should be in direct control of their information. The product-differentiation barriers that vendors have set up around their customers' data must come down. There is no information that civilization can afford to leave out of the mainstream of information processing. XML is a step toward this goal, but it requires that the data be converted into XML; it will never happen that all data will be stored (or even interchanged) as XML.
The grove paradigm brings the barriers down without necessitating data conversion. The grove paradigm lets the markup be elsewhere than inside the data.

Even though groves are the technical foundation of the SGML, DSSSL, and HyTime international standards (respectively the proud, heavier-duty forerunners of XML, XSL, and XLink, among other W3C Recommendations), there is no money for groves precisely because system vendors have *less than no reason* to popularize this dangerous idea. As with Linux, however, that is the very reason why the grove paradigm will become commonplace: it will wring massive inefficiencies out of the software systems marketplace, and out of software systems.

As everybody who attended Metastructures in Montreal last month knows, people who are into solving tough real-world information management problems, like DataChannel / ISOGEN, are selling and developing the grove paradigm as a core strategy, because they know that there is nothing else out there that compares to the power it brings to solving tough business problems, both technically and politically. Other system vendors cannot ignore this situation forever. It won't be too long before groves are a mass-market phenomenon (even if they're not called "groves" by then). The opportunities are almost unbelievably large.

</rant>

But don't worry, David: if you don't provide a property set for XML as part of the work of the XML infoset group, we'll take what you do produce and turn it into a property set. That way, it'll be machine processable as just another notation, by engine software that is just another plug-in to the wider world that includes all other notations, and all other database schemas. That will be a good thing, and we can't let a little matter of syntax stand in the way of progress.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
======================================================================

From martind@netfolder.com Wed Sep 8 21:58:43 1999
Date: Wed, 8 Sep 1999 22:10:19 -0400
From: Didier PH Martin <martind@netfolder.com>
To: xml-dev@ic.ac.uk
Subject: RE: ANN: XML and Databases article

Hi Steven,

Steven said:
--------------------------------------------------
Probably no, but it really depends on what you mean by "code." You have to decide how instances of nodes of particular classes and in particular contexts will be mapped onto instances of nodes of particular classes in the new context, and you have to express your decisions in a formal, machine processable fashion. Right now, using GroveMinder, you can do that with a Python script, which seems about as quick, intuitive, and flexible a way to do it as any. I don't know of any transformation specification language with which a similar feat (transforming one kind of grove into another kind of grove) can be done, except possibly DSSSL (which relies on (and was written in terms of) the grove paradigm, by the way). We haven't implemented DSSSL, but it shouldn't be too hard to do that on top of GroveMinder. Would you call a DSSSL transformation specification "code"? (I guess I would.)

Didier says:
--------------------------------------------------
You're absolutely right Steven, yes DSSSL could be made inter-operable with grove engines quite easily. In fact, we are working on an interface for grove engines in the OpenJade project. Actually, OpenJade includes an SGML property set based grove, and this grove is "in memory" only (i.e. resident on the heap).
This limitation could be removed by allowing other grove engines to be processed by DSSSL. I would also call DSSSL a transformation specification code, either from a grove to a modified grove or from a grove into a flow object tree.

How can we bridge the vision to the reality? Simply by sitting around the table and defining the API between grove engines and transformation engines. The DOM only reflects a particular interface to a particular property set (if I can express myself that way). Obviously a grove is more than that (anyway you know that). So, why not work on a grove API, publish it, and then submit it to our colleagues like those present in this list?

Call to action:
--------------
If anyone is interested in the task of defining an API between grove engines and transformation engines (XSL or DSSSL for instance), please send me an email and we'll set up a discussion group with the OpenJade team and you so that we all together define the Linux of markup technologies ;-). Then we will submit the document to other colleagues for further discussion and feedback.

regards
Didier PH Martin
mailto:martind@netfolder.com
http://www.netfolder.com
Prepared by Robin Cover for The SGML/XML Web Page archive.