Property Sets - Steven R. Newcomb as Advocate and (Techno)Teacher

Date:     Sat, 3 Oct 1998 17:01:08 -0500
From:     "Steven R. Newcomb" <[email protected]>
To:       [email protected]
CC:       [email protected]
Subject:  Re: XML data model

> One of the things I suspect we need to be able to map XML documents
> into application components is a data model of some kind.

Wouldn't it be nice if it were expressible/expressed as a property set? That's how SGML's data model is expressed. Also HyTime's. It's very likely that XML's data model can be expressed as a true subset ("grove plan") of the SGML Property Set. However, this would probably not be the friendliest possible way to express XML's property set. I suspect that the friendliest possible XML property set would use XML (e.g., DOM) terminology wherever possible.

Work done by Fujitsu Labs has shown how XLink is expressible as a property set. Unsurprisingly, it turns out to be about the same as the relevant portions of the HyTime property set, except that all the names have been changed to correspond to XLink terminology.

Property sets have some pretty attractive characteristics, and the "grove object model" for which they serve as schemas was originally devised to describe the formal characteristics of SGML syntax. It's very neutral, very standard, very pure, and as simple as it can be. Moreover, property sets are expressible as XML documents; the DTD for property set documents already exists.

Property sets describe classes of nodes, and the properties of each class of node, as such nodes are output by a parser for a given notation. A property set does not describe any methods, so it can form an excellent all-purpose foundation for methods and applications. Since every property of every syntactic construct is assigned a name in a property set, the names of properties readily form a natural basis for query languages, too.
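To make the "no methods" point concrete, here is a minimal Python sketch. It is not taken from any standard. It reduces a property set to a table of node class names and their property names, represents grove nodes as plain immutable records, and adds one generic query function that relies only on those names. The class names ("element", "attribute", "datachar") and property names ("gi", "attributes", "content", and so on) are illustrative assumptions that loosely echo SGML terminology. They are not copied from the SGML Property Set.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping

# Hypothetical, heavily simplified "property set": each node class is
# described only by the names of its properties. There are no methods.
PROPERTY_SET: Mapping[str, tuple[str, ...]] = {
    "element": ("gi", "attributes", "content"),
    "attribute": ("name", "value"),
    "datachar": ("char",),
}


@dataclass(frozen=True)
class Node:
    """A grove node: a class name plus named property values, nothing more."""
    node_class: str
    props: Mapping[str, Any] = field(default_factory=dict)


def make_node(node_class: str, **props: Any) -> Node:
    """Create a node, rejecting properties the property set does not define."""
    allowed = PROPERTY_SET[node_class]
    unknown = set(props) - set(allowed)
    if unknown:
        raise ValueError(f"{node_class!r} has no properties {sorted(unknown)}")
    return Node(node_class, dict(props))


def select(nodes: Iterable[Node], node_class: str, prop: str, value: Any) -> list[Node]:
    """Generic query: uses only class and property names from the property set."""
    return [n for n in nodes
            if n.node_class == node_class and n.props.get(prop) == value]


# Usage: query by property name, with no class-specific code.
example = [
    make_node("element", gi="para"),
    make_node("element", gi="title"),
    make_node("attribute", name="id", value="x1"),
]
print([n.props["gi"] for n in select(example, "element", "gi", "para")])  # -> ['para']
```

Any behavior, such as rendering, linking, or validation, would live in separate application code that reads these records. Nothing in the model itself does anything.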

Having a property set for XML would set the stage for XML to become the language of documents that integrate information expressed in all other notations, because they can pretty much all have property sets, too.

-Steve

Steven R. Newcomb, President, TechnoTeacher, Inc.
[email protected]  http://www.techno.com  ftp.techno.com
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax:   +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

xml-dev: A list for W3C XML Developers. To post, mailto:[email protected] Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/

From       [email protected] Wed Feb 24 19:59:34 1999
Date:      Wed, 24 Feb 1999 16:50:42 -0600
From:      "Steven R. Newcomb" <[email protected]>
To:        [email protected]
Cc:        [email protected]
Subject:   Re: Streaming XML

[Jonathan Borden:]

> ... this is basic stuff, but the point is to
> emphasize that the distinction between what an
> object 'does' and what an object 'is' is not so
> clearcut.

Actually, property sets make it very clearcut. Remember that property sets are not implementation descriptions, whereas UML models are.

In property sets there are never any methods whatsoever. This point is emphasized by the fact that, in the grove paradigm, the information components are called "nodes" rather than "objects". If you choose to instantiate a grove as a collection of objects (as many reasonable people, including those at my own company, certainly would), that's OK, but the fundamental abstraction does not have the concept of methods.

[Much good stuff from Jonathan Borden omitted, with all points taken.]

> In fact, when you get out of the SGML/XML world,
> the use of the terms 'property set' and 'grove'
> get replaced by terms 'UML', 'persistence' and
> 'object model'. What you promise that use of
> property sets and grove plans will automate
> processing of data and interoperability, CASE
> tools vendors promise using UML. What is the
> essence of the difference between an information
> set and/or property set and/or grove plan versus
> UML?

I was hoping you would ask this question!

Let me begin by oversimplifying: the difference is that you can do much more with UML, and that oversufficiency is precisely UML's deficiency in this problem-space.

It is very difficult for people who have made their careers in *information processing* to perceive the virtue of making a complete distinction between processing and information. Even so, it's of paramount importance to make this distinction, if any of the following statements are true:

the information may outlast existing processing systems,
the information may have unforeseen uses in an ever-changing world, or
the information must be interchanged in an open, multivendor environment.

Instead of encapsulating such information in methods, as objects often do, we need to encapsulate it in semantics, as XML can be used to do. Having rendered the information as XML, and having chosen appropriate semantic-bearing tags and other attributes for its various components, we now have the information in a totally useless but highly interchangeable form that can become input to any application for any purpose, including unforeseen purposes.

For me, this useless but interchangeable XML form of the information is the form that is most deserving of its owner's respect. It is the owner's best choice of representation as the "maintained source code" of the information asset. It's the form that nobody but the information owner owns or controls. It's the form that no software vendor has a lock on. It's the form that (presumably) has everything needed to reconstitute a useful, application-ready form of the same information asset, regardless of the nature of that application, foreseen or unforeseen.

Now let's consider how well-described this XML asset really is. After all, if the asset doesn't have a very accurate description, we can't be sure that unforeseen applications will find the information intelligible.

With DTDs, we have a way to model the structural relationships of the elements to each other. But that's not enough to guarantee that the information will be understood in the manner that its architects and creators intended. With various proposed XML schema languages, we can impose lexical typing requirements and certain additional syntactic/structural requirements, but, again, that doesn't guarantee that the information will be understood in the manner that was intended. Neither the DTD nor the schema extensions so far proposed can tell us the information set that is supposed to be derivable from the XML form of the information asset. The information is still not described well enough to allow unforeseen applications, developed by unforeseeable developers, to use the information or to create new but similar information. All of the generic structural/syntactic validation in the world will not guarantee that!

This is because the interchangeable form of the information is not the same as the useful form, which we will assume, for purposes of this discussion, is objects that conform to certain classes and have certain constellations of properties and relationships. Now the question becomes, "What defines the data, interrelationships, and semantics of those objects?" The ISO/SGML answer is, "A property set, designed as part of the interchange architecture, that defines the classes of objects that will reflect the quintessential information set conveyed by the resource."

The object classes defined by a property set, and the node-objects in the groves that conform to those classes, are strictly the canonical, static *result* of the processing that is explicitly (but only conceptually) *required* to be done to all resources that conform to the architecture, before they are used by an application. Conceptually speaking, these "groves" fully respect the characteristics of the interchangeable resource that they represent, including the fact that an interchangeable resource has no methods, and there is nothing dynamic (or even useful) about it when it's in its XML form.

A property set is an abstract model of the useful information that can be extracted from an interchangeable resource. There is nothing in a grove that isn't already in the corresponding resource. Property sets are designed to exactly reflect the characteristics of information that can be extracted from information resources.
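The following Python sketch gives a rough illustration of "nothing in a grove that isn't already in the resource." Several assumptions apply. The XML text is parsed once with the standard library's xml.etree.ElementTree and converted into frozen dataclass nodes. The tiny grove plan is chosen for brevity and covers only the element type name, the attributes, and the ordered content of child elements and character data. Comments and processing instructions are dropped because ElementTree's default parser discards them, so the result is a subset of a full XML grove. Attributes are sorted because their order carries no information in XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ElementNode:
    """Read-only node; every field is derived from the parsed XML and nothing else."""
    gi: str                                          # element type name
    attributes: tuple[tuple[str, str], ...]          # sorted (name, value) pairs
    content: tuple[Union["ElementNode", str], ...]   # child elements and character data, in order


def build_grove(xml_text: str) -> ElementNode:
    """Parse once, then return a static tree that can be shared and re-read freely."""
    return _to_node(ET.fromstring(xml_text))


def _to_node(elem: ET.Element) -> ElementNode:
    content: list[Union[ElementNode, str]] = []
    if elem.text:                      # character data before the first child
        content.append(elem.text)
    for child in elem:
        content.append(_to_node(child))
        if child.tail:                 # character data following this child
            content.append(child.tail)
    return ElementNode(
        gi=elem.tag,
        attributes=tuple(sorted(elem.attrib.items())),
        content=tuple(content),
    )
```

Because the nodes are frozen, an application that wants to "do" something with them has to do it in its own code, outside the grove.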

An intelligent person like yourself may remark, "Well, then, I guess the abstract properties of C++ notation must be very complex, because they can describe arbitrarily complex processes." You're right, they are, and the abstract properties of C++ notation can be modeled using the property set paradigm. (And modeling C++ notation would be an interesting exercise, although I'm not yet confident of commercial interest.) A property set for C++ notation might include node classes with such names as "variable name", "passed argument", "operator", "method", "object", "class definition", etc.

So why bother with property sets, when UML is more powerful?

Because property sets impose the design discipline of focusing on what is being interchanged, rather than on what might be done by particular applications. They force you to focus on the precise nature of the "maintained source code" of the information. They force you to think more abstractly, which can be uncomfortable but is often very worthwhile. They force you to recognize that interchangeable information cannot modify itself, and has no built-in methods.

Because property sets are designed to support the addressing of arbitrary components of information, and their nature imposes the discipline of designing for various forms of addressing. Everything that is modeled in a property set can become a node in a grove, and everything that can become a node in a grove is predictably and reproducibly addressable. This means that addresses created and recorded by one application will be understandable and correctly resolvable by other applications. This is the key to the solution of the general hyperlinking problem. If, for example, we're addressing some node by counting other nodes, all of the counted nodes must exist, at least conceptually.
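The "addressing by counting" point can also be sketched in Python. The sketch is only loosely inspired by HyTime-style tree-location addressing and does not follow HyTime's actual syntax or semantics. An address is a list of 1-based ordinals that count element children downward from the root, and it resolves only if every counted node exists. The helper names `resolve` and `address_of` are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve(root: ET.Element, path: list[int]) -> ET.Element:
    """Follow 1-based child-element ordinals from the root.

    Every counted sibling must exist; otherwise the address cannot resolve.
    """
    node = root
    for step, ordinal in enumerate(path):
        children = list(node)
        if not 1 <= ordinal <= len(children):
            raise LookupError(f"step {step}: no child #{ordinal} under <{node.tag}>")
        node = children[ordinal - 1]
    return node


def address_of(root: ET.Element, target: ET.Element) -> list[int]:
    """Compute the ordinal path from root to target (the inverse of resolve)."""
    def walk(node: ET.Element, path: list[int]):
        if node is target:
            return path
        for i, child in enumerate(node, start=1):
            found = walk(child, path + [i])
            if found is not None:
                return found
        return None

    result = walk(root, [])
    if result is None:
        raise LookupError("target is not in this tree")
    return result
```

This sketch counts only element nodes and ignores character data. The application that records an address and the one that resolves it must therefore agree on that choice. That shared agreement is what a common property set and grove plan would supply.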

> Don't get me wrong, I think the work on
> information sets, property sets and groves is
> terrific and needs to be continued. One way to
> do this is to turn our heads sideways every so
> often to see what colleagues in the distributed
> object world are doing. These problems are
> universal.

Very true.

But information interchange is a funny thing. XML does not proceed from the study of computer programming. It comes from another direction, and it's a different problem space. Portable-software-ology is a specialized subdomain of, and not the same thing as, portable-information-ology.

(I sure wouldn't want to try to support portable information without portable software, though!)

At the risk of confusing the reader, let me add that the property set syntax is just one syntax for doing what property sets do, albeit the ISO standard one for doing it. The claim has been made by Eliot Kimber that the STEP schema language, EXPRESS, would do as well or better. I think he's probably right. EXPRESS, however, is a more powerful language that is more demanding to learn. By contrast, the property set syntax is defined as an SGML or XML DTD, and a small and simple one at that.

-Steve

Steven R. Newcomb, President, TechnoTeacher, Inc.
[email protected]  http://www.techno.com  ftp.techno.com
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax:   +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

xml-dev: A list for W3C XML Developers. To post, mailto:[email protected]
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1

Prepared by Robin Cover for The SGML/XML Web Page archive. For other references on property sets, see "Groves, Grove Plans, and Property Sets in SGML/DSSSL/HyTime."
