Cover Pages: Architectural forms and inheritance/subtyping

A small collection of [XML-DEV] postings on "inheritance," "subtyping," "subclasses," and related notions in XML/SGML/HyTime. Content last updated April 23, 1998.


Date: Thu, 02 Apr 1998 14:18:49 -0600
To: xml-dev@ic.ac.uk
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Re: "Inheritance considered harmful"

[Eliot Kimber]

> That's probably because the architecture facility of ISO/IEC 10744 doesn't
>  *do* inheritance in the way that most people seem to expect.

[Paul Prescod]
That's right. That's why people get so confused about them. The word 
inheritance is inherently misleading when applied to architectural forms.

Architectural forms do subtyping, not inheritance. Inheritance is about
"getting stuff for free"  (e.g. code, declarations, fields). Subtyping is
about *fulfilling a particular role* (perhaps through a manual
construction of an appropriate "interface" (in this case a content
model)). Architectural forms allow you to specify an interface that must
be fulfilled and declare conformance to that interface. It does not allow
you to "get code for free" (i.e. markup declarations). 
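
As an illustration of the distinction (not from the original thread), here is
a minimal Python sketch. All class names are hypothetical. Appendix gets its
members for free by inheriting; Division shares no code with anything and
simply builds, by hand, the members that the Section interface requires.

```python
from typing import Protocol


class Section(Protocol):
    """The 'interface' an architectural form would specify (hypothetical)."""

    def title(self) -> str: ...
    def paragraphs(self) -> list[str]: ...


class BaseChapter:
    """Inheritance: subclasses get these members for free."""

    def __init__(self, title: str, paras: list[str]) -> None:
        self._title = title
        self._paras = paras

    def title(self) -> str:
        return self._title

    def paragraphs(self) -> list[str]:
        return list(self._paras)


class Appendix(BaseChapter):
    """Inherits everything; writes no declarations of its own."""


class Division:
    """Subtyping only: no shared code, but it fulfils the Section role."""

    def __init__(self, heading: str, body: list[str]) -> None:
        self.heading = heading
        self.body = body

    def title(self) -> str:
        return self.heading

    def paragraphs(self) -> list[str]:
        return list(self.body)


def render(section: Section) -> str:
    """Works with anything that fulfils the Section role."""
    return section.title().upper() + "\n" + "\n".join(section.paragraphs())
```

Both Appendix and Division can be passed to render(), but only Appendix
inherited anything.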

[Eliot Kimber]

Paul has made clear...

[Steve Newcomb]

Paul is absolutely right, but I'm still not going to take his advice.

For several months last year, I deliberately stopped using the
word "inherit", as in "inheriting architecture", "inherited-from
architecture", etc.  Instead, I very carefully used the words
"derived" for the inheriting architecture and "enabling" for the
inherited architecture.  This is the vocabulary used in the standard.

Ultimately, however, I reluctantly gave up on precision vocabulary
because nobody understood what I was talking about, except for people
whom I had no need to reach because they already understood the
concepts.  In almost all rhetorical situations, I have to use
vocabulary that may be, strictly speaking, misleading, and yet
provides some glimmer of understanding to the HyTime-inexperienced.
I'm back to "inherited" and "inheriting", and I never even try to use
"enabling" and "derived" any more.  I'm open to other suggestions,
though.  Got any?


-Steve

=======================================================================

From owner-xml-dev@ic.ac.uk Tue Mar 31 18:55:23 1998
Date: Tue, 31 Mar 1998 18:43:39 -0500
From: "Steven R. Newcomb" <srn@techno.com>
Subject: Re: Experimenting with Namespaces - DTDs?

David Megginson (ak117@freenet.carleton.ca) writes:

> Personally, I'd recommend architectural forms over namespaces if
> you're concerned with DTDs, since architectural forms have several
> major advantages:

David is right.

But I would go farther: XML Namespaces are a snare and a delusion.
With their use of colon syntax, they lull one into thinking that they
are about class inheritance.  They are not.  Instead, what the
namespace thing does is to collapse all the structure of the classes
of the inherited-from DTD into a salad of element types which is very
correctly termed a namespace rather than an architecture.  All that
RDF was looking for was a way to guarantee global uniqueness of
element type names, and if we ever try to get anything more than that
from namespaces, we are on very thin ice indeed.

If the inherited-from DTD is already a tag salad, in which all the
element types are a big OR group in the content model of the document
element, namespaces can work quite well.  If, however, an element type
has different meanings depending on its context (and most
architectures necessarily have this characteristic), then collapsing
such an architecture into a namespace can actively interfere with
information interchange.
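
A small Python sketch of the point about context (an illustration only; the
document and element names are invented). Grouping values by element type
name alone merges two different meanings of "title"; keying them by their
parent keeps the meanings apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which "title" means two different things.
DOC = """
<record>
  <book><title>Moby-Dick</title></book>
  <author><title>Mr.</title></author>
</record>
"""


def names_only(root: ET.Element) -> dict[str, list[str]]:
    """Namespace-style view: group text by element type name alone."""
    flat: dict[str, list[str]] = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            flat.setdefault(elem.tag, []).append(text)
    return flat


def names_in_context(root: ET.Element) -> dict[str, list[str]]:
    """Context-preserving view: key each value by parent/child names."""
    ctx: dict[str, list[str]] = {}
    for parent in root.iter():
        for child in parent:
            text = (child.text or "").strip()
            if text:
                ctx.setdefault(f"{parent.tag}/{child.tag}", []).append(text)
    return ctx


root = ET.fromstring(DOC)
flat_view = names_only(root)
context_view = names_in_context(root)
```

In flat_view the book title and the honorific land in one list; in
context_view they remain distinguishable.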

I think RDF would benefit substantially, in terms of its
understandability, its implementability, and its flexibility, if it
were described in terms of inherited architectures.  In fact, I think
it cries out for an architectural perspective, in which the
knowability and significance of element context is preserved.  I
suspect that RDF's formal rigor would benefit, too, even though its
formal rigor is already formidable.  (I'm basically impressed by RDF;
it's the product of much excellent thinking, I think.  I just want
MORE!)

To be entirely fair and truthful, I must personally accept a share of
the blame for this namespace mess; I was present at the first Dublin
Core meeting, and, awed by the momentousness of the occasion, I
evidently failed to make the case for using architectures for
metadata.  My later contributions to the W3C XML discussions about
namespaces were evidently not persuasive, either.  In my own defense,
I would argue that this is entirely understandable; it's a subtle
issue; nobody has much experience with metadata architectures; what
experience there is is dominated by methodologies like MARC that rely
on lists of uniquely named fields; and, most of all, the need for even
a partial solution to the metadata problem is phenomenally intense.

Anyway, all is not lost.  This namespace thing is a mistake that will
necessarily be corrected, simply in order to support the needs of
civilization in an XML-dominated world.  The way toward a solution is
already paved by an ISO standard (ISO/IEC 10744:1997 Annex A.3) that
is being adjusted to accommodate the syntactic limitations of XML
(i.e., its lack of #NOTATION attributes).  It is implemented in the SP
parser and in other software systems, and it is already being used in
many industrial contexts.  It's the right sort of answer, it's not
going away, and its usage is accelerating rapidly; there was a
manyfold increase in the number of papers reporting its use at
SGML/XML 97.

And, anyway, the need for metadata interchange far outstrips RDF's
present scope.  I hope and believe that many powerful metadata
architectures -- including elegant ones that can't be squashed flat
and remain useful -- will be multiply inheritable.  That way, there
can be a marketplace of architectural ideas for metadata in which the
full power of context can be exploited.  I'd like to see RDF evolve
in that direction.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com

voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax    +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

3615 Tanner Lane
Richardson, Texas 75082-2618 USA

=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 05:27:53 1998
Date: Fri, 17 Apr 1998 12:12:03 +0200
From: matthew@praxis.cz (Matthew Gertner)
Subject: Inheritance in XML (was Re: Problems parsing XML)
To: xml-dev@ic.ac.uk

I can't resist jumping in at this point, since it reminds me of some
thoughts I had about a topic which was being discussed a couple of weeks
ago: inheritance in XML. Unfortunately I seem to have managed to lose the
original mails (hint: never remove anything from the mail server if you're
not at your primary machine), but the gist was that object-oriented
approaches to inheritance are not applicable to XML because XML, unlike OO
languages, models only data and not behavior. This led into a very
interesting and apt discussion of the difference between inheritance (i.e.
of behavior) and subtyping (e.g. of interfaces).

To say that OO techniques only apply to behavior is an oversimplification.
Some of the basic tenets of OO (encapsulation, polymorphism) are only
applicable when behavior is modelled, but I would maintain that others
(inheritance, identity) are equally applicable to data. The two last
examples would both be of huge benefit to XML and are both currently
lacking.

Eliot Kimber indicated some scepticism as to whether OO techniques have
really lived up to their hype. In terms of a controlled environment, they
have. Any C programmer who has moved onto C++ will attest that OO features
make it far easier to write extensible and maintainable code. On the other
hand, the promise that this would lead to interchangeable components that
could be used anywhere has clearly been a flop. Why? For exactly the reason
Tim mentioned in his mail: interoperable APIs never work. You can't
interface with code and expect this interface to apply to any environment
other than the one it was specifically designed for. This is the case
whatever technology you are using (DLLs, Java, JavaBeans, Smalltalk, COM,
CORBA, etc.). Hence XML.

Nevertheless, inheritance of some sort is absolutely vital if XML is to
fulfill its promise. If we can't produce standard DTDs which can be
extended, *without* modifying the base DTD, then many of the advantages of
XML go out the window. This is as important as, say, linking facilities, and
is certainly orthogonal to the current namespace proposal.

I have been giving quite a lot of thought to how inheritance (I don't really
think sub-typing is the right term) could be implemented for XML. I'll have
to write up the details in a separate document, as this mail is getting
pretty long. In essence:

1) HyTime provides an extremely valuable and rich basis for this work, just
as it has for XML-Link. However, the relevant aspects need to be extracted
and presented in a more easily digestible form. Also, HyTime attempts to
implement inheritance (of element content) without extending the DTD syntax.
This decision should at least be reevaluated in the context of XML.
2) OO languages provide extensive facilities for inheritance of data members
(quite independently of methods), and these concepts would also be very
valuable in this context.
3) Additional thought must be given to adapting the content model of
existing element types in a base DTD without having to write out a whole new
content model. This is pretty scary, but I imagine it would be possible to
define primitives saying things like:
a) certain new element types can be inserted in front of the existing
content model.
b) certain new element types can be appended at the end of the existing
content model.
c) certain new element types can be inserted at a given location in the
existing content model.
d) etc.
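
The primitives in (3) might be sketched in Python as follows. This is only
an illustration under a strong simplifying assumption: a content model is
treated as a flat sequence of tokens, whereas real content models nest
groups and use OR connectors. The function names are hypothetical.

```python
def prepend(model: list[str], new: list[str]) -> list[str]:
    """(a) Insert new element types in front of the existing model."""
    return new + model


def append(model: list[str], new: list[str]) -> list[str]:
    """(b) Append new element types at the end of the existing model."""
    return model + new


def insert_at(model: list[str], index: int, new: list[str]) -> list[str]:
    """(c) Insert new element types at a given position in the model."""
    return model[:index] + new + model[index:]


def to_dtd(name: str, model: list[str]) -> str:
    """Write the model back out as an element declaration."""
    return f"<!ELEMENT {name} ({', '.join(model)})>"


# Hypothetical base content model; the base list itself is never modified.
BASE = ["Title", "Para+"]
extended = to_dtd("Chapter", insert_at(BASE, 1, ["Abstract?"]))
```

Each primitive returns a new model, so the base declaration stays untouched,
which is the property asked for above.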

I'd be really interested in reading others' thoughts on this matter.

Cheers,

Matthew

-----Original Message-----
From: Tim Bray <tbray@textuality.com>
To: xml-dev@ic.ac.uk <xml-dev@ic.ac.uk>
Date: Friday, April 17, 1998 6:07 AM
Subject: Re: Problems parsing XML


>At 10:35 PM 14/04/98 -0500, len bullard wrote:
>>> [Chris Maden <crism@ora.com>:]
>>> > One fundamental flaw in _XML Complete_ is Holzner's apparent belief
>>> > that you must write Java code in order to do anything useful with
>>> > XML.
>
>>Markup doesn't care.  That's the beauty of it. :-
>
>Yes! What he said.  As a result of having been a programmer since
>A.D. 1979, my faith in interoperable APIs is torn and shredded.
>But I think that interoperable syntax is usefully achievable.
>Hence, XML. -T.



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 05:55:32 1998

Date: Fri, 17 Apr 1998 11:46:40 +0100
From: Michael Kay <M.H.Kay@eng.icl.co.uk>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

Matthew Gertner:
>Some of the basic tenets of OO (encapsulation, polymorphism) are only
>applicable when behavior is modelled, but I would maintain that others
>(inheritance, identity) are equally applicable to data. The two last
>examples would both be of huge benefit to XML and are both currently
>lacking.

I agree absolutely. I have found identity and subtyping to be the two biggest
benefits in using an object database over a relational database.

>Nevertheless, inheritance of some sort is absolutely vital if XML is to
>fulfill its promise. If we can't produce standard DTDs which can be
>extended, *without* modifying the base DTD, then many of the advantages of
>XML go out the window.

I agree that this is central. Let's leave identity out of the discussion, as
that does, I think, fall into the XML Linking domain, and concentrate on what
I prefer to call subtyping.

There's a lot of stuff in the SGML culture that one could fall back on:
architectural forms etc, but I for one find it extremely arcane and difficult
to relate to my own domain of object modelling and database design,
which I think is familiar to a much wider community.

I know some people will disagree, but the way I use XML, a DTD is a
schema, an element definition in a DTD is a class, a document is a
database, and an element within a document is an instance of a class.
What is missing is that we can't define one class (element type) as a
subtype of another.

Since we are only concerned with structural subtyping and not with
behaviour, I don't think it would actually be difficult to define this
concept. The main thing that's tricky is that you can get the "is-a" the
wrong way round. If a PREFACE is-a-kind-of CHAPTER, that means you can find
anything (elements, attributes) in a PREFACE that you can find in a chapter,
and more besides. It also means you can reduce a PREFACE to a CHAPTER
by removing these extra bits. I'm not entirely sure what "removing the extra
bits" means: for example should it remove elements that cannot occur
in a CHAPTER, or should it just remove the tags that surround those
elements? This tends to show up the lack of semantics in the object
model underlying XML.
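
The two readings of "removing the extra bits" can be made concrete with a
short Python sketch (an illustration only; the element names and the set of
children a CHAPTER allows are invented). One function drops a disallowed
element together with its content; the other removes only the tags and
promotes the allowed children. For simplicity, text sitting directly inside
an unwrapped element is discarded.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical: the element types a CHAPTER may contain.
CHAPTER_CHILDREN = {"title", "para"}

PREFACE = ET.fromstring(
    "<preface><title>Foreword</title>"
    "<dedication><para>To M.</para></dedication>"
    "<para>Hello.</para></preface>"
)


def drop_extras(elem: ET.Element) -> ET.Element:
    """Reading 1: remove disallowed elements together with their content."""
    out = ET.Element("chapter")
    for child in elem:
        if child.tag in CHAPTER_CHILDREN:
            out.append(copy.deepcopy(child))
    return out


def unwrap_extras(elem: ET.Element) -> ET.Element:
    """Reading 2: remove only the tags, promoting allowed descendants."""
    out = ET.Element("chapter")

    def walk(node: ET.Element) -> None:
        for child in node:
            if child.tag in CHAPTER_CHILDREN:
                out.append(copy.deepcopy(child))
            else:
                walk(child)

    walk(elem)
    return out


dropped = ET.tostring(drop_extras(PREFACE), encoding="unicode")
unwrapped = ET.tostring(unwrap_extras(PREFACE), encoding="unicode")
```

The two results differ by exactly the dedication paragraph, which is the
ambiguity described above.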

Just some thoughts...
Mike Kay, ICL



=======================================================================


From owner-xml-dev@ic.ac.uk Fri Apr 17 10:31:59 1998
Date: Fri, 17 Apr 1998 10:18:30 -0500
To: <xml-dev@ic.ac.uk>
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)


At 12:12 PM 4/17/98 +0200, Matthew Gertner wrote:

>1) HyTime provides an extremely valuable and rich basis for this work, just
>as it has for XML-Link. However, the relevant aspects need to be extracted
>and presented in a more easily digestible form. Also, HyTime attempts to
>implement inheritance (of element content) without extending the DTD syntax.
>This decision should at least be reevaluated in the context of XML.

I appreciate the vote of confidence for architectures and hesitate to make
the next comment. However, there appears to be a general misconception
about architectures that I feel I must attempt to correct, to wit, that
architectures have ANYTHING to do with inheritance.

Matthew says "HyTime attempts to implement inheritance (of element content)
without extending the DTD syntax".

This is a false statement because HyTime DOES NOT ATTEMPT to define any
form of inheritance as I understand that word.  Therefore, it is not a
failing of the AFDR that it did not extend DTD syntax (which was never a
realistic option at the time it was designed). The decision that was made
was the only possible decision at the time.

This is not to say that I object to the idea of true inheritance in SGML. I
do not. It would almost certainly be a useful facility, making the use of
architectures at least easier, if not more powerful as well. So I
appreciate the depth of thought that is being and will be put into this
issue. I simply object to the suggestion that there is anything wrong with
architectures as they stand because they fail to provide a proper or useful
inheritance mechanism. Architectures cannot fail at something they
explicitly don't try to do. I don't want people to think that they
shouldn't use architectures because they don't do inheritance.
Architectures aren't about inheritance--they are orthogonal but synergistic
concepts.

The *processing effect* of using architectures may *appear* to be
inheritance, but that is a side effect of the type of processing that
architectures enable, not a direct intent of the architectural mechanism.
Or, said another way, architectures were designed to enable object-oriented
*processing* but not object-oriented construction of instance DTDs for
enabling parsing and validation. The latter simply isn't a requirement for
the former and is orders of magnitude harder to invent, specify, and
implement.

Remember: DTDs exist for exactly two reasons:

1. To enable *syntactic* validation of instances
2. To enable the use of markup minimization features

For all other types of processing DTDs are *irrelevant*. Thus, you do not
need to think about DTDs at all in order to enable object-oriented
*processing*, which is one of the things architectures do.  Architectures
also enable the syntactic validation of documents against the architectural
syntax rules (the architectural DTD), but they do not need to provide a
"DTD inheritance" mechanism in order to do that--they simply need to enable
the automatic generation of new instances that conform to the architectural
DTD.  This is a pretty trivial thing to define and implement (modulo the
optional automapping facility, which, like any markup minimization feature,
complicates things a bit).
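
A rough Python sketch of what "generating a new instance that conforms to
the architectural DTD" could look like. This is not the AFDR algorithm: it
assumes no automapping, assumes an element is architectural only when it
carries the architecture's attribute, and simply passes the architectural
descendants of non-architectural elements upward. The document and the
attribute name sectarch are used purely for illustration.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<Division sectarch="Section">'
    '<Heading sectarch="Title">Setup</Heading>'
    "<Metadata>draft</Metadata>"
    '<P sectarch="Para">Plug it in.</P>'
    "</Division>"
)


def arch_view(elem: ET.Element, arch_attr: str) -> list[ET.Element]:
    """Return the architectural elements derived from elem and its content."""
    kids: list[ET.Element] = []
    for child in elem:
        kids.extend(arch_view(child, arch_attr))
    form = elem.get(arch_attr)
    if form is None:
        # Not architectural: contribute only architectural descendants.
        return kids
    out = ET.Element(form)  # rename the element to the form it asserts
    out.text = elem.text
    out.extend(kids)
    return [out]


views = arch_view(ET.fromstring(DOC), "sectarch")
arch_doc = ET.tostring(views[0], encoding="unicode")
```

The result uses only the architecture's own names and could then be checked
against the architectural DTD, without any change to the instance's DTD.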

It might help to understand why architectures are designed the way they are.

Architectures are designed to give you a way to define a set of general
rules for processing documents for some specific purpose (e.g.,
hyperlinking, defining metadata, etc.). Document instances use these rules
by reference by asserting derivation from the architecture and conformance
to its rules.  

Because SGML can only talk about syntactic rules and because the
architecture mechanism uses SGML syntax as the base definition of its
rules, these sets of rules provide an ability to define syntactic
constraints in a way that is similar or identical to those provided by a
document's private DOCTYPE declaration.  At the same time, these rules do
not impose any requirements on the names used in instances, because
avoidance of name-space incursion is a basic principle of SGML and its
related standards. Thus, a general set of rules define a set of types that
instances assert conformance to, rather than defining the instance types
directly.  Note that architectures presume additional definitions beyond
the architectural DTD but cannot, of course, define how these rules might
be specified (because there are an infinite number of useful ways to do so).

Note that the direction of pointing is from instances to types to establish
an is-a or kind-of relationship.  This is merely an *assertion* made by
element *instances* (not types). This means that there is no, I repeat, no
connection between element type declarations and architectural types
("forms"), except that the markup minimization feature of fixed attributes
lets you fix the mapping for instances as part of an element type
declaration.  But it is not meaningful to say that an element type conforms
to an architectural form--only instances can conform. This further suggests
that what architectures do is not inheritance because instances do not
inherit properties from other types, they are simply instances of types.
Architectures do not define any notion of types being derived from types.
[The derivation of one architecture from another is really the derivation
of architectural *instances* from another architecture, not derivation of
the architecture. This truth is obscured by the fact that architectural
instances are normally only transient objects used by processors and not
literally instantiated as SGML documents.]

In addition, the rules defined by an architecture need not cover the
entirety of an instance. The HyTime architecture, for example, only covers
those parts of documents involved with linking and addressing. Therefore,
the mechanism must be flexible enough to allow both different elements of
different types in the same document to be derived from different
architectures and a single element to be derived from different
architectures at once. 

Because each architecture defines a distinct "processing context", there is
no problem in having a single element derived from multiple architectures
because the processing for each architecture is independent of the
processing for any other architecture.  There is no "multiple inheritance"
problem because it's not inheritance.  It's no different from me saying
that I conform to the rules for both male humans and licensed drivers.
These are distinct rule domains and as long as the rules for conformance to
both do not result in a conflict such that I can't satisfy both at once,
there are no problems. [For example, I could also say that I can conform to
the rules for licensed drivers and medical cadavers but I obviously can't
do both at the same time, because being a cadaver includes a requirement
that makes it impossible for me to conform to the rules for licensed drivers.]
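
The independence of processing contexts can be sketched in Python as follows
(an illustration only; the attribute names docarch and linkarch and both
processors are hypothetical). A single element asserts derivation from two
architectures, and each processor consults only its own assertion.

```python
import xml.etree.ElementTree as ET

# One element, derived from forms in two hypothetical architectures.
NOTE = ET.fromstring(
    '<Note docarch="Para" linkarch="Link" target="ch2">See chapter 2.</Note>'
)


def text_processor(elem: ET.Element):
    """Acts only on the docarch assertion; ignores linkarch entirely."""
    return elem.text if elem.get("docarch") == "Para" else None


def link_processor(elem: ET.Element):
    """Acts only on the linkarch assertion; ignores docarch entirely."""
    return elem.get("target") if elem.get("linkarch") == "Link" else None


as_text = text_processor(NOTE)
as_link = link_processor(NOTE)
```

Neither processor needs to know the other exists, so there is nothing to
reconcile in the way multiple inheritance would require.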

Note that the assertion made by elements that they conform to a given form
is NOT saying "instance element X inherits the *syntactic* properties of
architectural form Y". It is saying "instance element X *conforms to* the
syntax and semantics of architectural form Y".  It is an assertion of
conformance or derivation that does not have any implications about the
content model of the instance except that it must *allow* (but not
necessarily require) instances that conform to the architectural content
rules.  The only constraints architectural content models impose on
instances is the requirement for *potential* conformance. But instances are
free to allow content that would not conform, because not all instances
will be processed or validated with respect to a given architecture.

[There may, however, be a definite processing result that looks or in fact
is inheritance, but that's inheritance of processing, which is different
from inheritance of local syntactic rules. Object-oriented techniques are a
natural way to implement processing because you can reflect the *taxonomic*
hierarchy represented by an architecture with programmatic objects.]

For example, say I define an architecture for sections in technical manuals
with the following architectural content model:

<!-- Section architecture: -->
<!ELEMENT Section
  (Title,
   (Para+ |
    Section+))
>
<!-- Another form that is not allowed within Section -->
<!ELEMENT Intro
  (Para+)
>

<!-- End of architecture -->

In a document, I might have this element type, instances of which can be
derived from the Section form:

<?XML version="1.0" ?>
<!DOCTYPE Division [
<?IS10744:arch name="Sectarch" ... ?>
<!ELEMENT Division
  ((Title | Metadata),
   (Para+ |
    (Intro,
     Division+) |
    Division+))
>
<!ATTLIST Division
   sectarch 
      (Section)
      #IMPLIED
>
]>
<Division sectarch="Section">
 <!-- This Division claims conformance but fails to conform 
      because the Section architectural element does not
      allow the Intro architectural element in its content.  
   -->
 <Metadata>...</Metadata>
 <Intro>
  ..
 </Intro>
 <Division>
  ...
 </Division>
</Division>

This document is valid with respect to its own rules. It should be clear
from inspection that it allows instances that conform to the Section
architecture. 
It also allows instances that do not conform. It should also be clear that
the instance does not conform to the Section architecture (even though it
asserts conformance by asserting derivation from the Section form).
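
The conformance check can be sketched in Python by translating the two
architectural content models above into regular expressions over child
names. This is an illustration only: it assumes the children have already
been mapped to architectural form names, and it checks one level of content
rather than a whole document.

```python
import re

# Architectural content models from the Section architecture above,
# written as regexes over space-terminated child names.
FORMS = {
    "Section": r"Title (?:(?:Para )+|(?:Section )+)",  # (Title, (Para+ | Section+))
    "Intro": r"(?:Para )+",                            # (Para+)
}


def conforms(form: str, children: list[str]) -> bool:
    """True if the sequence of child form names matches the form's model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(FORMS[form], sequence) is not None


ok = conforms("Section", ["Title", "Para", "Para"])
# An Intro followed by a nested Section, roughly as in the instance above:
bad = conforms("Section", ["Intro", "Section"])
```

Here ok is true and bad is false: the Section form has no place for Intro,
whatever the instance's own content model permits.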

Thus, given an architectural element type, there is no way to predict the
content models of conforming instances except to say "it will probably
*allow* conforming instances".  Note that given an architectural element
type, it is probably easy to *generate* instance content models that will
ensure conformance (e.g., just copy the architectural declarations into the
instance and change the names, if desired), but combining two or more forms
from different architectures into a single element type probably cannot be
done programmatically in any satisfactory way because too many arbitrary
decisions will have to be made, possibly based on variables that can only
be understood or provided by humans (such as when are instances expected to
be validated against a particular architectural derivation).

It should be clear that any notion of true inheritance of content models
from architectures to instances is problematic at best, provably impossible
at worst.  

In addition, it would require that the instance parser have access to all
architectural DTDs and be able to synthesize them according to some set of
combinatorial heuristics. To my mind, this is a level of processing
overhead that is unacceptably high if all conforming parsers must support
it. In particular, it seems to be directly at odds with at least one of
XML's basic principles (actually, I can think of at least three: enabling
small parsers, no options, simplicity of specification).

By contrast, you only need to access and use an architectural DTD when you
are *validating* with respect to that architecture, which is always an
option. Validation is not a requirement for doing architecture-aware
processing. A processor for any given architecture presumably has built-in
knowledge of the forms in that architecture. In any case, DTD's only enable
validation and parsing, not processing, so they are largely irrelevant to
the issue of enabling *processing*, which is the primary purpose of
architectures. Thus, the use of architectures imposes *no requirements* on
instance parsers to do anything more than they have to do today. Validating
with respect to an architecture is a choice that users of documents get to
make.

But, doing such combination in some non-SGML schema syntax is perfectly
reasonable to contemplate because at that point you've gone outside the
minimum requirements of SGML parsing and by definition there is no
requirement that any conforming instance parser do any processing with
respect to non-SGML-syntax schemas.

Cheers,

Eliot
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202.  214.953.0004
www.isogen.com
</Address>


=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 10:56:11 1998
Date: Fri, 17 Apr 1998 11:52:15 -0400
From: Frank Manola <fmanola@objs.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)

At 8:55 PM -0700 4/16/98, Tim Bray wrote:
>At 10:35 PM 14/04/98 -0500, len bullard wrote:
>>> [Chris Maden <crism@ora.com>:]
>>> > One fundamental flaw in _XML Complete_ is Holzner's apparent belief
>>> > that you must write Java code in order to do anything useful with
>>> > XML.
>
>>Markup doesn't care.  That's the beauty of it. :-
>
>Yes! What he said.  As a result of having been a programmer since
>A.D. 1979, my faith in interoperable APIs is torn and shredded.
>But I think that interoperable syntax is usefully achievable.
>Hence, XML. -T.
>

and Matthew Gertner wrote:
>Eliot Kimber indicated some scepticism as to whether OO techniques have
>really lived up to their hype. In terms of a controlled environment, they
>have. Any C programmer who has moved onto C++ will attest that OO features
>make it far easier to write extensible and maintainable code. On the other
>hand, the promise that this would lead to interchangeable components that
>could be used anywhere has clearly been a flop. Why? For exactly the reason
>Tim mentioned in his mail: interoperable APIs never work. You can't
>interface with code and expect this interface to apply to any environment
>other than the one it was specifically designed for. This is the case
>whatever technology you are using (DLLs, Java, JavaBeans, Smalltalk, COM,
>CORBA, etc.). Hence XML.

These observations about the (at least so far) lack of success with truly
interoperable APIs are certainly true, and the potential of interoperable
syntax "feels" right, but I wonder to what extent we may be comparing
apples and oranges here.  Specifically, what do we mean by "interoperable"?
Interoperable APIs are hard at least in part because an incredible amount
of semantics are (implicitly) built into a typical API (as is suggested by
Matthew's comment).  Moreover, interoperable APIs are held to a "strict
accountability":  the programs interacting through them must work without
either syntactic or semantic errors (and, with programs, these are
typically all bundled up).  However, if programs must agree on the precise
meanings of tagged data in order to guarantee proper operation when
exchanging data (and what else does a fair understanding of "interoperable"
mean in this context?), won't the semantics that must be mutually
understood be (approximately) just as complex?  And don't we then need to
consider the mechanism(s) for achieving *that* in our comparisons?  After
all, it's not enough that the programs be "interoperable" in the sense that
they can each "operate" (e.g., read, parse, or even approximately get the
meaning) on the other's data;  the operation must also be "correct" in a
fairly constrained sense.  I have in mind all the problems large companies
are having merging data from different databases into data warehouses due
to sometimes subtle differences in semantics (e.g., of what a "customer"
is), even when the data item names (corresponding to markup) are the same
(or, at least, fairly regular).  I'm not, here, arguing *against* the idea
of interoperable syntax, but I am questioning how easy it will really be to
get the degree of "interoperability" we seem to be implicitly expecting.

--Frank

-----------------------------------------------------------------------
Frank Manola                            www:    http://www.objs.com
Object Services and Consulting, Inc.    email:  fmanola@objs.com
151 Tremont Street #22R                 voice:  617 426 9287
Boston, MA  02111  



=======================================================================

Date: Fri, 17 Apr 1998 13:23:37 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

Matthew Gertner:
>Nevertheless, inheritance of some sort is absolutely vital if XML is to
>fulfill its promise. If we can't produce standard DTDs which can be
>extended, *without* modifying the base DTD, then many of the advantages of
>XML go out the window.

Michael Kay wrote:
> 
> I agree that this is central. Let's leave identity out of the discussion, as
> that does, I think, fall into the XML Linking domain, and concentrate on what
> I prefer to call subtyping.

You act as if this is just a terminological difference, but it isn't. He
is talking about one thing and you are talking about another. What he
describes, "producing standard DTDs which can be extended *without*
modifying the base DTD", is inheritance. It can be implemented right now
through parameter entity hacks and is not subtyping. You on the other hand
seem to be talking about subtyping:

> I know some people will disagree, but the way I use XML, a DTD is a
> schema, an element definition in a DTD is a class, a document is a
> database, and an element within a document is an instance of a class.
> What is missing is that we can't define one class (element type) as a
> subtype of another.

The only reason that the concepts *even intersect* is because 

a) subtyping without inheritance is often painful and leads to code
duplication. I claim that architectural forms and Java "interfaces" are
often painful for exactly this reason. Of course in [SG|X]ML,
inheritance can be hacked with parameter entities, which is something
HyTime does for its architectures. (also HyTime can only be thought of
as subtyping if you use it in a restricted form...)

b) inheritance without subtyping is only occasionally useful. I can't
remember the last time I used "private inheritance" in C++ and I don't
even remember right now if Java supports it.
 
But the fact that the two concepts work well together does not make them
synonyms. They are not.

> The main thing that's tricky is that you can get the "is-a" the wrong way
> round. If a PREFACE is-a-kind-of CHAPTER, that means you can find
> anything (elements, attributes) in a PREFACE that you can find in a chapter,
> and more besides. 

No it doesn't. If PREFACE is-a-kind-of CHAPTER then source code designed
to handle chapters should work with prefaces. That means that PREFACE
must either directly describe a *subset* of the language described by
CHAPTER (i.e. have a constrained content model) or PREFACE must provide
"some mechanism" for transforming its content into a language
understandable by CHAPTERs. In real world documents, we often want to be
able to have subtypes that are also extensions, which means that we need
to define some transformational system (as archforms do).
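
A minimal sketch of the "subset of the language" test, in Python. It
assumes content models can be written as regular expressions over
space-terminated child element names; the PREFACE, CHAPTER and APPENDIX
models below are invented for illustration and come from no real DTD.
The check is bounded (it only enumerates short child sequences), so it
can refute a subtype claim but cannot prove one in general.

```python
import re
from itertools import product

# Hypothetical content models as regexes over space-terminated names.
CHAPTER = r"(title )(para |list )*"
PREFACE = r"(title )(para )*"           # a constrained CHAPTER
APPENDIX = r"(title )(para |table )*"   # an extension: adds "table"

def sentences(names, max_len):
    """Yield every sequence of child element names up to max_len long."""
    for n in range(max_len + 1):
        for seq in product(names, repeat=n):
            yield "".join(name + " " for name in seq)

def is_sublanguage(sub, sup, names, max_len=4):
    """Bounded check: everything `sub` accepts is also accepted by `sup`."""
    return all(
        re.fullmatch(sup, s) is not None
        for s in sentences(names, max_len)
        if re.fullmatch(sub, s) is not None
    )

NAMES = ["title", "para", "list", "table"]
preface_ok = is_sublanguage(PREFACE, CHAPTER, NAMES)
appendix_ok = is_sublanguage(APPENDIX, CHAPTER, NAMES)
print(preface_ok, appendix_ok)
```

The extended model fails the check because "title table" is legal in
APPENDIX but not in CHAPTER; that is the case where some transformation
back into the CHAPTER language would be needed.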

This transformational question is exactly what makes subtyping with
extension very tricky. Subtyping without extension is trivial. This is
why I have stepped back from the question of subtyping with extension
and am investigating transformation languages. In particular I am right
now looking at Forest Automata theory and a transformation language
designed by Makoto Murata.

> It also means you can reduce a PREFACE to a CHAPTER
> by removing these extra bits. I'm not entirely sure what "removing the extra
> bits" means: for example should it remove elements that cannot occur
> in a CHAPTER, or should it just remove the tags that surround those
> elements? This tends to show up the lack of semantics in the object
> model underlying XML.

That's exactly right. Your confusion is my confusion. The only way out
is through transformation languages -- either simple, relatively weak
ones like those provided by architectural forms, or more powerful (and
more complicated? I don't know yet?) ones like those described by
Murata-san in his various Principles of Documentation papers. They are
at:

http://www.geocities.com/ResearchTriangle/Lab/6259/

Unless you are much smarter than me, you will probably not find these
light reading, but my hope is that the concepts can be simply expressed
in a nice syntax in much the same way that regular expressions hide the
nastiness of DFAs. There is in fact such a thing as a regular tree
expression that is quite analogous to a regular expression. I don't yet
know if these can be hooked up to an easy to use (non-programmable!)
transformation language.

Sorry for the brain dump. I'm late for a meeting.

Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

[Woody Allen on Hollywood in "Annie Hall"]
Annie: "It's so clean down here."
Woody: "That's because they don't throw their garbage away. They make 
        it into television shows."



==========================================================================

From: "Martin Bryan" <mtbryan@sgml.u-net.com>
To: <xml-dev@ic.ac.uk>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Date: Sat, 18 Apr 1998 08:49:07 +0100
Message-ID: <01bd6a9e$92ae10a0$2b8577c2@sgml.u-net.com>


Michael Kay wrote:
>I know some people will disagree, but the way I use XML, a DTD is a
>schema, an element definition in a DTD is a class, a document is a
>database, and an element within a document is an instance of a class.
>What is missing is that we can't define one class (element type) as a
>subtype of another.

In SGML you can use exclusions to make an element a true subclass of
another:

<!ELEMENT X  (%Y-contents;) -(a|b|c)>

providing a, b and c are optional components within the model for Y.
Unfortunately XML dropped this useful option from the set of SGML facilities
it inherited.

Martin Bryan


==========================================================================

Date: Sat, 18 Apr 1998 10:31:48 -0400
From: Paul Prescod <papresco@technologist.com>
To: xml-dev@ic.ac.uk
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
References: <01bd6a9e$92ae10a0$2b8577c2@sgml.u-net.com>

Martin Bryan wrote:
> 
> 
> In SGML you can use exclusions to make an element a true subclass of
> another:
> 
> <!ELEMENT X  (%Y-contents;) -(a|b|c)>
> 
> providing a, b and c are optional components within the model for Y.

Element X is not a true subclass or subtype. Given a content model:

<!ELEMENT J (Y)>

You cannot use an X.

 What you've done above is make an element whose content model is more
restrictive than some other content model. You can also do that without
exclusions. I don't think I've ever used exclusions in that way. One big
problem is that the exclusion doesn't just change the content model, but
the content model of all of X's children. You don't want that if all you
need is content model subsetting.
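
The substitutability point can be sketched in Python under some loud
assumptions: content models are reduced to regular expressions over
child names, and the ISA table is a hypothetical declaration that plain
SGML/XML DTDs do not offer. It only illustrates why restricting X's
content does not, by itself, let X appear where J demands a Y.

```python
import re

# Mirrors the example above: J requires exactly one Y child.
CONTENT_MODELS = {"J": r"(Y )"}

# Hypothetical "is-a" table that a subtype-aware validator might consult.
ISA = {"X": "Y"}

def valid_children(parent, children, isa=None):
    """Check child element names against the parent's content model.

    When an `isa` table is supplied, each child is first replaced by its
    declared base type, which is what true subtyping would permit.
    """
    isa = isa or {}
    names = "".join(isa.get(c, c) + " " for c in children)
    return re.fullmatch(CONTENT_MODELS[parent], names) is not None

plain_y = valid_children("J", ["Y"])
plain_x = valid_children("J", ["X"])
subtyped_x = valid_children("J", ["X"], isa=ISA)
print(plain_y, plain_x, subtyped_x)
```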

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human 
rights groups to spoil your profession" 
    - Col. Godwin Ugbo of the  Nigerian military dictatorship


==========================================================================

From owner-xml-dev@ic.ac.uk Sat Apr 18 12:21:12 1998
Date: Sat, 18 Apr 1998 12:18:44 -0500 (CDT)
From: Robin Cover <robin@acadcomp.sil.org>
Message-Id: <199804181718.MAA24752@ACADCOMP.SIL.ORG>
To: xml-dev@ic.ac.uk
Subject: Re: Inheritance in XML

> Re: Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
> Date: Sat, 18 Apr 1998 08:49:07 +0100
> Reply-To: "Martin Bryan" <mtbryan@sgml.u-net.com>

>>What is missing is that we can't define one class (element type) as a
>>subtype of another.

> In SGML you can use exclusions to make an element a true subclass of
> another:
> 
> <!ELEMENT X  (%Y-contents;) -(a|b|c)>
>
> providing a, b and c are optional components within the model for Y.
> Unfortunately XML dropped this useful option from the set of SGML facilities
> it inherited
> 
> Martin Bryan

Martin, I wish I could believe this were true and useful.  It seems
that we confront here one of the several troublesome mismatches
between OO database modeling and SGML/XML markup, with respect to
the simple analogy:

OODB           SGML/XML Markup

class defn     element declaration
class name     element type
object         element
attribute      attribute

If we accept this crude analogy, and accept SGML's notion of an
"attribute" as a name-value pair, then the hope of creating subclasses
through SGML/XML element declarations appears slim.  Appears "to me" I
should say: I would welcome comments from the experts.

For starters, subclassing normally would mean further specialization
by the addition (possibly 'plus subtraction') of properties, viz., of
attributes.  Formally, then, an SGML element declaration can't do the
work: it would need to be an ATTLIST declaration.  But then we face
the problem that you can't model a complex attribute with the SGML
'attribute' anyway (if you want any validation): the "value" in
'(name-)value' is a flat/string in SGML, at least in the literal sense.

Of course, one can (and we all do) model "real" attributes using SGML
elements -- since we have no realistic alternative -- but that creates
other problems for the notion of using SGML element decls as a
subclassing mechanism.  One such problem is that (real) attributes are
unordered.  The straightforward way to model an object/element with
(some optional, some required) attributes a, b, c, d, and e would
seem to be: (a* & b? & c? & d & e?), but SGML/XML notions of
prescribing order in the serialization are fairly strong, and XML
won't even allow the use of the AND connector to indicate what I
plainly mean in this sample assertion. (Perhaps Steph Tryphonas has
written a program by now to convert all content models using AND to
use only OR, without sacrificing any integrity constraints on
occurrence and sequence).  In any case, the impulse toward
serialization in SGML -- at least in practice, given tools that force
end users to reckon with (arbitrary non-intuitive) "order" based upon
sequence rules in content models -- tends to work against the easy use
of SGML elements to model attributes.
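
As a rough illustration of what an AND-to-OR conversion involves, here
is a naive Python sketch that rewrites an AND group as an OR of every
ordering. The function name is invented for this example. The naive
expansion grows factorially, and with optional tokens it tends to yield
content models that are not deterministic in the sense XML expects, so
a real converter would need considerably more care.

```python
from itertools import permutations

def expand_and_group(tokens):
    """Rewrite (t1 & t2 & ...) as an OR of every ordering, each a sequence."""
    orderings = ["(" + ", ".join(p) + ")" for p in permutations(tokens)]
    return "(" + " | ".join(orderings) + ")"

# A two-token group is small enough to read.
small = expand_and_group(["b?", "d"])

# The five-token model from the text expands to 5! alternatives.
full_count = sum(1 for _ in permutations(["a*", "b?", "c?", "d", "e?"]))

print(small)
print(full_count)
```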

Even apart from these mismatches between "object" modelling
and SGML/XML encoding, I question whether

  " <!ELEMENT X  (%Y-contents;) -(a|b|c)> "

creates a useful "true subclass."  Why would one want to create a
subclass based upon the subtraction of optional "attributes"
(subelements)?  I think that would make it a superclass in many OO
systems.  In this connection, one might be inclined to argue that the
treatment of "content" as a special attribute is unfortunate, at least
from the perspective of data modelling, where "part-whole" has no
quintessential role vis-a-vis "is-a" or "has-a" or "kind-of" or
"points-to"...  At which point, others would quickly point out that
they think it's specious to be talking about object modeling in terms
of SGML-based markup languages anyway, since "these languages can
neither formally express nor enforce semantic integrity constraints
which are so critical to good object modelling..."

I think this all leads me in the direction of favoring the efforts
at defining other schema languages (beyond SGML/XML DTD syntax),
granting that the validation of instances against their schemas,
if/when critical, will need to be done outside the framework of
the SGML/XML "parser/processor" as defined.  I have little doubt
that someone as brilliant as Eliot can show how the desired
objectives might be met through architecture processing by an
appropriate architecture engine; I don't know whether this is the
"best" path in all cases, or whether SGML/XML users will want to
deal with all the layers of indirection that architectures seem to
want.

I hope that experts with some years of experience in OO systems
will contribute their insights to the new "schema" projects.

-rcc

-------------------------------------------------------------------------
Robin Cover                    Email: robin@acadcomp.sil.org
6634 Sarah Drive           
Dallas, TX  75236  USA          >>> The SGML/XML Web Page <<<
Tel: +1 (972) 296-1783 (h)     http://www.sil.org/sgml/sgml.html
Tel: +1 (972) 708-7346 (w)
FAX: +1 (972) 708-7380


=========================================================================


From owner-xml-dev@ic.ac.uk Mon Apr 20 04:30:50 1998
From: matthew@praxis.cz (Matthew Gertner)
Subject: Re: Inheritance in XML
Reply-to: matthew@praxis.cz (Matthew Gertner)


Robin,

You really hit the nail on the head with this post! These are exactly the
kinds of issues that I was having some trouble expressing in my previous
mail. I have read this thread with great interest, and it seems to me that
if we synthesize the discussion we are getting close to the heart of the
matter. Here is my attempt:

* Terminology *

I personally don't agree that there are carved-in-stone, well-understood
definitions for terms like "inheritance" and "subtyping" in XML. While there
surely are in certain, specific contexts, we are talking about something
new, i.e. inheritance in XML, and what we really need to do is choose a term
and define it precisely. Does HyTime model inheritance? It does if my
definition of inheritance in XML corresponds to what HyTime does (it
doesn't: see below). Is "subtyping" a better term? No, because it doesn't
have the same resonance as the word "inheritance" among non-programmer types.

I'll make a first attempt:
"Inheritance in XML refers to the process of creating new element types that
duplicate the content model and attribute list of existing element types (in
the same or a separate "base" DTD), while extending these to include
additional attributes and/or content. As such, instances of the new element
types can be used wherever the base element type can be used, and can be
processed polymorphically by any external processor which knows about the
base element type."

* HyTime *

I read through Eliot's post and understood some of it. :-) I never meant
to question any design decisions made in the specification of HyTime. They
are all well-justified in the context which prevailed at the time. Despite
the fact that HyTime models derivation (I'll stay away from the i-word in
light of the definition given above) of instances and not of schemata, it
remains one of the few attempts that have
been made at deriving document types and as such is an extremely valuable
basis for the thinking about a true inheritance mechanism for XML. To meet
the definition I proposed above, this mechanism would have to extend the DTD
syntax or create a new one (see below). The goals and uses of HyTime
derivation are and will continue to be somewhat different from this; I was
only trying to point out that we can benefit greatly from the experience
gained from HyTime in thinking about XML inheritance.

* Semantics and XML *

In last month's Wired, XML made it into the "hype list" with the comment
that we crazy XML types are kidding ourselves because XML will never fly
without well-defined semantics. These sentiments were echoed by several
posts on this list. I agree 100%, but as several people pointed out,
there are already a lot of semantics associated with XML, to the extent that
there are semantics associated with the idea of a hierarchy and with the
HAS-A relationship. XML-Link and XSL introduce a very valuable additional
set of semantic relationships. We are all so excited about XML, as opposed
to Excel files, Postscript or what have you, because there are tools like
XML parsers, editors and browsers which have value across the whole range of
XML applications.

I can write an XML file, and to the extent that existing semantics are
sufficient, I can do useful work with this file. I can, for example, display
it as a hierarchy. I can't do anything at all with an Excel file unless I
have Excel. This doesn't eliminate the need to define the specific
semantics of a given schema. This can only be done with clear documentation,
as Paul pointed out. What we can do is capture the semantics expressed in
this documentation and use them as the basis for new schemata. Sure, a lot
of this can be done using "parameter-entity hacks", or by writing content
models out by hand, but this isn't going to be an effective way to bring XML
to the masses.

The whole discussion about XML semantics is very apt in this context
precisely because inheritance is so important for making XML really useful.
Let me give an example implied by Peter (in reference to the agglutination
of DTDs for nuclear power plant software). Let's say that I am developing an
advanced medical diagnosis system based on chemical analysis of blood
samples. Part of the application is a hardware device which looks for
specific molecules in the sample and displays them on a monitor in 3D. I
decide to use CML to model these molecules, but I need to add additional
attributes and content to the molecule description which are specific to my
application. With the kind of inheritance mechanism I am talking about, I
could download a CML viewer and use it "out of the box" to display the
molecules, while still passing the entire XML structure (with my additional
information) to the application which attempts to create a diagnosis. Without
XML inheritance, I will probably "break" the viewer, so I find myself wading
through and adapting a lot of Java code. At this point I start wondering why
I decided to use XML in the first place...

* DTDs and schemata *

Francois Chahuneau's article makes a very effective argument for why we need
to extend or replace DTD syntax (thanks Robin). XML-Data is a reasonable
attempt to do so, but it is understandably controversial because it is such
a radical departure from the existing syntax. I quite like the idea of an
alternate, XML-based schema syntax, but the real lesson of XML-Data is that
creating an effective inheritance mechanism isn't rocket science. All that
is really needed is a keyword that says "this element type is derived from
that element type". Something like:

<!element dog extends animal...

...where the subsequent content model and attribute list are understood as
being extensions to those of the base element type. The only other issue is
whether more complex handling of the content model is needed.

* Content model *

XML-Data (if I understand correctly) simply tacks any new content for a
derived element type at the end of the base content model. A valid question,
addressed briefly in my previous post, would be whether more robustness is
needed in modifying the existing content model. Steve and Robin both
mentioned this aspect as well; one of the most powerful features of
SGML/XML, as compared with OO languages, is the fact that content is
ordered. It would be nice, therefore, to take this into account in any
putative inheritance mechanism. Things like SGML exclusions don't fit the
above-mentioned definition of inheritance, for the reasons mentioned by
Robin (and others) in his post.

Having given this some more thought, I don't see any practical way to insert
new content in the middle of an existing content model. Maybe someone
cleverer than I has an idea about how this might be done (and whether it is
really useful). In the meantime, one useful approach might be to at least
enable new content to be added at the beginning of the base content model by
adding a #BASECONTENT keyword which is replaced by the base content model in
the derived element type description:

<!element dog extends animal (breed,#BASECONTENT,fleas*)>

This would simply mean that the breed element precedes the content of the
base element type, which is then followed optionally by some flea elements.
This approach is probably sufficient, since other modifications to the base
content model could be taken into account in the design phase of the base
schema (i.e. by breaking up monolithic elements, if necessary).
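
A minimal sketch of how a processor might expand such a declaration, in
Python. The #BASECONTENT keyword and the "extends" syntax are only the
proposal made above, not anything in XML 1.0, and the base content
model for "animal" is invented for illustration.

```python
# Hypothetical base declaration; the child names are made up.
BASE_MODELS = {"animal": "(name, habitat?)"}

def expand_derived(base, derived_model, base_models=BASE_MODELS):
    """Substitute the base content model for the #BASECONTENT keyword."""
    return derived_model.replace("#BASECONTENT", base_models[base])

dog = expand_derived("animal", "(breed,#BASECONTENT,fleas*)")
print("<!ELEMENT dog " + dog + ">")
```

The expansion is purely textual, so it behaves like a parameter entity
reference; it says nothing about how software written for "animal"
should treat the added breed and fleas elements.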

* What now? *

More tricky than any of these technical issues is the question of what, if
anything, could be done to promote a mechanism of this sort. Obviously this
would require a change to the XML spec as well as modification to all
existing tools which process DTDs, so it's a pretty big deal. I wonder if
anyone besides me thinks that a simple mechanism like this would make sense.
If so, is there any room in the XML standards process to discuss a change of
this type at some point in the future (certainly not for XML 1.0)?

Cheers,

Matthew

-----Original Message----- [deleted]


=========================================================================

From owner-xml-dev@ic.ac.uk Mon Apr 20 08:19:24 1998
Date: Mon, 20 Apr 1998 09:04:57 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML

Matthew Gertner wrote:
> 
> * Terminology *
> 
> I personally don't agree that there are carved-in-stone, well-understood
> definitions for terms like "inheritance" and "subtyping" in XML. 

I don't think that anyone claimed that there is a well-understood
definition for "inheritance" in any context -- even OO. But to be
consistent with English, it must have something to do with "getting
something for free." In the XML context the most obvious thing would be
declarations.

Subtyping is different. Subtyping comes straight from mathematics and is
as old as logic (at least). A type defines a set of objects. A subtype
describes a subset of those objects. Simple and precise.

> Is "subtyping" a better term? No, because it doesn't have the same
> resonance as the word "inheritance" among non-programmer types.

I don't know why you think that. Non-programmer types are likely to balk
at either word, but at least subtyping is shorter, and can be precisely
defined. Anyhow, it is not as if the words are interchangeable. You
can't pick and choose from words that already have meanings.

> I'll make a first attempt:
> "Inheritance in XML refers to the process of creating new element types that
> duplicate the content model and attribute list of existing element types (in
> the same or a separate "base" DTD), while extending these to include
> additional attributes and/or content. As such, instances of the new element
> types can be used wherever the base element type can be used, and can be
> processed polymorphically by any external processor which knows about the
> base element type."

ACK! This definition was proven inadequate in the OO software world
around a decade ago. Both C++ and Java allow subtyping without
inheritance, and C++, Sather and Eiffel allow inheritance without
subtyping (I suppose to get that in Java, you would have to use
delegation). If we are going to borrow ideas from OO, then we should at
least use the updated, modern ideas, not those that were accidentally
confused in Simula 67 (and have been confused in programmers' minds ever
since).

The first major problem with your definition actually has nothing to do
with the inheritance/subtyping conundrum. The biggest problem is that if
you "extend" a content model, you are making a more flexible language,
which *cannot* be processed polymorphically by an external processor
which knows only about the base element type:

<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT MY-TITLE (#PCDATA|IMG|FOO|BAR)>

Now imagine software that generates a TOC from titles, presuming them to
be strictly textual. What does it do with images in titles?
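
One possible answer, sketched in Python with the standard library's
xml.etree.ElementTree. The sample document and the naive_toc function
are invented for illustration; the function stands in for any
TOC generator that assumes a title holds nothing but text.

```python
import xml.etree.ElementTree as ET

DOC = """<DOC>
  <TITLE>Plain heading</TITLE>
  <MY-TITLE>Logo <IMG src="logo.gif"/> and text</MY-TITLE>
</DOC>"""

def naive_toc(root, title_tags=("TITLE", "MY-TITLE")):
    """Collect titles the way a text-only generator might: via .text only."""
    return [el.text for el in root.iter() if el.tag in title_tags]

toc = naive_toc(ET.fromstring(DOC))
print(toc)
```

Here the generator silently keeps only the text before the image and
drops the rest of the extended title. Other tools might crash or emit
the markup verbatim; none of those outcomes is defined by the
declarations alone.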

Now let's talk about inheritance and subtyping. This is not a merely
theoretical issue. It has important practical implications. The most
interesting, important application of subtyping is allowing divergent
evolution of compatible schemas. This is why architectural forms were
invented. But for this to work, subtyping *must* be unhitched from
inheritance.

Suppose that Boeing has a content model:

<!ELEMENT AIRPLANE-DOC - - (FRONT, MIDDLE, REAR)>

Bombardier has a similar model (after all, they are modelling the same
thing):

<!ELEMENT AIRCRAFT-DOC - - (COCKPIT, STORAGE, TAIL)>

How does inheritance help me to unify these models and validate that
they are actually isomorphic? It doesn't. This is a job for subtyping. I
can also come up with examples where inheritance is more useful without
subtyping but you can always achieve this through other means (which is
why Java does not support it).
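
A much simplified Python sketch of the unification idea. The neutral
names in ARCHITECTURE and the two mapping tables are invented here, and
a dictionary stands in for the declarations that architectural forms
would actually put in the DTD; this is not the ISO/IEC 10744 mechanism
itself, only the shape of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared architecture: document element, then three parts.
ARCHITECTURE = ["AIRFRAME-DOC", "FORWARD", "CENTER", "AFT"]

# Each vendor declares which architectural form its elements fulfil.
BOEING_MAP = {"AIRPLANE-DOC": "AIRFRAME-DOC", "FRONT": "FORWARD",
              "MIDDLE": "CENTER", "REAR": "AFT"}
BOMBARDIER_MAP = {"AIRCRAFT-DOC": "AIRFRAME-DOC", "COCKPIT": "FORWARD",
                  "STORAGE": "CENTER", "TAIL": "AFT"}

def architectural_view(xml_text, mapping):
    """Rename the document element and its children to architectural names."""
    root = ET.fromstring(xml_text)
    return [mapping.get(root.tag)] + [mapping.get(child.tag) for child in root]

def conforms(xml_text, mapping):
    """True when the renamed document matches the shared architecture."""
    return architectural_view(xml_text, mapping) == ARCHITECTURE

boeing_doc = "<AIRPLANE-DOC><FRONT/><MIDDLE/><REAR/></AIRPLANE-DOC>"
bombardier_doc = "<AIRCRAFT-DOC><COCKPIT/><STORAGE/><TAIL/></AIRCRAFT-DOC>"
broken_doc = "<AIRCRAFT-DOC><TAIL/><COCKPIT/><STORAGE/></AIRCRAFT-DOC>"

print(conforms(boeing_doc, BOEING_MAP),
      conforms(bombardier_doc, BOMBARDIER_MAP),
      conforms(broken_doc, BOMBARDIER_MAP))
```

Neither vendor's declarations are copied from the other; each simply
declares which role its elements play, and conformance is checked
against the shared model.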

Inheritance is a code reuse mechanism, so you can always emulate it with
cut and paste (or, parameter entities, or in a programming language with
delegation). Subtyping is a type system extension. It is completely
different.

I can inherit stuff from my dad without becoming a dad. I can choose to
be a dad without inheriting anything either from my dad, or the "class
dad". They are different things.
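
The same distinction can be sketched in Python, where typing.Protocol
gives structural subtyping and delegation gives code reuse. All class
and method names below are invented for this illustration.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Dad(Protocol):
    """The role: anything with a tell_joke() method fulfils it."""
    def tell_joke(self) -> str: ...

class MyDad:
    def tell_joke(self) -> str:
        return "pun"

    def fix_car(self) -> str:
        return "fixed"

class Me:
    """Reuses MyDad's code by delegation, but does not fulfil the Dad role."""
    def __init__(self) -> None:
        self._inherited = MyDad()

    def fix_car(self) -> str:
        return self._inherited.fix_car()

class Neighbour:
    """Fulfils the Dad role without reusing anything from MyDad."""
    def tell_joke(self) -> str:
        return "knock knock"

me_is_dad = isinstance(Me(), Dad)
neighbour_is_dad = isinstance(Neighbour(), Dad)
neighbour_reuses = issubclass(Neighbour, MyDad)
print(me_is_dad, neighbour_is_dad, neighbour_reuses)
```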
 
> * DTDs and schemata *
> 
> Francois Chahuneau's article makes a very effective argument for why we need
> to extend or replace DTD syntax (thanks Robin). XML-Data is a reasonable
> attempt to do so, but it is understandably controversial because it is such
> a radical departure from the existing syntax. 

I think that XML-Data should be controversial because from my reading it
is just a mix and match combination of interesting features that people
want in schemas without a coherent theory of how they should fit
together. You can't just put 10 smart people into a working group and
have them throw in their good ideas and expect a coherent result.
XML-Data's inheritance mechanism does not take advantage of XML's nature
as a sequence-oriented language for encoding documents. In other words,
it doesn't solve the fundamental problem.

> I quite like the idea of an
> alternate, XML-based schema syntax, but the real lesson of XML-Data is that
> creating an effective inheritance mechanism isn't rocket science. All that
> is really needed is a keyword that says "this element type is derived from
> that element type". Something like:
> 
> <!element dog extends animal...

Sure. This isn't rocket science. But it doesn't solve the fundamental
problem at all. You haven't defined what happens to "BARK" sub-elements
in "DOG". Without that definition, any software dealing with animals
will croak on dogs. Which is exactly what subtyping was supposed to
avoid....
 
> More tricky than any of these technical issues is the question of what, if
> anything, could be done to promote a mechanism of this sort. Obviously this
> would require a change to the XML spec as well as modification to all
> existing tools which process DTDs, so it's a pretty big deal. I wonder if
> anyone besides me thinks that a simple mechanism like this would make sense.
> If so, is there any room in the XML standards process to discuss a change of
> this type at some point in the future (certainly not for XML 1.0)?

Personally, I have yet to see a decent proposal for inheritance and
subtyping in SGML. Coming up with one is difficult, which is why I've
spent the last year thinking about it. Dan Connolly has also spent
several years thinking about it. I know that there are many others in
the same boat. I think that we agree that it doesn't make sense to adopt
a solution that solves only 5% of the problem, which is why you will see
resistance to anything like that.

We will know that we have a complete solution to the problem when HTML
6.0 can be described as a subtype of HTML 5.0, and its behaviour in a
"subtype aware" HTML 5.0 browser is predictable and well-defined.
Further, HTML 6.0 must not just extend HTML 5.0 in trivial ways such as
new <HEAD> tags. It must actually have new elements, with new content
models mixed in at all levels. As I said, inheritance-at-the-end solves
about 5% of this problem.

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human 
rights groups to spoil your profession" 
    - Col. Godwin Ugbo of the  Nigerian military dictatorship


============================================================================

Date: Mon, 20 Apr 1998 09:20:12 -0400
From: Paul Prescod <papresco@technologist.com>
To: xml-dev <xml-dev@ic.ac.uk>
Subject: Inheritance and subtyping in OO languages


I've found a good reference to the 8-year-old paper that made the
distinction between inheritance and subtyping most explicit. The paper
itself is not online, but this summary is quite good:

"[CCHO89] and [CoHC90] propose an approach based on explicit interfaces
and interface containment. In this system of object interfaces, one type
is considered a subtype of another if some subset of its interface is
identical to that of the second. [...] Hence in this system class-based
inheritance is strictly a reusability mechanism for sharing behaviour
between objects, not to be confused with subtyping. For example two
classes may be equivalent as types, though neither inherits anything
from the other. So class hierarchies are not the same as type
hierarchies, although they may overlap. Object interfaces [as in Java,
C++, etc. - Paul] clarify this distinction between interface containment
(subtyping) and class-based inheritance and give insight into
limitations caused by equating the notions of type and class in many
typed object-oriented programming languages [such as Simula 67 - Paul]."

  
http://progwww.vub.ac.be/prog/persons/kimmens/research/Introduction-to-OO.html

The paper itself is called: "Inheritance is not subtyping" and is quite
famous, but unfortunately predates the Web.

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human 
rights groups to spoil your profession" 
    - Col. Godwin Ugbo of the  Nigerian military dictatorship


=======================================================================

From owner-xml-dev@ic.ac.uk Mon Apr 20 10:45:58 1998
Date: Mon, 20 Apr 1998 11:24:15 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML


Robin Cover wrote:
> 
> OODB           SGML/XML Markup
> 
> class defn     element declaration
> class name     element type
> object         element
> attribute      attribute
> 
> If we accept this crude analogy, and accept SGML's notion of an
> "attribute" as a name-value pair, then the hope of creating subclasses
> through SGML/XML element declarations appears slim.  

I don't think that the problem is with SGML/XML element type
declarations. I think that it is with trying to import OO features too
literally. The most important thing about an object is its set of
"methods" or "slots". These define its interface. The most important
thing about an XML element is its content model, or, more generally, the
language it defines (content model+attributes).

But languages and methods are very different. If we made XML's
attributes "richer", we could have attributes that are more like
properties. But the content model problem would remain unless we removed
content models altogether.

OOP works because they figured out a smart way of defining interfaces
(sets of methods) and sub-interfaces (subsets of methods). We must do
the same for languages. 

The problem is easy if we strictly require subtypes to define
sublanguages (i.e. merely restricted content models). That would
occasionally be useful:

<!ELEMENT EMPH (#PCDATA|IMG)>
<!ELEMENT STRONG (#PCDATA|IMG) ISA EMPH>
<!ELEMENT CITE (#PCDATA) ISA EMPH>
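
For simple choice groups like these, the "strict sublanguage" test can be
sketched as a subset check. This is only an illustration under a strong
assumption: each content model is reduced to the set of tokens it allows,
which is not enough for sequences or nested groups. The ISA keyword above
is hypothetical syntax, and so are the names below (LINK is an invented
counterexample).

```python
# Each content model is reduced to the set of alternatives it permits.
# Assumption: the model is a flat choice group such as (#PCDATA|IMG).
EMPH = frozenset({"#PCDATA", "IMG"})
STRONG = frozenset({"#PCDATA", "IMG"})
CITE = frozenset({"#PCDATA"})
LINK = frozenset({"#PCDATA", "A"})  # hypothetical: allows A, which EMPH does not


def is_sublanguage(sub: frozenset, sup: frozenset) -> bool:
    """True if every alternative allowed by sub is also allowed by sup."""
    return sub <= sup


def check_isa(declared: dict, base: frozenset) -> dict:
    """Map each declared element type name to whether its ISA claim holds."""
    return {name: is_sublanguage(model, base) for name, model in declared.items()}


results = check_isa({"STRONG": STRONG, "CITE": CITE, "LINK": LINK}, EMPH)
```

Under this reduction STRONG and CITE pass as subtypes of EMPH, and LINK
fails because it admits an element that EMPH does not.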

But more often we want not just a strict sublanguage, but a language
that can be *transformed into* a sublanguage. For example:

<!ELEMENT FIGURE (CAPTION, OBJECT)>
<!ELEMENT APPLET (CAPTION, JAVACODE) 
        ISA FIGURE( CAPTION=CAPTION, OBJECT=JAVACODE )>

To me, this is much more interesting and useful, but also harder to
figure out, especially when we use the full power of content models.
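
One way to picture "transformed into a sublanguage" is as a renaming
step: take an APPLET instance and produce the FIGURE it claims to be. The
sketch below uses Python's standard xml.etree.ElementTree. The mapping
table simply restates the hypothetical ISA clause above, and the sample
content ("Clock", "Clock.class") is invented. It handles only the element
and its direct children, which sidesteps exactly the hard part mentioned
above, namely arbitrary content models.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping read off the ISA clause: derived name -> base name.
APPLET_AS_FIGURE = {
    "APPLET": "FIGURE",
    "CAPTION": "CAPTION",
    "JAVACODE": "OBJECT",
}


def as_base(elem: ET.Element, mapping: dict) -> ET.Element:
    """Return a renamed copy of elem and its direct children.

    Grandchildren are not copied; this is a deliberately shallow sketch.
    """
    out = ET.Element(mapping[elem.tag], dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        renamed = ET.SubElement(out, mapping[child.tag], dict(child.attrib))
        renamed.text = child.text
        renamed.tail = child.tail
    return out


applet = ET.fromstring(
    "<APPLET><CAPTION>Clock</CAPTION><JAVACODE>Clock.class</JAVACODE></APPLET>"
)
figure = as_base(applet, APPLET_AS_FIGURE)
figure_xml = ET.tostring(figure, encoding="unicode")
```

The resulting figure element has the children CAPTION and OBJECT in that
order, so it can be checked against the FIGURE content model directly.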

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human 
rights groups to spoil your profession" 
    - Col. Godwin Ugbo of the  Nigerian military dictatorship



=======================================================================

Date: Wed, 22 Apr 1998 21:28:46 -0700
From: Jon.Bosak@eng.Sun.COM (Jon Bosak)
Subject: Re: Inheritance in XML
In-reply-to: <01bd6c3a$c50f2910$020b0ac0@xerius> (matthew@praxis.cz)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

I'm generally not able to track discussions like this, fascinating
though they may be, and I make it a firm principle not to become
involved in them, so don't expect any further comments from me
regarding this one.  But catching up on my email backlog just now I
see so much good energy being wasted that I can't pass by without
contributing a couple of items of information that may save some
wheel-spinning out there.

First, allow me to vent just a little bit about a common
misunderstanding.

[Matthew Gertner:]

| In last month's Wired, XML made it into the "hype list" with the
| comment that we crazy XML types are kidding ourselves because XML will
| never fly without well-defined semantics.

This gets the "No Shit, Sherlock" award for excellence in trade press
reporting.  XML was very carefully designed to have no built-in
semantics whatsoever.  So considered in isolation, an XML document is
found to have... no semantics!  What an insight!

And we can go further: to give semantics to this thing that was
designed to have no semantics we have to have... it's coming to me,
wait a minute... yes!  We have to supply something else that *does*
provide the semantics!  Wow!  Pulitzer prize time for sure.

Here are some examples of things that can provide semantics for XML
documents:

* Scripts or programs.  Especially Java programs.  :-)

* Prose descriptions (if you said "DTDs" you are confused, but
understandably so; a lot of good people have been confused about this
before you).  The namespace specification provides a standard way to
associate prose descriptions and other bearers of semantic information
with classes of XML documents.

* Stylesheets.  Especially XSL stylesheets, which are even as we
speak being defined by a very active W3C XSL WG.  This is why you will
want to look carefully at the first XSL working draft expected out in
July, because XSL will provide what is intended to be the most
powerful standardized high-level way to associate presentational
semantics with XML documents in publishing environments.  Watch this
space:

   http://www.w3.org/Style/XSL

So people who think that there is something missing from XML are by
and large simply unaware that it was not intended to be used by itself
and that the other pieces are on their way.  (There's XLink, too.)
This has all been made abundantly clear in every W3C statement about
the XML activity for the last year and a half, but it's to be expected
that a lot of folks just won't bother to pay attention to stuff like
that.

Now let's turn to the chief concern of this thread.  After a number of
excellent observations about the need for a schema language for XML
documents and the considerations that have to go into the
specification of such a thing, Matthew asks the following question:

| More tricky than any of these technical issues is the question of
| what, if anything, could be done to promote a mechanism of this
| sort. Obviously this would require a change to the XML spec as
| well as modification to all existing tools which process DTDs, so
| it's a pretty big deal. I wonder if anyone besides me thinks that
| a simple mechanism like this would make sense.  If so, is there
| any room in the XML standards process to discuss a change of this
| type at some point in the future (certainly not for XML 1.0)?

The answer is, Yes, there are other people who think that it would
make sense to design an XML schema mechanism to handle issues like
what has been called "inheritance" in this discussion (not to mention
good old-fashioned data typing).  The workings of a W3C committee can
be made public only at the discretion of the chair of the committee,
so I will put on my official XML WG Chairman hat and reveal unto ye
that the XML WG has officially requested that the job of defining a
schema language for XML documents be added to its charter.  If
approved by the W3C Director, this work would certainly involve a
consideration of most of the issues raised in this discussion and
would include a close look not only at XML Data but also at other
proposed solutions to the same problem.

Jon

