A small collection of [XML-DEV] postings on "inheritance," "subtyping," "subclasses," and related notions in XML/SGML/HyTime. Content last updated April 23, 1998.
Date: Thu, 02 Apr 1998 14:18:49 -0600
To: xml-dev@ic.ac.uk
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Re: "Inheritance considered harmful"

[Eliot Kimber]
> That's probably because the architecture facility of ISO/IEC 10744 doesn't
> *do* inheritance in the way that most people seem to expect.

[Paul Prescod]
That's right. That's why people get so confused about them. The word inheritance is inherently misleading when applied to architectural forms. Architectural forms do subtyping, not inheritance. Inheritance is about "getting stuff for free" (e.g. code, declarations, fields). Subtyping is about *fulfilling a particular role* (perhaps through a manual construction of an appropriate "interface" (in this case a content model)). Architectural forms allow you to specify an interface that must be fulfilled and declare conformance to that interface. It does not allow you to "get code for free" (i.e. markup declarations).

[Eliot Kimber]
Paul has made clear...

[Steve Newcomb]
Paul is absolutely right, but I'm still not going to take his advice. For several months last year, I deliberately stopped using the word "inherit", as in "inheriting architecture", "inherited-from architecture", etc. Instead, I very carefully used the words "derived" for the inheriting architecture and "enabling" for the inherited architecture. This is the vocabulary used in the standard. Ultimately, however, I reluctantly gave up on precision vocabulary because nobody understood what I was talking about, except for people whom I had no need to reach because they already understood the concepts.

In almost all rhetorical situations, I have to use vocabulary that may be, strictly speaking, misleading, and yet provides some glimmer of understanding to the HyTime-inexperienced. I'm back to "inherited" and "inheriting", and I never even try to use "enabling" and "derived" any more. I'm open to other suggestions, though. Got any?
-Steve

=======================================================================

From owner-xml-dev@ic.ac.uk Tue Mar 31 18:55:23 1998
Date: Tue, 31 Mar 1998 18:43:39 -0500
From: "Steven R. Newcomb" <srn@techno.com>
Subject: Re: Experimenting with Namespaces - DTDs?

David Megginson (ak117@freenet.carleton.ca) writes:
> Personally, I'd recommend architectural forms over namespaces if
> you're concerned with DTDs, since architectural forms have several
> major advantages:

David is right. But I would go farther: XML Namespaces are a snare and a delusion. With their use of colon syntax, they lull one into thinking that they are about class inheritance. They are not. Instead, what the namespace thing does is to collapse all the structure of the classes of the inherited-from DTD into a salad of element types which is very correctly termed a namespace rather than an architecture. All that RDF was looking for was a way to guarantee global uniqueness of element type names, and if we ever try to get anything more than that from namespaces, we are on very thin ice indeed.

If the inherited-from DTD is already a tag salad, in which all the element types are a big OR group in the content model of the document element, namespaces can work quite well. If, however, an element type has different meanings depending on its context (and most architectures necessarily have this characteristic), then collapsing such an architecture into a namespace can actively interfere with information interchange.

I think RDF would benefit substantially, in terms of its understandability, its implementability, and its flexibility, if it were described in terms of inherited architectures. In fact, I think it cries out for an architectural perspective, in which the knowability and significance of element context is preserved. I suspect that RDF's formal rigor would benefit, too, even though its formal rigor is already formidable.
(I'm basically impressed by RDF; it's the product of much excellent thinking, I think. I just want MORE!)

To be entirely fair and truthful, I must personally accept a share of the blame for this namespace mess; I was present at the first Dublin Core meeting, and, awed by the momentousness of the occasion, I evidently failed to make the case for using architectures for metadata. My later contributions to the W3C XML discussions about namespaces were evidently not persuasive, either. In my own defense, I would argue that this is entirely understandable; it's a subtle issue; nobody has much experience with metadata architectures; what experience there is is dominated by methodologies like MARC that rely on lists of uniquely named fields; and, most of all, the need for even a partial solution to the metadata problem is phenomenally intense.

Anyway, all is not lost. This namespace thing is a mistake that will necessarily be corrected, simply in order to support the needs of civilization in an XML-dominated world. The way toward a solution is already paved by an ISO standard (ISO/IEC 10744:1997 Annex A.3) that is being adjusted to accommodate the syntactic limitations of XML (i.e., its lack of #NOTATION attributes). It is implemented in the SP parser and in other software systems, and it is already being used in many industrial contexts. It's the right sort of answer, it's not going away, and its usage is accelerating rapidly; there was a manyfold increase in the number of papers reporting its use at SGML/XML 97.

And, anyway, the need for metadata interchange far outstrips RDF's present scope. I hope and believe that many powerful metadata architectures -- including elegant ones that can't be squashed flat and remain useful -- will be multiply inheritable. That way, there can be a marketplace of architectural ideas for metadata in which the full power of context can be exploited. I'd like to see RDF evolve in that direction.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 05:27:53 1998
Date: Fri, 17 Apr 1998 12:12:03 +0200
From: matthew@praxis.cz (Matthew Gertner)
Subject: Inheritance in XML (was Re: Problems parsing XML)
To: xml-dev@ic.ac.uk

I can't resist jumping in at this point, since it reminds me of some thoughts I had about a topic which was being discussed a couple of weeks ago: inheritance in XML. Unfortunately I seem to have managed to lose the original mails (hint: never remove anything from the mail server if you're not at your primary machine), but the gist was that object-oriented approaches to inheritance are not applicable to XML because XML, unlike OO languages, models only data and not behavior. This led into a very interesting and apt discussion of the difference between inheritance (i.e. of behavior) and subtyping (e.g. of interfaces).

To say that OO techniques only apply to behavior is an oversimplification. Some of the basic tenets of OO (encapsulation, polymorphism) are only applicable when behavior is modelled, but I would maintain that others (inheritance, identity) are equally applicable to data. The two last examples would both be of huge benefit to XML and are both currently lacking.

Eliot Kimber indicated some scepticism as to whether OO techniques have really lived up to their hype. In terms of a controlled environment, they have. Any C programmer who has moved onto C++ will attest that OO features make it far easier to write extensible and maintainable code. On the other hand, the promise that this would lead to interchangeable components that could be used anywhere has clearly been a flop. Why?
For exactly the reason Tim mentioned in his mail: interoperable APIs never work. You can't interface with code and expect this interface to apply to any environment other than the one it was specifically designed for. This is the case whatever technology you are using (DLLs, Java, JavaBeans, Smalltalk, COM, CORBA, etc.). Hence XML.

Nevertheless, inheritance of some sort is absolutely vital if XML is to fulfill its promise. If we can't produce standard DTDs which can be extended, *without* modifying the base DTD, then many of the advantages of XML go out the window. This is as important as, say, linking facilities, and is certainly orthogonal to the current namespace proposal.

I have been giving quite a lot of thought to how inheritance (I don't really think sub-typing is the right term) could be implemented for XML. I'll have to write up the details in a separate document, as this mail is getting pretty long. In essence:

1) HyTime provides an extremely valuable and rich basis for this work, just as it has for XML-Link. However, the relevant aspects need to be extracted and presented in a more easily digestible form. Also, HyTime attempts to implement inheritance (of element content) without extending the DTD syntax. This decision should at least be reevaluated in the context of XML.

2) OO languages provide extensive facilities for inheritance of data members (quite independently of methods), and these concepts would also be very valuable in this context.

3) Additional thought must be given to adapting the content model of existing element types in a base DTD without having to write out a whole new content model. This is pretty scary, but I imagine it would be possible to define primitives saying things like:

   a) certain new element types can be inserted in front of the existing content model.
   b) certain new element types can be appended at the end of the existing content model.
   c) certain new element types can be inserted at a given location in the existing content model.
   d) etc.

I'd be really interested in reading others' thoughts on this matter.

Cheers,

Matthew

-----Original Message-----
From: Tim Bray <tbray@textuality.com>
To: xml-dev@ic.ac.uk <xml-dev@ic.ac.uk>
Date: Friday, April 17, 1998 6:07 AM
Subject: Re: Problems parsing XML

>At 10:35 PM 14/04/98 -0500, len bullard wrote:
>>> [Chris Maden <crism@ora.com>:]
>>> > One fundamental flaw in _XML Complete_ is Holzner's apparent belief
>>> > that you must write Java code in order to do anything useful with
>>> > XML.
>
>>Markup doesn't care. That's the beauty of it. :-
>
>Yes! What he said. As a result of having been a programmer since
>A.D. 1979, my faith in interoperable APIs is torn and shredded.
>But I think that interoperable syntax is usefully achievable.
>Hence, XML. -T.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 05:55:32 1998
Date: Fri, 17 Apr 1998 11:46:40 +0100
From: Michael Kay <M.H.Kay@eng.icl.co.uk>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

Matthew Gertner:
>Some of the basic tenets of OO (encapsulation, polymorphism) are only
>applicable when behavior is modelled, but I would maintain that others
>(inheritance, identity) are equally applicable to data. The two last
>examples would both be of huge benefit to XML and are both currently
>lacking.

I agree absolutely.
I have found identity and subtyping to be the two biggest benefits in using an object database over a relational database.

>Nevertheless, inheritance of some sort is absolutely vital if XML is to
>fulfill its promise. If we can't produce standard DTDs which can be
>extended, *without* modifying the base DTD, then many of the advantages of
>XML go out the window.

I agree that this is central. Let's leave identity out of the discussion, as that does, I think, fall into the XML Linking domain, and concentrate on what I prefer to call subtyping. There's a lot of stuff in the SGML culture that one could fall back on: architectural forms etc, but I for one find it extremely arcane and difficult to relate to my own domain of object modelling and database design, which I think is familiar to a much wider community.

I know some people will disagree, but the way I use XML, a DTD is a schema, an element definition in a DTD is a class, a document is a database, and an element within a document is an instance of a class. What is missing is that we can't define one class (element type) as a subtype of another.

Since we are only concerned with structural subtyping and not with behaviour, I don't think it would actually be difficult to define this concept. The main thing that's tricky is that you can get the "is-a" the wrong way round. If a PREFACE is-a-kind-of CHAPTER, that means you can find anything (elements, attributes) in a PREFACE that you can find in a chapter, and more besides. It also means you can reduce a PREFACE to a CHAPTER by removing these extra bits. I'm not entirely sure what "removing the extra bits" means: for example should it remove elements that cannot occur in a CHAPTER, or should it just remove the tags that surround those elements? This tends to show up the lack of semantics in the object model underlying XML.

Just some thoughts...

Mike Kay, ICL

=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 10:31:59 1998
Date: Fri, 17 Apr 1998 10:18:30 -0500
To: <xml-dev@ic.ac.uk>
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)

At 12:12 PM 4/17/98 +0200, Matthew Gertner wrote:
>1) HyTime provides an extremely valuable and rich basis for this work, just
>as it has for XML-Link. However, the relevant aspects need to be extracted
>and presented in a more easily digestible form. Also, HyTime attempts to
>implement inheritance (of element content) without extending the DTD syntax.
>This decision should at least be reevaluated in the context of XML.

I appreciate the vote of confidence for architectures and hesitate to make the next comment. However, there appears to be a general misconception about architectures that I feel I must attempt to correct, to wit, that architectures have ANYTHING to do with inheritance.

Matthew says "HyTime attempts to implement inheritance (of element content) without extending the DTD syntax". This is a false statement because HyTime DOES NOT ATTEMPT to define any form of inheritance as I understand that word. Therefore, it is not a failing of the AFDR that it did not extend DTD syntax (which was never a realistic option at the time it was designed). The decision that was made was the only possible decision at the time.

This is not to say that I object to the idea of true inheritance in SGML. I do not. It would almost certainly be a useful facility, making the use of architectures at least easier, if not more powerful as well.
So I appreciate the depth of thought that is being and will be put into this issue. I simply object to the suggestion that there is anything wrong with architectures as they stand because they fail to provide a proper or useful inheritance mechanism. Architectures cannot fail at something they explicitly don't try to do.

I don't want people to think that they shouldn't use architectures because they don't do inheritance. Architectures aren't about inheritance--they are orthogonal but synergistic concepts. The *processing effect* of using architectures may *appear* to be inheritance, but that is a side effect of the type of processing that architectures enable, not a direct intent of the architectural mechanism.

Or, said another way, architectures were designed to enable object-oriented *processing* but not object-oriented construction of instance DTDs for enabling parsing and validation. The latter simply isn't a requirement for the former and is orders of magnitude harder to invent, specify, and implement.

Remember: DTDs exist for exactly two reasons:

1. To enable *syntactic* validation of instances
2. To enable the use of markup minimization features

For all other types of processing DTDs are *irrelevant*. Thus, you do not need to think about DTDs at all in order to enable object-oriented *processing*, which is one of the things architectures do. Architectures also enable the syntactic validation of documents against the architectural syntax rules (the architectural DTD), but they do not need to provide a "DTD inheritance" mechanism in order to do that--they simply need to enable the automatic generation of new instances that conform to the architectural DTD. This is a pretty trivial thing to define and implement (modulo the optional automapping facility, which, like any markup minimization feature, complicates things a bit).

It might help to understand why architectures are designed the way they are.
Architectures are designed to give you a way to define a set of general rules for processing documents for some specific purpose (e.g., hyperlinking, defining metadata, etc.). Document instances use these rules by reference by asserting derivation from the architecture and conformance to its rules. Because SGML can only talk about syntactic rules and because the architecture mechanism uses SGML syntax as the base definition of its rules, these sets of rules provide an ability to define syntactic constraints in a way that is similar or identical to those provided by a document's private DOCTYPE declaration. At the same time, these rules do not impose any requirements on the names used in instances, because avoidance of name-space incursion is a basic principle of SGML and its related standards. Thus, a general set of rules defines a set of types that instances assert conformance to, rather than defining the instance types directly.

Note that architectures presume additional definitions beyond the architectural DTD but cannot, of course, define how these rules might be specified (because there are an infinite number of useful ways to do so).

Note that the direction of pointing is from instances to types to establish an is-a or kind-of relationship. This is merely an *assertion* made by element *instances* (not types). This means that there is no, I repeat, no connection between element type declarations and architectural types ("forms"), except that the markup minimization feature of fixed attributes lets you fix the mapping for instances as part of an element type declaration. But it is not meaningful to say that an element type conforms to an architectural form--only instances can conform.

This further suggests that what architectures do is not inheritance because instances do not inherit properties from other types, they are simply instances of types. Architectures do not define any notion of types being derived from types.
[The derivation of one architecture from another is really the derivation of architectural *instances* from another architecture, not derivation of the architecture. This truth is obscured by the fact that architectural instances are normally only transient objects used by processors and not literally instantiated as SGML documents.]

In addition, the rules defined by an architecture need not cover the entirety of an instance. The HyTime architecture, for example, only covers those parts of documents involved with linking and addressing. Therefore, the mechanism must be flexible enough to allow both different elements of different types in the same document to be derived from different architectures and a single element to be derived from different architectures at once.

Because each architecture defines a distinct "processing context", there is no problem in having a single element derived from multiple architectures because the processing for each architecture is independent of the processing for any other architecture. There is no "multiple inheritance" problem because it's not inheritance. It's no different from me saying that I conform to the rules for both male humans and licensed drivers. These are distinct rule domains and as long as the rules for conformance to both do not result in a conflict such that I can't satisfy both at once, there are no problems. [For example, I could also say that I can conform to the rules for licensed drivers and medical cadavers but I obviously can't do both at the same time, because being a cadaver includes a requirement that makes it impossible for me to conform to the rules for licensed drivers.]

Note that the assertion made by elements that they conform to a given form is NOT saying "instance element X inherits the *syntactic* properties of architectural form Y". It is saying "instance element X *conforms to* the syntax and semantics of architectural form Y".
It is an assertion of conformance or derivation that does not have any implications about the content model of the instance except that it must *allow* (but not necessarily require) instances that conform to the architectural content rules. The only constraint architectural content models impose on instances is the requirement for *potential* conformance. But instances are free to allow content that would not conform, because not all instances will be processed or validated with respect to a given architecture.

[There may, however, be a definite processing result that looks or in fact is inheritance, but that's inheritance of processing, which is different from inheritance of local syntactic rules. Object-oriented techniques are a natural way to implement processing because you can reflect the *taxonomic* hierarchy represented by an architecture with programmatic objects.]

For example, say I define an architecture for sections in technical manuals with the following architectural content model:

  <!-- Section architecture: -->
  <!ELEMENT Section (Title, (Para+ | Section+)) >
  <!-- Another form that is not allowed within Section -->
  <!ELEMENT Intro (Para+) >
  <!-- End of architecture -->

In a document, I might have this element type, instances of which can be derived from the Section form:

  <?XML version="1.0" ?>
  <!DOCTYPE Division [
  <?IS10744:arch name="Sectarch" ... ?>
  <!ELEMENT Division ((Title | Metadata),
                      (Para+ | (Intro, Division+) | Division+)) >
  <!ATTLIST Division sectarch (Section) #IMPLIED >
  ]>
  <Division sectarch="Section">
  <!-- This Division claims conformance but fails to conform because
       the Section architectural element does not allow the Intro
       architectural element in its content. -->
  <Metadata>...</Metadata>
  <Intro> .. </Intro>
  <Division> ... </Division>
  </Division>

This document is valid with respect to its own rules. It should be clear from inspection that it allows instances that conform to the Section architecture.
It also allows instances that do not conform. It should also be clear that the instance does not conform to the Section architecture (even though it asserts conformance by asserting derivation from the Section form).

Thus, given an architectural element type, there is no way to predict the content models of conforming instances except to say "it will probably *allow* conforming instances".

Note that given an architectural element type, it is probably easy to *generate* instance content models that will ensure conformance (e.g., just copy the architectural declarations into the instance and change the names, if desired), but combining two or more forms from different architectures into a single element type probably cannot be done programmatically in any satisfactory way because too many arbitrary decisions will have to be made, possibly based on variables that can only be understood or provided by humans (such as when are instances expected to be validated against a particular architectural derivation).

It should be clear that any notion of true inheritance of content models from architectures to instances is problematic at best, provably impossible at worst. In addition, it would require that the instance parser have access to all architectural DTDs and be able to synthesize them according to some set of combinatorial heuristics. To my mind, this is a level of processing overhead that is unacceptably high if all conforming parsers must support it. In particular, it seems to be directly at odds with at least one of XML's basic principles (actually, I can think of at least three: enabling small parsers, no options, simplicity of specification).

By contrast, you only need to access and use an architectural DTD when you are *validating* with respect to that architecture, which is always an option. Validation is not a requirement for doing architecture-aware processing.
A processor for any given architecture presumably has built-in knowledge of the forms in that architecture. In any case, DTDs only enable validation and parsing, not processing, so they are largely irrelevant to the issue of enabling *processing*, which is the primary purpose of architectures.

Thus, the use of architectures imposes *no requirements* on instance parsers to do anything more than they have to do today. Validating with respect to an architecture is a choice that users of documents get to make.

But, doing such combination in some non-SGML schema syntax is perfectly reasonable to contemplate because at that point you've gone outside the minimum requirements of SGML parsing and by definition there is no requirement that any conforming instance parser do any processing with respect to non-SGML-syntax schemas.

Cheers,

Eliot

--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
</Address>

=======================================================================

From owner-xml-dev@ic.ac.uk Fri Apr 17 10:56:11 1998
Date: Fri, 17 Apr 1998 11:52:15 -0400
From: Frank Manola <fmanola@objs.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)

At 8:55 PM -0700 4/16/98, Tim Bray wrote:
>At 10:35 PM 14/04/98 -0500, len bullard wrote:
>>> [Chris Maden <crism@ora.com>:]
>>> > One fundamental flaw in _XML Complete_ is Holzner's apparent belief
>>> > that you must write Java code in order to do anything useful with
>>> > XML.
>
>>Markup doesn't care. That's the beauty of it. :-
>
>Yes! What he said. As a result of having been a programmer since
>A.D. 1979, my faith in interoperable APIs is torn and shredded.
>But I think that interoperable syntax is usefully achievable.
>Hence, XML. -T.
>

and Matthew Gertner wrote:
>Eliot Kimber indicated some scepticism as to whether OO techniques have
>really lived up to their hype. In terms of a controlled environment, they
>have. Any C programmer who has moved onto C++ will attest that OO features
>make it far easier to write extensible and maintainable code. On the other
>hand, the promise that this would lead to interchangeable components that
>could be used anywhere has clearly been a flop. Why? For exactly the reason
>Tim mentioned in his mail: interoperable APIs never work. You can't
>interface with code and expect this interface to apply to any environment
>other than the one it was specifically designed for. This is the case
>whatever technology you are using (DLLs, Java, JavaBeans, Smalltalk, COM,
>CORBA, etc.). Hence XML.

These observations about the (at least so far) lack of success with truly interoperable APIs are certainly true, and the potential of interoperable syntax "feels" right, but I wonder to what extent we may be comparing apples and oranges here. Specifically, what do we mean by "interoperable"? Interoperable APIs are hard at least in part because an incredible amount of semantics are (implicitly) built into a typical API (as is suggested by Matthew's comment). Moreover, interoperable APIs are held to a "strict accountability": the programs interacting through them must work without either syntactic or semantic errors (and, with programs, these are typically all bundled up).
However, if programs must agree on the precise meanings of tagged data in order to guarantee proper operation when exchanging data (and what else does a fair understanding of "interoperable" mean in this context?), won't the semantics that must be mutually understood be (approximately) just as complex? And don't we then need to consider the mechanism(s) for achieving *that* in our comparisons? After all, it's not enough that the programs be "interoperable" in the sense that they can each "operate" (e.g., read, parse, or even approximately get the meaning) on the other's data; the operation must also be "correct" in a fairly constrained sense. I have in mind all the problems large companies are having merging data from different databases into data warehouses due to sometimes subtle differences in semantics (e.g., of what a "customer" is), even when the data item names (corresponding to markup) are the same (or, at least, fairly regular).

I'm not, here, arguing *against* the idea of interoperable syntax, but I am questioning how easy it will really be to get the degree of "interoperability" we seem to be implicitly expecting.

--Frank

-----------------------------------------------------------------------
Frank Manola                            www: http://www.objs.com
Object Services and Consulting, Inc.    email: fmanola@objs.com
151 Tremont Street #22R                 voice: 617 426 9287
Boston, MA 02111

=======================================================================

Date: Fri, 17 Apr 1998 13:23:37 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

Matthew Gertner:
>Nevertheless, inheritance of some sort is absolutely vital if XML is to
>fulfill its promise. If we can't produce standard DTDs which can be
>extended, *without* modifying the base DTD, then many of the advantages of
>XML go out the window.

Michael Kay wrote:
>
> I agree that this is central. Let's leave identity out of the discussion,
> as that does, I think, fall into the XML Linking domain, and concentrate
> on what I prefer to call subtyping.

You act as if this is just a terminological difference, but it isn't. He is talking about one thing and you are talking about another. What he speaks of, "producing standard DTDs which can be extended *without* modifying the base DTD", is inheritance. It can be implemented right now through parameter entity hacks and is not subtyping. You on the other hand seem to be talking about subtyping:

> I know some people will disagree, but the way I use XML, a DTD is a
> schema, an element definition in a DTD is a class, a document is a
> database, and an element within a document is an instance of a class.
> What is missing is that we can't define one class (element type) as a
> subtype of another.

The only reason that the concepts *even intersect* is because

a) subtyping without inheritance is often painful and leads to code duplication.
I claim that architectural forms and Java "interfaces" are often painful for exactly this reason. Of course in [SG|X]ML, inheritance can be hacked with parameter entities, which is something HyTime does for its architectures. (Also, HyTime can only be thought of as subtyping if you use it in a restricted form...)

b) inheritance without subtyping is only occasionally useful. I can't remember the last time I used "private inheritance" in C++ and I don't even remember right now if Java supports it.

But the fact that the two concepts work well together does not make them synonyms. They are not.

> The main thing that's tricky is that you can get the "is-a" the wrong way
> round. If a PREFACE is-a-kind-of CHAPTER, that means you can find
> anything (elements, attributes) in a PREFACE that you can find in a chapter,
> and more besides.

No it doesn't. If PREFACE is-a-kind-of CHAPTER then source code designed to handle chapters should work with prefaces. That means that PREFACE must either directly describe a *subset* of the language described by CHAPTER (i.e. have a constrained content model) or PREFACE must provide "some mechanism" for transforming its content into a language understandable by CHAPTERs.

In real world documents, we often want to be able to have subtypes that are also extensions, which means that we need to define some transformational system (as archforms do). This transformational question is exactly what makes subtyping with extension very tricky. Subtyping without extension is trivial. This is why I have stepped back from the question of subtyping with extension and am investigating transformation languages. In particular I am right now looking at Forest Automata theory and a transformation language designed by Makoto Murata.

> It also means you can reduce a PREFACE to a CHAPTER
> by removing these extra bits.
> I'm not entirely sure what "removing the extra
> bits" means: for example should it remove elements that cannot occur
> in a CHAPTER, or should it just remove the tags that surround those
> elements? This tends to show up the lack of semantics in the object
> model underlying XML.

That's exactly right. Your confusion is my confusion. The only way out is through transformation languages -- either simple, relatively weak ones like those provided by architectural forms, or more powerful (and more complicated? I don't know yet) ones like those described by Murata-san in his various Principles of Documentation papers. They are at:

http://www.geocities.com/ResearchTriangle/Lab/6259/

Unless you are much smarter than me, you will probably not find these light reading, but my hope is that the concepts can be simply expressed in a nice syntax in much the same way that regular expressions hide the nastiness of DFAs. There is in fact such a thing as a regular tree expression that is quite analogous to a regular expression. I don't yet know if these can be hooked up to an easy to use (non-programmable!) transformation language.

Sorry for the brain dump. I'm late for a meeting.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

[Woody Allen on Hollywood in "Annie Hall"]
Annie: "It's so clean down here."
Woody: "That's because they don't throw their garbage away. They make it into television shows."
==========================================================================

From: "Martin Bryan" <mtbryan@sgml.u-net.com>
To: <xml-dev@ic.ac.uk>
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
Date: Sat, 18 Apr 1998 08:49:07 +0100
Message-ID: <01bd6a9e$92ae10a0$2b8577c2@sgml.u-net.com>

Michael Kay wrote:

>I know some people will disagree, but the way I use XML, a DTD is a
>schema, an element definition in a DTD is a class, a document is a
>database, and an element within a document is an instance of a class.
>What is missing is that we can't define one class (element type) as a
>subtype of another.

In SGML you can use exclusions to make an element a true subclass of another:

    <!ELEMENT X (%Y-contents;) -(a|b|c)>

providing a, b and c are optional components within the model for Y. Unfortunately XML dropped this useful option from the set of SGML facilities it inherited.

Martin Bryan
==========================================================================

Date: Sat, 18 Apr 1998 10:31:48 -0400
From: Paul Prescod <papresco@technologist.com>
To: xml-dev@ic.ac.uk
Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
References: <01bd6a9e$92ae10a0$2b8577c2@sgml.u-net.com>

Martin Bryan wrote:
>
> In SGML you can use exclusions to make an element a true subclass of
> another:
>
> <!ELEMENT X (%Y-contents;) -(a|b|c)>
>
> providing a, b and c are optional components within the model for Y.

Element X is not a true subclass or subtype. Given a content model:

    <!ELEMENT J (Y)>

You cannot use an X. What you've done above is make an element whose content model is more restrictive than some other content model. You can also do that without exclusions. I don't think I've ever used exclusions in that way. One big problem is that the exclusion doesn't just change X's content model, but also the content models of all of X's children. You don't want that if all you need is content model subsetting.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human rights groups to spoil your profession" - Col. Godwin Ugbo of the Nigerian military dictatorship
==========================================================================

From owner-xml-dev@ic.ac.uk Sat Apr 18 12:21:12 1998
Date: Sat, 18 Apr 1998 12:18:44 -0500 (CDT)
From: Robin Cover <robin@acadcomp.sil.org>
Message-Id: <199804181718.MAA24752@ACADCOMP.SIL.ORG>
To: xml-dev@ic.ac.uk
Subject: Re: Inheritance in XML

> Re: Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
> Date: Sat, 18 Apr 1998 08:49:07 +0100
> Reply-To: "Martin Bryan" <mtbryan@sgml.u-net.com>

>>What is missing is that we can't define one class (element type) as a
>>subtype of another.
> In SGML you can use exclusions to make an element a true subclass of
> another:
>
> <!ELEMENT X (%Y-contents;) -(a|b|c)>
>
> providing a, b and c are optional components within the model for Y.
> Unfortunately XML dropped this useful option from the set of SGML facilities
> it inherited
>
> Martin Bryan

Martin,

I wish I could believe this were true and useful. It seems that we confront here one of the several troublesome mismatches between OO database modeling and SGML/XML markup, with respect to the simple analogy:

    OODB            SGML/XML Markup

    class defn      element declaration
    class name      element type
    object          element
    attribute       attribute

If we accept this crude analogy, and accept SGML's notion of an "attribute" as a name-value pair, then the hope of creating subclasses through SGML/XML element declarations appears slim. Appears "to me" I should say: I would welcome comments from the experts.

For starters, subclassing normally would mean further specialization by the addition (possibly 'plus subtraction') of properties, viz., of attributes. Formally, then, an SGML element declaration can't do the work: it would need to be an ATTLIST declaration. But then we face the problem that you can't model a complex attribute with the SGML 'attribute' anyway (if you want any validation): the "value" in '(name-)value' is a flat string in SGML, at least in the literal sense.

Of course, one can (and we all do) model "real" attributes using SGML elements -- since we have no realistic alternative -- but that creates other problems for the notion of using SGML element decls as a subclassing mechanism. One such problem is that (real) attributes are unordered. The straightforward way to model an object/element with (some optional, some required) attributes a, b, c, d, e, and f would seem to be: (a* & b? & c?
& d & e?), but SGML/XML notions of prescribing order in the serialization are fairly strong, and XML won't even allow the use of the AND connector to indicate what I plainly mean in this sample assertion. (Perhaps Steph Tryphonas has written a program by now to convert all content models using AND to use only OR, without sacrificing any integrity constraints on occurrence and sequence). In any case, the impulse toward serialization in SGML -- at least in practice, given tools that force end users to reckon with (arbitrary non-intuitive) "order" based upon sequence rules in content models -- tends to work against the easy use of SGML elements to model attributes.

Even apart from these mismatches between "object" modelling and SGML/XML encoding, I question whether

    <!ELEMENT X (%Y-contents;) -(a|b|c)>

creates a useful "true subclass." Why would one want to create a subclass based upon the subtraction of optional "attributes" (subelements)? I think that would make it a superclass in many OO systems.

In this connection, one might be inclined to argue that the treatment of "content" as a special attribute is unfortunate, at least from the perspective of data modelling, where "part-whole" has no quintessential role vis-a-vis "is-a" or "has-a" or "kind-of" or "points-to"... At which point, others would quickly point out that they think it's specious to be talking about object modeling in terms of SGML-based markup languages anyway, since "these languages can neither formally express nor enforce semantic integrity constraints which are so critical to good object modelling..."

I think this all leads me in the direction of favoring the efforts at defining other schema languages (beyond SGML/XML DTD syntax), granting that the validation of instances against their schemas, if/when critical, will need to be done outside the framework of the SGML/XML "parser/processor" as defined.
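Returning to the AND connector mentioned above: a conversion to sequence and OR can be sketched in a few lines. This is only a minimal illustration, assuming that every member of the AND group is a required element; the function name `and_group_to_or` is hypothetical, and optional or repeatable members (such as `a*` or `b?`) would need extra care to keep the resulting model unambiguous.

```python
def and_group_to_or(names):
    """Rewrite the AND group (n1 & n2 & ...) of required elements
    using only sequence (,) and choice (|).

    Each branch fixes a different first element and recurses on the
    rest, so no two branches of a choice start with the same element.
    """
    if not names:
        raise ValueError("an AND group needs at least one element")
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first}, {and_group_to_or(rest)})")
    return "(" + " | ".join(branches) + ")"


print(and_group_to_or(["a", "b"]))
print(and_group_to_or(["a", "b", "c"]))
```

The number of branches grows factorially with the size of the group, which is one practical reason such rewrites stay manageable only for small groups.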
I have little doubt that someone as brilliant as Eliot can show how the desired objectives might be met through architecture processing by an appropriate architecture engine; I don't know whether this is the "best" path in all cases, or whether SGML/XML users will want to deal with all the layers of indirection that architectures seem to want. I hope that experts with some years of experience in OO systems will contribute their insights to the new "schema" projects.

-rcc

-------------------------------------------------------------------------
Robin Cover                   Email: robin@acadcomp.sil.org
6634 Sarah Drive
Dallas, TX 75236 USA          >>> The SGML/XML Web Page <<<
Tel: +1 (972) 296-1783 (h)    http://www.sil.org/sgml/sgml.html
Tel: +1 (972) 708-7346 (w)
FAX: +1 (972) 708-7380

=========================================================================

From owner-xml-dev@ic.ac.uk Mon Apr 20 04:30:50 1998
From: matthew@praxis.cz (Matthew Gertner)
Subject: Re: Inheritance in XML
Reply-to: matthew@praxis.cz (Matthew Gertner)

Robin,

You really hit the nail on the head with this post! These are exactly the kinds of issues that I was having some trouble expressing in my previous mail. I have read this thread with great interest, and it seems to me that if we synthesize the discussion we are getting close to the heart of the matter. Here is my attempt:

* Terminology *

I personally don't agree that there are carved-in-stone, well-understood definitions for terms like "inheritance" and "subtyping" in XML. While there surely are in certain, specific contexts, we are talking about something new, i.e. inheritance in XML, and what we really need to do is choose a term and define it precisely. Does HyTime model inheritance? It does if my definition of inheritance in XML corresponds to what HyTime does (it doesn't: see below). Is "subtyping" a better term? No, because it doesn't have the same resonance as the word "inheritance" among non-programmer types.
I'll make a first attempt:

"Inheritance in XML refers to the process of creating new element types that duplicate the content model and attribute list of existing element types (in the same or a separate "base" DTD), while extending these to include additional attributes and/or content. As such, instances of the new element types can be used wherever the base element type can be used, and can be processed polymorphically by any external processor which knows about the base element type."

* HyTime *

I read through Eliot's post and understood some of it. :-) I never meant to question any design decisions made in the specification of HyTime. They are all well-justified in the context which prevailed at the time. Despite the fact that HyTime models derivation (I'll stay away from the i-word in light of the definition given above) of instances and not of schemata, it remains one of the few attempts that have been made at deriving document types and as such is an extremely valuable basis for thinking about a true inheritance mechanism for XML. To meet the definition I proposed above, this mechanism would have to extend the DTD syntax or create a new one (see below). The goals and uses of HyTime derivation are and will continue to be somewhat different from this; I was only trying to point out that we can benefit greatly from the experience gained from HyTime in thinking about XML inheritance.

* Semantics and XML *

In last month's Wired, XML made it into the "hype list" with the comment that we crazy XML types are kidding ourselves because XML will never fly without well-defined semantics. These sentiments were echoed by several posts on this list. I agree 100%, but as several people pointed out, there are already a lot of semantics associated with XML, to the extent that there are semantics associated with the idea of a hierarchy and with the HAS-A relationship. XML-Link and XSL introduce a very valuable additional set of semantic relationships.
We are all so excited about XML, as opposed to Excel files, Postscript or what have you, because there are tools like XML parsers, editors and browsers which have value across the whole range of XML applications. I can write an XML file, and to the extent that existing semantics are sufficient, I can do useful work with this file. I can, for example, display it as a hierarchy. I can't do anything at all with an Excel file unless I have Excel.

This doesn't eliminate the need to define the specific semantics of a given schema. This can only be done with clear documentation, as Paul pointed out. What we can do is capture the semantics expressed in this documentation and use them as the basis for new schemata. Sure, a lot of this can be done using "parameter-entity hacks", or by writing content models out by hand, but this isn't going to be an effective way to bring XML to the masses.

The whole discussion about XML semantics is very apt in this context precisely because inheritance is so important for making XML really useful. Let me give an example implied by Peter (in reference to the agglutination of DTDs for nuclear power plant software). Let's say that I am developing an advanced medical diagnosis system based on chemical analysis of blood samples. Part of the application is a hardware device which looks for specific molecules in the sample and displays them on a monitor in 3D. I decide to use CML to model these molecules, but I need to add additional attributes and content to the molecule description which are specific to my application. With the kind of inheritance mechanism I am talking about, I could download a CML viewer and use it "out of the box" to display the molecules, while still passing the entire XML structure (with my additional information) to the application which attempts to create a diagnosis. Without XML inheritance, I will probably "break" the viewer, so I find myself wading through and adapting a lot of Java code.
At this point I start wondering why I decided to use XML in the first place...

* DTDs and schemata *

Francois Chahuneau's article makes a very effective argument for why we need to extend or replace DTD syntax (thanks Robin). XML-Data is a reasonable attempt to do so, but it is understandably controversial because it is such a radical departure from the existing syntax. I quite like the idea of an alternate, XML-based schema syntax, but the real lesson of XML-Data is that creating an effective inheritance mechanism isn't rocket science. All that is really needed is a keyword that says "this element type is derived from that element type". Something like:

    <!element dog extends animal...

...where the subsequent content model and attribute list are understood as being extensions to those of the base element type. The only other issue is whether more complex handling of the content model is needed.

* Content model *

XML-Data (if I understand correctly) simply tacks any new content for a derived element type at the end of the base content model. A valid question, addressed briefly in my previous post, would be whether more robustness is needed in modifying the existing content model. Steve and Robin both mentioned this aspect as well; one of the most powerful features of SGML/XML, as compared with OO languages, is the fact that content is ordered. It would be nice, therefore, to take this into account in any putative inheritance mechanism. Things like SGML exclusions don't fit the above-mentioned definition of inheritance, for the reasons mentioned by Robin (and others) in his post.

Having given this some more thought, I don't see any practical way to insert new content in the middle of an existing content model. Maybe someone cleverer than I has an idea about how this might be done (and whether it is really useful).
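The append-at-the-end behaviour described above can be sketched as a small preprocessing step that expands a derived declaration into an ordinary one. This is only an illustration under stated assumptions: the `extends` syntax is the hypothetical keyword from the text, the content models for `animal` and `dog` are invented, and `derive_model` and `declare` are made-up helper names, not part of any XML tool.

```python
def derive_model(base_model: str, extension: str) -> str:
    """Append-only derivation: base content first, new content last."""
    return f"({base_model}, {extension})"


def declare(name: str, model: str) -> str:
    """Render an ordinary DTD element declaration."""
    return f"<!ELEMENT {name} {model}>"


# Hypothetical base declaration; the content model for "animal" is invented.
models = {"animal": "(name, age?)"}

# "dog extends animal": the new content follows the inherited content.
models["dog"] = derive_model(models["animal"], "(breed, fleas*)")

for element_name, content_model in models.items():
    print(declare(element_name, content_model))
```

Because the base model is kept intact as a prefix, a processor that stops reading after the base content could in principle still handle the derived element, although the sketch says nothing about what such a processor should do with the trailing additions.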
In the meantime, one useful approach might be to at least enable new content to be added at the beginning of the base content model by adding a #BASECONTENT keyword which is replaced by the base content model in the derived element type description:

    <!element dog extends animal (breed,#BASECONTENT,fleas*)>

This would simply mean that the breed element precedes the content of the base element type, which is then followed optionally by some flea elements. This approach is probably sufficient, since other modifications to the base content model could be taken into account in the design phase of the base schema (i.e. by breaking up monolithic elements, if necessary).

* What now? *

More tricky than any of these technical issues is the question of what, if anything, could be done to promote a mechanism of this sort. Obviously this would require a change to the XML spec as well as modification to all existing tools which process DTDs, so it's a pretty big deal. I wonder if anyone besides me thinks that a simple mechanism like this would make sense. If so, is there any room in the XML standards process to discuss a change of this type at some point in the future (certainly not for XML 1.0)?

Cheers,
Matthew

-----Original Message-----
[deleted]

=========================================================================

From owner-xml-dev@ic.ac.uk Mon Apr 20 08:19:24 1998
Date: Mon, 20 Apr 1998 09:04:57 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML

Matthew Gertner wrote:
>
> * Terminology *
>
> I personally don't agree that there are carved-in-stone, well-understood
> definitions for terms like "inheritance" and "subtyping" in XML.

I don't think that anyone claimed that there is a well-understood definition for "inheritance" in any context -- even OO. But to be consistent with English, it must have something to do with "getting something for free." In the XML context the most obvious thing would be declarations. Subtyping is different.
Subtyping comes straight from mathematics and is as old as logic (at least). A type defines a set of objects. A subtype describes a subset of those objects. Simple and precise.

> Is
> "subtyping" a better term. No, because it doesn't have the same resonance as
> the word "inheritance" among non-programmer types.

I don't know why you think that. Non-programmer types are likely to balk at either word, but at least subtyping is shorter, and can be precisely defined. Anyhow, it is not at all as if the words are interchangeable. You can't pick and choose from words that already have meanings.

> I'll make a first attempt:
> "Inheritance in XML refers to the process of creating new element types that
> duplicate the content model and attribute list of existing element types (in
> the same or a separate "base" DTD), while extending these to include
> additional attributes and/or content. As such, instances of the new element
> types can be used wherever the base element type can be used, and can be
> processed polymorphically by any external processor which knows about the
> base element type."

ACK! This definition was proven inadequate in the OO software world around a decade ago. Both C++ and Java allow subtyping without inheritance, and C++, Sather and Eiffel allow inheritance without subtyping (I suppose to get that in Java, you would have to use delegation). If we are going to borrow ideas from OO, then we should at least use the updated, modern ideas, not those that were accidentally confused in Simula 67 (and have been confused in programmers' minds ever since).

The first major problem with your definition actually has nothing to do with the inheritance/subtyping conundrum.
The biggest problem is that if you "extend" a content model, you are making a more flexible language, which *cannot* be processed polymorphically by an external processor which knows only about the base element type:

    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT MY-TITLE (#PCDATA|IMG|FOO|BAR)>

Now imagine software that generates a TOC from titles, presuming them to be strictly textual. What does it do with images in titles?

Now let's talk about inheritance and subtyping. This is not a merely theoretical issue. It has important practical implications. The most interesting, important application of subtyping is allowing divergent evolution of compatible schemas. This is why architectural forms were invented. But for this to work, subtyping *must* be unhitched from inheritance. Suppose that Boeing has a content model:

    <!ELEMENT AIRPLANE-DOC - - (FRONT, MIDDLE, REAR)>

Bombardier has a similar model (after all, they are modelling the same thing):

    <!ELEMENT AIRCRAFT-DOC - - (COCKPIT, STORAGE, TAIL)>

How does inheritance help me to unify these models and validate that they are actually isomorphic? It doesn't. This is a job for subtyping.

I can also come up with examples where inheritance is more useful without subtyping but you can always achieve this through other means (which is why Java does not support it). Inheritance is a code reuse mechanism, so you can always emulate it with cut and paste (or parameter entities, or, in a programming language, with delegation). Subtyping is a type system extension. It is completely different. I can inherit stuff from my dad without becoming a dad. I can choose to be a dad without inheriting anything either from my dad, or the "class dad". They are different things.

> * DTDs and schemata *
>
> Francois Chahuneau's article makes a very effective argument for why we need
> to extend or replace DTD syntax (thanks Robin).
> XML-Data is a reasonable
> attempt to do so, but it is understandably controversial because it is such
> a radical departure from the existing syntax.

I think that XML-Data should be controversial because from my reading it is just a mix and match combination of interesting features that people want in schemas without a coherent theory of how they should fit together. You can't just put 10 smart people into a working group and have them throw in their good ideas and expect a coherent result. XML-Data's inheritance mechanism does not take advantage of XML's nature as a sequence-oriented language for encoding documents. In other words, it doesn't solve the fundamental problem.

> I quite like the idea of an
> alternate, XML-based schema syntax, but the real lesson of XML-Data is that
> creating an effective inheritance mechanism isn't rocket science. All that
> is really needed is a keyword that says "this element type is derived from
> that element type". Something like:
>
> <!element dog extends animal...

Sure. This isn't rocket science. But it doesn't solve the fundamental problem at all. You haven't defined what happens to "BARK" sub-elements in "DOG". Without that definition, any software dealing with animals will croak on dogs. Which is exactly what subtyping was supposed to avoid....

> More tricky than any of these technical issues is the question of what, if
> anything, could be done to promote a mechanism of this sort. Obviously this
> would require a change to the XML spec as well as modification to all
> existing tools which process DTDs, so it's a pretty big deal. I wonder if
> anyone besides me thinks that a simple mechanism like this would make sense.
> If so, is there any room in the XML standards process to discuss a change of
> this type at some point in the future (certainly not for XML 1.0)?

Personally, I have yet to see a decent proposal for inheritance and subtyping in SGML.
Coming up with one is difficult, which is why I've spent the last year thinking about it. Dan Connolly has also spent several years thinking about it. I know that there are many others in the same boat. I think that we agree that it doesn't make sense to adopt a solution that solves only 5% of the problem, which is why you will see resistance to anything like that.

We will know that we have a complete solution to the problem when HTML 6.0 can be described as a subtype of HTML 5.0, and its behaviour in a "subtype aware" HTML 5.0 browser is predictable and well-defined. Further, HTML 6.0 must not just extend HTML 5.0 in trivial ways such as new <HEAD> tags. It must actually have new elements, with new content models mixed in at all levels. As I said, inheritance-at-the-end solves about 5% of this problem.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

============================================================================

Date: Mon, 20 Apr 1998 09:20:12 -0400
From: Paul Prescod <papresco@technologist.com>
To: xml-dev <xml-dev@ic.ac.uk>
Subject: Inheritance and subtyping in OO languages

I've found a good reference to the 8-year-old paper that made the distinction between inheritance and subtyping most explicit. The paper itself is not online, but this summary is quite good:

"[CCHO89] and [CoHC90] propose an approach based on explicit interfaces and interface containment.
In this system of object interfaces, one type is considered a subtype of another if some subset of its interface is identical to that of the second. [...] Hence in this system class-based inheritance is strictly a reusability mechanism for sharing behaviour between objects, not to be confused with subtyping. For example two classes may be equivalent as types, though neither inherits anything from the other. So class hierarchies are not the same as type hierarchies, although they may overlap. Object interfaces [as in Java, C++, etc. - Paul] clarify this distinction between interface containment (subtyping) and class-based inheritance and give insight into limitations caused by equating the notions of type and class in many typed object-oriented programming languages [such as Simula 67 - Paul]."

http://progwww.vub.ac.be/prog/persons/kimmens/research/Introduction-to-OO.html

The paper itself is called "Inheritance is not subtyping" and is quite famous, but unfortunately predates the Web.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco
=======================================================================

From owner-xml-dev@ic.ac.uk Mon Apr 20 10:45:58 1998
Date: Mon, 20 Apr 1998 11:24:15 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: Re: Inheritance in XML

Robin Cover wrote:
>
> OODB            SGML/XML Markup
>
> class defn      element declaration
> class name      element type
> object          element
> attribute       attribute
>
> If we accept this crude analogy, and accept SGML's notion of an
> "attribute" as a name-value pair, then the hope of creating subclasses
> through SGML/XML element declarations appears slim.

I don't think that the problem is with SGML/XML element type declarations. I think that it is with trying to import OO features too literally. The most important thing about an object is its set of "methods" or "slots". These define its interface. The most important thing about an XML element is its content model, or, more generally, the language it defines (content model+attributes). But languages and methods are very different.

If we made XML's attributes "richer", we could have attributes that are more like properties. But the content model problem would remain unless we removed content models altogether. OOP works because they figured out a smart way of defining interfaces (sets of methods) and sub-interfaces (subsets of methods). We must do the same for languages.

The problem is easy if we strictly require subtypes to define sublanguages (i.e. merely restricted content models).
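To make the sublanguage condition concrete, here is a minimal brute-force sketch. It assumes content models have been hand-translated into regular expressions over one-letter element codes; the CHAPTER and PREFACE models, the letter codes, and the function name `is_sublanguage` are all hypothetical. It only checks child sequences up to a bounded length, so it can refute a subtype claim but not prove one in general.

```python
import re
from itertools import product


def is_sublanguage(sub_model, base_model, alphabet, max_len=5):
    """Return False if some child sequence of at most max_len elements
    is accepted by sub_model but rejected by base_model."""
    sub = re.compile(sub_model)
    base = re.compile(base_model)
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            sequence = "".join(letters)
            if sub.fullmatch(sequence) and not base.fullmatch(sequence):
                return False
    return True


# Hypothetical codes: t = TITLE, p = PARA, f = FIGURE
CHAPTER = "t(p|f)*"   # stands in for (TITLE, (PARA|FIGURE)*)
PREFACE = "tp*"       # stands in for (TITLE, PARA*)

print(is_sublanguage(PREFACE, CHAPTER, "tpf"))  # restricted model vs. base
print(is_sublanguage(CHAPTER, PREFACE, "tpf"))  # the reverse direction
```

An exact check would compare the automata for the two models rather than enumerate strings; the bounded enumeration is used here only because it is short and easy to follow.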
That would occasionally be useful:

  <!ELEMENT EMPH (#PCDATA|IMG)>
  <!ELEMENT STRONG (#PCDATA|IMG) ISA EMPH>
  <!ELEMENT CITE (#PCDATA) ISA EMPH>

But more often we want not just a strict sublanguage, but a language that can be *transformed into* a sublanguage. For example:

  <!ELEMENT FIGURE (CAPTION, OBJECT)>
  <!ELEMENT APPLET (CAPTION, JAVACODE) ISA FIGURE( CAPTION=CAPTION, OBJECT=JAVACODE )>

To me, this is much more interesting and useful, but also harder to figure out, especially when we use the full power of content models.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

=======================================================================

Date: Wed, 22 Apr 1998 21:28:46 -0700
From: Jon.Bosak@eng.Sun.COM (Jon Bosak)
Subject: Re: Inheritance in XML
In-reply-to: <01bd6c3a$c50f2910$020b0ac0@xerius> (matthew@praxis.cz)
Sender: owner-xml-dev@ic.ac.uk
To: xml-dev@ic.ac.uk

I'm generally not able to track discussions like this, fascinating though they may be, and I make it a firm principle not to become involved in them, so don't expect any further comments from me regarding this one. But catching up on my email backlog just now I see so much good energy being wasted that I can't pass by without contributing a couple of items of information that may save some wheel-spinning out there.

First, allow me to vent just a little bit about a common misunderstanding.
[Matthew Gertner:]

| In last month's Wired, XML made it into the "hype list" with the
| comment that we crazy XML types are kidding ourselves because XML will
| never fly without well-defined semantics.

This gets the "No Shit, Sherlock" award for excellence in trade press reporting. XML was very carefully designed to have no built-in semantics whatsoever. So considered in isolation, an XML document is found to have... no semantics! What an insight!

And we can go further: to give semantics to this thing that was designed to have no semantics we have to have... it's coming to me, wait a minute... yes! We have to supply something else that *does* provide the semantics! Wow! Pulitzer prize time for sure.

Here are some examples of things that can provide semantics for XML documents:

* Scripts or programs. Especially Java programs. :-)

* Prose descriptions (if you said "DTDs" you are confused, but understandably so; a lot of good people have been confused about this before you). The namespace specification provides a standard way to associate prose descriptions and other bearers of semantic information with classes of XML documents.

* Stylesheets. Especially XSL stylesheets, which are even as we speak being defined by a very active W3C XSL WG. This is why you will want to look carefully at the first XSL working draft expected out in July, because XSL will provide what is intended to be the most powerful standardized high-level way to associate presentational semantics with XML documents in publishing environments. Watch this space:

  http://www.w3.org/Style/XSL

So people who think that there is something missing from XML are by and large simply unaware that it was not intended to be used by itself and that the other pieces are on their way. (There's XLink, too.)
This has all been made abundantly clear in every W3C statement about the XML activity for the last year and a half, but it's to be expected that a lot of folks just won't bother to pay attention to stuff like that.

Now let's turn to the chief concern of this thread. After a number of excellent observations about the need for a schema language for XML documents and the considerations that have to go into the specification of such a thing, Matthew asks the following question:

| More tricky than any of these technical issues is the question of
| what, if anything, could be done to promote a mechanism of this
| sort. Obviously this would require a change to the XML spec as
| well as modification to all existing tools which process DTDs, so
| it's a pretty big deal. I wonder if anyone besides me thinks that
| a simple mechanism like this would make sense. If so, is there
| any room in the XML standards process to discuss a change of this
| type at some point in the future (certainly not for XML 1.0)?

The answer is, Yes, there are other people who think that it would make sense to design an XML schema mechanism to handle issues like what has been called "inheritance" in this discussion (not to mention good old-fashioned data typing).

The workings of a W3C committee can be made public only at the discretion of the chair of the committee, so I will put on my official XML WG Chairman hat and reveal unto ye that the XML WG has officially requested that the job of defining a schema language for XML documents be added to its charter. If approved by the W3C Director, this work would certainly involve a consideration of most of the issues raised in this discussion and would include a close look not only at XML Data but also at other proposed solutions to the same problem.

Jon
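
[Editorial illustration, not part of any posting above.] Paul Prescod's FIGURE/APPLET example describes a derived element type whose content can be *transformed into* the language of a base type. The Python sketch below is one possible reading of that idea, not an implementation of any existing DTD or schema feature: the ISA table, the BASE_MODELS table, and the functions view_as_base and conforms are all hypothetical names invented for this illustration, and content models are reduced to plain sequences of required children.

```python
import xml.etree.ElementTree as ET

# Hypothetical base declarations; only plain sequences of required children
# are modelled, which is far less than a real content model allows.
BASE_MODELS = {"FIGURE": ["CAPTION", "OBJECT"]}

# Hypothetical "ISA" table: derived type -> (base type, derived child -> base child).
# It mirrors: APPLET (CAPTION, JAVACODE) ISA FIGURE(CAPTION=CAPTION, OBJECT=JAVACODE)
ISA = {"APPLET": ("FIGURE", {"CAPTION": "CAPTION", "JAVACODE": "OBJECT"})}


def view_as_base(element):
    """Return a copy of `element` renamed into its base type's vocabulary."""
    base_name, child_map = ISA[element.tag]
    view = ET.Element(base_name, dict(element.attrib))
    for child in element:
        if child.tag not in child_map:
            raise ValueError(f"{child.tag} has no counterpart in {base_name}")
        mapped = ET.SubElement(view, child_map[child.tag])
        mapped.text = child.text
    return view


def conforms(element):
    """True if the element's children match the sequence-only base model."""
    return [child.tag for child in element] == BASE_MODELS[element.tag]


applet = ET.fromstring(
    "<APPLET><CAPTION>Clock</CAPTION><JAVACODE>Clock.class</JAVACODE></APPLET>"
)
figure_view = view_as_base(applet)
print(ET.tostring(figure_view, encoding="unicode"))
print(conforms(figure_view))
```

Under these assumptions, a tool that only understands FIGURE could process an APPLET through the renamed view. The harder cases Prescod mentions (optional children, repetition, alternation) are deliberately left out of this sketch.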