Architectural Forms: The Next Generation
"Architectural Forms: The Next Generation." Comments on XML-DEV from Joe English, Steven R. Newcomb, Leigh Dodds, and Rick Jelliffe.
Date: 28 Jan 2002 15:19:37 -0600
From: Steven R. Newcomb <srn@coolheads.com>
To: Joe English <jenglish@flightlab.com>
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Architectural Forms: The Next Generation

Joe English <jenglish@flightlab.com> writes:

> As for reordering elements, the best practice is to
> design the architecture so that, to the extent
> possible, it's never necessary.

Yes. The Architectural Forms paradigm is strictly for people who
*want* to cooperate with their various communities, but who can't
base their cooperation on the strictest kind of adherence to a
single monolithic document type.

The benefits of AF-based cooperation include:

* the supportability of business models based on
  architecture-specific semantic processing engines, and

* the ability to distribute limited authority to enhance and
  embellish the common information architecture that reflects the
  consensual basis of community-wide cooperation.

When people *don't* want to cooperate with each other, we can always
fall back on the nuclear weapons of the industry: groves/property
sets, and arbitrary transformations. To these, resistance is futile,
but communities miss opportunities to achieve deliberate consensus
and to gain bargaining power for themselves that will come in very
handy when they purchase infrastructural information technologies
specialized for their common needs.

Alas, most communities are still too ignorant and/or too fractious
to reap these rewards. Even so, I think it's a good idea for XML to
provide a basis whereby the enlightened can benefit. It might tend
to improve the odds of cooperation, which would improve human
productivity, which would benefit all of us, one way or another.

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

------------------------------------------------------------

Date: 13 Feb 2002 15:13:50 -0600
From: Steven R. Newcomb <srn@coolheads.com>
To: Leigh Dodds <ldodds@ingenta.com>
Subject: Re: [xml-dev] Architectural Forms, A Summary

Thanks, Leigh, for doing this.

"Leigh Dodds" <ldodds@ingenta.com> writes:

> * AFs offer a very limited transformation ability,
> see for detailed summary. Therefore they have a
> limited functional overlap with XSLT.

True enough. My problem with this is that it's the first thing on
your list. For me, its importance is so low that it's really just a
footnote. (More about this below.)

> * AFs promote co-operation between organisations who
> wish to share data by allowing each organisation to
> continue to manage its own vocabulary. Agreement
> centres on an architectural (or 'meta') schema to
> which the individual vocabularies can be mapped. Each
> party retains sovereignty over their own syntax,
> while having an architectural format to validate other
> documents against. AFs are useful where there's an
> agreement on the essential core, but not on
> non-essential data, or naming. (Also useful
> internally, e.g. variations across a company)

Well, this implies that the "meta" schema must be designed as a
"meta" schema. That's not true. Any DTD, including any DTD that
wasn't designed to be a meta schema, can be used as a "meta" schema.
"Meta"-ness is a perspective, not an inherent property.

Of course, you can do some extra interesting things if you design a
schema for use as a "meta" schema to begin with. Things like
"attribute-type architectural forms", AKA "common attributes". This
is one of the standard "AFDR" enhancements that were made to the DTD
notation.

Speaking of architectural processors, you should mention them, and I
don't think you did. The ability to use plug-in processors for
specific inherited architectures is *the* key economic reason to use
AFs, in my opinion. (More below.)

> * Because each company provides the mapping from its
> vocabulary to the architectural form, the work is
> distributed amongst the co-operating parties.
> This limits the number of transformations that need to be
> managed by a single party.

Interesting perspective. I've never thought of it as a way to
distribute *work*; I've always thought of it as a way to distribute
*sovereignty* over vocabularies.

> * Applications are designed to use the architectural
> form, and not the specific vocabularies. There is no
> need to manage local XSLT transforms as each instance
> document (+/- schema) defines its own mapping.

OK, maybe this is where you're talking about modular, plug-in,
architecture-specific, semantic processors. I would be happier if
you were more explicit about the advantages of distributing the cost
of such modules among many applications, rather than implying that
the development of each application must shoulder the entire burden
of developing a semantic processing system for each inherited
architecture.

> * Having a common (architectural) format upon which
> to base processing is more flexible than trying to
> support multiple input formats (particularly when not
> all formats can be transformed into one another)
> * Attribute defaulting makes AFs very simple to use
> with DTDs
> * AFs are useful where Format A cannot be properly
> transformed into Format B, and also where only a
> subset of either Format A or B is required for a
> particular process.

I don't understand the relevance of the first clause ("useful where
Format A cannot be properly transformed into Format B"). The second
clause is certainly correct.

> * An individual schema may reference multiple
> architectures. This allows data to be re-used in
> multiple environments. The alternative is to produce
> data in multiple formats dependent on its expected
> use.

For me, the interesting thing about referencing multiple
architectures is that the semantic processing logic associated with
each referenced architecture can be purchased and plugged into the
application as a re-usable software module.
Each referenced architecture is money saved, and software
maturity/reliability gained.

> * While AFs can help facilitate co-operation, if
> there is already a single, or primary vocabulary then
> there is little additional benefit to be gained from
> applying them. They're needlessly 'meta'.

To have a single common DTD is perforce to have a meta-DTD. All you
have to do is reference it as such. It is not "needlessly meta".
It's already "meta" whenever you decide to regard it as "meta". You
make that decision whenever you start needing your first local
variation, and you still want to retain interoperability with
everyone else who is still using the common DTD, or any AF-driven
local variation thereof. Regardless of whether it's used as a
"meta"-DTD or as a plain-vanilla DTD, the common DTD defines what is
being interchanged.

> * A corollary to the above seems to be that if none
> of the parties attempting to co-operate already has
> an XML standard, then defining a single vocabulary
> seems to be a valid starting point.

Damn right!

> * AFs are also applicable for achieving reuse across
> horizontal vocabularies (and in this regard appear to
> directly overlap with the goals for XML
> Namespaces). For example linking semantics are fairly
> clear-cut, yet no-one seems keen to have to apply the
> same names to linking elements.
> * AFs can be used to map between schemas, but only if
> the schemas are designed for this, or are very
> similar.

Well, uh, I think it's much clearer and more accurate to say:

* Either the two DTDs inherit a common architecture, or

* one of the DTDs inherits the other.

In other words, either they are both designed as specializations of
a common DTD, or one is designed as a specialization of the other.

> * AFs are primarily a way to indicate that particular
> elements in different vocabularies share semantics,
> where the semantics being shared are very general
> (linking, inclusion, etc).

There is no generality requirement.
The shared semantics can be just as specialized as anybody wants
them to be. It's true that standards that use AFs, such as the
ISO/IEC 10744:1997 "HyTime" standard, tend to be very generalized,
but that's because of the requirements for which their architectures
are designed.

> * Neither AFs nor XSLT are true general XML
> transformation languages. XSLT offers many more
> transformation features than AFs, however
> transformation isn't the real aim of AFs. 'Mapping'
> might be a better way to put it.

Right. But this whole "transformation" thing is a red herring. It
was never the point of AFs. The transformation thing is just a
side-effect of the methodology used to implement parsers that can
support the needs of re-usable, architecture-specific semantic
processing modules, such as "HyTime engines".

> * An advantage of AFs is that they can be implemented
> very simply, and work in a streaming mode (e.g. as a
> SAX Filter). XSLT cannot; however XSLT can also be
> used to implement architectural mapping, cf APEX
> * AFs can be used to implement I18N of
> vocabularies. Mapping element/attribute names to/from
> their original language.

That's interesting. I didn't realize that.

> * AFs as originally specified are closely tied to
> DTDs and Processing Instruction based syntax. However
> they can be used in isolation or in conjunction with
> another schema language. cf: AFNG

As far as XML is concerned, what you say is correct. However, the
*original* original syntax was based on NOTATION attributes, a
feature of SGML that was unaccountably omitted from XML. That's why
the alternative PI-based syntax had to be invented: XML could not
support the NOTATION attribute-based syntax.

> * Both Namespaces and AFs are used to associate
> semantics. Namespaces say "this is an element from
> the X namespace (e.g. XHTML) and should be processed
> as such". AFs say "this element is directly
> equivalent to element Y in architecture B, and should
> be processed as such".
> With the caveat that
> "processed as such" doesn't necessarily require
> global agreement, but does require local consistency.

I don't understand the words, "doesn't necessarily require global
agreement". Semantically, global agreement is required.
Syntactically, global agreement about names (GIs and attribute
names, and the question of whether certain things are expressed as
element contents vs. attribute values) is not required.

> * Using RDDL, or similar, Namespaces can be made to
> point directly to a description of these semantics
> (thin ice here). No such mechanism for AFs, or rather
> original mechanism used PubId, but there's no
> standard documentation.

Not true. The AF paradigm requires an "Architecture Definition
Document (ADD)", and both of the syntaxes for declaring base
architectures provide places for pointers to ADDs. There are also
separate places for pointing to the DTD.

> * A key premise of AFs is that the GI is only one
> property of the element that could be used to direct
> processing. An (architectural) attribute is an
> equally valid dispatch mechanism. This view allows an
> element to have multiple types (i.e. be mapped to
> elements in multiple architectures). This is in some
> way counter to XML/Namespaces where the GI is the
> type of the element. The mid-ground seems to be that
> the GI defines the primary relationship, and that one
> concedes that (other) attributes can be used to
> dispatch processing. (cf: role attribute pattern).

Cf: XLink. As far as XLink is concerned, the meta-GIs are provided
by certain standard attributes. I think you're trying to make a
distinction without a difference, here. If the actual GI doesn't
*also* dispatch (or at least affect) at least *some* processing in
some way, what's the point of uttering it at all?

> The real message here is that when we exchange data
> we agree on how to process it. Using element names is
> one way.
> Keying of attributes is another, and also
> allows me to have separate agreements with another
> party, but use the same data.

I think the real message of AFs is that with them we can verify
whether a document that conforms to a local variation of a
"standard" (agreed-upon, community-wide) DTD also, when understood
in terms of the mapping that the instance itself describes, conforms
to the syntactic constraints imposed on instances of the "standard"
DTD. This is what makes it possible, economically speaking, to enjoy
the advantages of re-usable, modular, architecture-specific semantic
processing engines. The basic point is that, when information is not
properly understood on arrival, the finger of blame can be pointed
at the non-conforming party. Was the document at fault, or the
system that tried to understand it?

The AF paradigm may, in some cases, also allow the same data to be
processed by different architecture-specific engines. This is where
"multiple inheritance" for individual elements is important. The
effect is to allow, in some cases, a single dataset to describe
itself in such a way as to allow it to be understood in multiple
different vendor-specific processing contexts.

Personally, I would like to see the AF idea developed in such a way
as to make it an even more powerful data-self-description paradigm,
so that there would be no limits on the ability of data to describe
its own transformations for use in various processing contexts. The
result would be that information owners could serve many markets
with a single product, and everyone would be able to tell who was
responsible for any failure to interchange information. AFs, as we
know them, aren't quite up to that particular challenge. Yet.

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

-------------------------------------------------

Perhaps Steven could mention what the difference between an
"architecture" and an "architectural form" is.

And Leigh's article could say:

  XHTML is an architecture
  XLink is an architecture
  RDDL is another architecture (using XHTML and XLink both in
  particular ways)

If that is so...

Cheers
Rick Jelliffe

-----------------------------------------------------

"Rick Jelliffe" <ricko@allette.com.au> writes:

> And Leigh's article could say:
> XHTML is an architecture

Yes.

> XLink is an architecture

Yes, but it's a slight stretch, since the root element type is not
explicitly provided by XLink. Since the root element type, if it had
been provided, wouldn't have imposed any constraints, I'm not too
worried about calling XLink an architecture.

Of course, you need to use inclusion exceptions in order to be able
to specify a non-constraining content model for the root element
type of an architecture. That's a serious problem for the use of
architectural forms in XML, because XML-conforming DTDs can't
support inclusion exceptions. However, it's possible to adopt a
convention for XML that says, "If no root element type is provided,
architecture-specific processors should assume that the elements
whose contexts are not constrained by the architecture can appear
anywhere in the instance." Implicitly, that's exactly what XLink
did. As yet, however, no such corresponding adjustment has ever been
made to the ISO standard, nor to the only parser that handles
architectural forms correctly (SP).

> RDDL is another architecture (using XHTML and
> XLink both in particular ways)

Yes, with the same proviso.

> Perhaps Steven could mention what the difference
> between an "architecture" and an "architectural form"
> is.
An architecture has:

* a schema (some set of syntactic constraints)

  *and*

* specific, explicitly-defined semantics for each syntactic
  construct defined in the schema.

An architectural form is a syntactic construct defined in the
schema. There are two kinds:

1. Element-type architectural forms. These define semantics and
   syntactic constraints on the content of the element type and on
   its attributes.

   Since the root element type is always declared by an
   architecture, and since the root element type is required to be
   the ancestor of all of the contained architectural elements, it
   is also correct (but a bit confusing) to say, informally, that a
   whole architecture is (or is invoked by) a single architectural
   form. Because of this ambiguity, the following two statements
   are effectively equivalent:

   (1) I'm using the HyTime architecture.
   (2) I'm using the HyDoc architectural form.

   If the name of the architecture happens to be the same as the GI
   of its root element, it's even more confusing. The following two
   statements are effectively equivalent:

   (1) I'm using the XHTML architecture.
   (2) I'm using the xhtml architectural form.

2. Attribute-list-type architectural forms. These are lists of one
   or more attributes that may appear on instances of some set of
   architectural elements. The ones that are recognized on *all*
   architectural elements are called "common attributes".

   (The ability to recycle attribute lists explicitly along with
   their semantics was the main enhancement of the DTD formalism
   introduced by the "SGML Extended Facilities" annex of the HyTime
   standard. It's not a *necessary* enhancement -- you can
   accomplish the same things less conveniently without using it --
   but it sure is nice to have.)

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com
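
The mapping idea that runs through this thread (an attribute, rather
than the GI, names the architectural form an element conforms to, and
one element may be mapped into more than one architecture) can be
sketched in a few lines of Python. This is only a minimal
illustration under stated assumptions, not the AFDR mechanism of
ISO/IEC 10744: the sample vocabulary, the form attributes `biblio`
and `web`, and the function `derive` are all hypothetical, and
attribute renaming, DTD attribute defaulting, and validation against
the meta-DTD are left out.

```python
import xml.etree.ElementTree as ET


def derive(elem, form_attr):
    """Return the architectural elements derived from `elem`.

    `form_attr` is the (hypothetical) attribute whose value names the
    architectural form that an element maps to in one architecture.
    """
    # Derive the children first, so that architectural descendants of a
    # non-architectural element are still kept in the result.
    children = [d for child in elem for d in derive(child, form_attr)]
    form = elem.get(form_attr)
    if form is None:
        # The element is not architectural for this architecture: it
        # disappears from the view, and its own text is dropped (a
        # simplification chosen for this sketch).
        return children
    arch = ET.Element(form)
    arch.text = (elem.text or "").strip() or None
    arch.extend(children)
    return [arch]


# A local vocabulary with Dutch element names, mapped into two
# hypothetical architectures by the attributes `biblio` and `web`.
LOCAL_DOC = """
<artikel biblio="article" web="page">
  <kop biblio="title" web="heading">Architectural Forms</kop>
  <notitie>internal remark</notitie>
  <schrijver biblio="author">A. N. Author</schrijver>
</artikel>
"""

root = ET.fromstring(LOCAL_DOC)
biblio_view = derive(root, "biblio")[0]
web_view = derive(root, "web")[0]

print(ET.tostring(biblio_view, encoding="unicode"))
# <article><title>Architectural Forms</title><author>A. N. Author</author></article>
print(ET.tostring(web_view, encoding="unicode"))
# <page><heading>Architectural Forms</heading></page>
```

Each derived view is what a plug-in, architecture-specific processor
would be handed in this simplified picture: it sees only the names
defined by its own architecture, whatever GIs the local document
happens to use.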
Prepared by Robin Cover for The XML Cover Pages archive. See: "Architectural Forms and SGML/XML Architectures."