Architectural Forms: The Next Generation

"Architectural Forms: The Next Generation." Comments on XML-DEV from Joe English, Steven R. Newcomb, Leigh Dodds, and Rick Jelliffe.
Date: 28 Jan 2002 15:19:37 -0600
From: Steven R. Newcomb <srn@coolheads.com>
To: Joe English <jenglish@flightlab.com>
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Architectural Forms: The Next Generation

Joe English <jenglish@flightlab.com> writes:


> As for reordering elements, the best practice is to
> design the architecture so that, to the extent
> possible, it's never necessary.

Yes.

The Architectural Forms paradigm is strictly for people
who *want* to cooperate with their various communities,
but who can't base their cooperation on the strictest
kind of adherence to a single monolithic document type.
The benefits of AF-based cooperation include:

* the supportability of business models based on
  architecture-specific semantic processing engines,
  and

* the ability to distribute limited authority to
  enhance and embellish the common information
  architecture that reflects the consensual basis of
  community-wide cooperation.

When people *don't* want to cooperate with each other,
we can always fall back on the nuclear weapons of the
industry: groves/property sets, and arbitrary
transformations.  To these, resistance is futile, but
communities miss opportunities to achieve deliberate
consensus and to gain bargaining power for themselves
that will come in very handy when they purchase
infrastructural information technologies specialized
for their common needs.

Alas, most communities are still too ignorant and/or
too fractious to reap these rewards.  Even so, I think
it's a good idea for XML to provide a basis whereby the
enlightened can benefit.  It might tend to improve the
odds of cooperation, which would improve human
productivity, which would benefit all of us, one way or
another.

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

------------------------------------------------------------


Date: 13 Feb 2002 15:13:50 -0600
From: Steven R. Newcomb <srn@coolheads.com>
To: Leigh Dodds <ldodds@ingenta.com>
Subject: Re: [xml-dev] Architectural Forms, A Summary

Thanks, Leigh, for doing this.

"Leigh Dodds" <ldodds@ingenta.com> writes:


> * AFs offer a very limited transformation ability,
> see for detailed summary. Therefore they have a
> limited functional overlap with XSLT.


True enough.  My problem with this is that it's the
first thing on your list.  For me, its importance is so
low that it's really just a footnote.  (More about this
below.)


> * AFs promote co-operation between organisations who
> wish to share data by allowing each organisation to
> continue to manage its own vocabulary. Agreement
> centres on an architectural (or 'meta') schema to
> which the individual vocabularies can be mapped. Each
> party retains sovereignty over their own syntax,
> while having an architectural format to validate other
> documents against. AFs are useful where there's an
> agreement on the essential core, but not on
> non-essential data, or naming. (Also useful
> internally, e.g. variations across a company.)


Well, this implies that the "meta" schema must be
designed as a "meta" schema.  That's not true.  Any
DTD, including any DTD that wasn't designed to be a
meta schema, can be used as a "meta" schema.
"Meta"-ness is a perspective, not an inherent property.

Of course, you can do some extra interesting things if
you design a schema for use as a "meta" schema to begin
with.  Things like "attribute-type architectural
forms", AKA "common attributes".  This is one of the
standard "AFDR" enhancements that were made to the DTD
notation.

Speaking of architectural processors, you should
mention them, and I don't think you did.  The ability
to use plug-in processors for specific inherited
architectures is *the* key economic reason to use AFs,
in my opinion.  (More below.)


> * Because each company provides the mapping from its
> vocabulary to the architectural form, the work is
> distributed amongst the co-operating parties. This
> limits the number of transformations that need to be
> managed by a single party.


Interesting perspective.  I've never thought of it as a
way to distribute *work*; I've always thought of it as
a way to distribute *sovereignty* over vocabularies.


> * Applications are designed to use the architectural
> form, and not the specific vocabularies. There is no
> need to manage local XSLT transforms as each instance
> document (+/- schema) defines its own mapping.


OK, maybe this is where you're talking about modular,
plug-in, architecture-specific, semantic processors.  I
would be happier if you were more explicit about the
advantages of distributing the cost of such modules
among many applications, rather than implying that the
development of each application must shoulder the
entire burden of developing a semantic processing
system for each inherited architecture.


> * Having a common (architectural) format upon which
> to base processing is more flexible than trying to
> support multiple input formats (particularly when not
> all formats can be transformed into one another)


> * Attribute defaulting makes AFs very simple to use
> with DTDs


> * AFs are useful where Format A cannot be properly
> transformed into Format B, and also where only a
> subset of either Format A or B is required for a
> particular process.


I don't understand the relevance of the first clause
("useful where Format A cannot be properly transformed
into Format B").  The second clause is certainly
correct.


> * An individual schema may reference multiple
> architectures. This allows data to be re-used in
> multiple environments. The alternative is to produce
> data in multiple formats dependent on its expected
> use.


For me, the interesting thing about referencing
multiple architectures is that the semantic processing
logic associated with each referenced architecture can
be purchased and plugged into the application as a
re-usable software module.  Each referenced
architecture is money saved, and software
maturity/reliability gained.


> * While AFs can help facilitate co-operation, if
> there is already a single, or primary vocabulary then
> there is little additional benefit to be gained from
> applying them. They're needlessly 'meta'.


To have a single common DTD is perforce to have a
meta-DTD.  All you have to do is reference it as such.
It is not "needlessly meta".  It's already "meta"
whenever you decide to regard it as "meta".  You make
that decision whenever you start needing your first
local variation, and you still want to retain
interoperability with everyone else who is still using
the common DTD, or any AF-driven local variation
thereof.  Regardless of whether it's used as a
"meta"-DTD or as a plain-vanilla DTD, the common DTD
defines what is being interchanged.


> * A corollary to the above seems to be that if none
> of the parties attempting to co-operate already has
> an XML standard, then defining a single vocabulary
> seems to be a valid starting point.


Damn right!


> * AFs are also applicable for achieving reuse across
> horizontal vocabularies (and in this regard appear to
> directly overlap with the goals for XML
> Namespaces). For example linking semantics are fairly
> clear-cut, yet no-one seems keen to have to apply the
> same names to linking elements.


> * AFs can be used to map between schemas, but only if
> the schemas are designed for this, or are very
> similar.


Well, uh, I think it's much clearer and more accurate
to say:

* Either the two DTDs inherit a common architecture, or

* one of the DTDs inherits the other.

In other words, either they are both designed as
specializations of a common DTD, or one is designed as
a specialization of the other.


> * AFs are primarily a way to indicate that particular
> elements in different vocabularies share semantics,
> where the semantics being shared are very general
> (linking, inclusion, etc).


There is no generality requirement.  The shared
semantics can be just as specialized as anybody wants
them to be.  It's true that standards that use AFs,
such as the ISO/IEC 10744:1997 "HyTime" standard, tend
to be very generalized, but that's because of the
requirements for which their architectures are
designed.


> * Neither AFs nor XSLT are true general XML
> transformation languages.  XSLT offers many more
> transformation features than AFs; however,
> transformation isn't the real aim of AFs. 'Mapping'
> might be a better way to put it.


Right.  But this whole "transformation" thing is a red
herring.  It was never the point of AFs.  The
transformation thing is just a side-effect of the
methodology used to implement parsers that can support
the needs of re-usable, architecture-specific semantic
processing modules, such as "HyTime engines".


> * An advantage of AFs is that they can be implemented
> very simply, and work in a streaming mode (e.g. as a
> SAX Filter). XSLT cannot; however XSLT can also be
> used to implement architectural mapping, cf APEX


> * AFs can be used to implement I18N of
> vocabularies. Mapping element/attribute names to/from
> their original language.


That's interesting.  I didn't realize that.


> * AFs as originally specified are closely tied to
> DTDs and Processing Instruction based syntax. However
> they can be used in isolation or in conjunction with
> another schema language. cf: AFNG


As far as XML is concerned, what you say is correct.
However, the *original* original syntax was based on
NOTATION attributes, a feature of SGML that was
unaccountably omitted from XML.  That's why the
alternative PI-based syntax had to be invented: XML
could not support the NOTATION attribute-based syntax.


> * Both Namespaces and AFs are used to associate
> semantics. Namespaces say "this is an element from
> the X namespace (e.g. XHTML) and should be processed
> as such". AFs say "this element is directly
> equivalent to element Y in architecture B, and should
> be processed as such". With the caveat that
> "processed as such" doesn't necessarily require
> global agreement, but does require local consistency.


I don't understand the words, "doesn't necessarily
require global agreement".  Semantically, global
agreement is required.  Syntactically, global agreement
about names (GIs and attribute names, and the question
of whether certain things are expressed as element
contents vs. attribute values) is not required.


> * Using RDDL, or similar, Namespaces can be made to
> point directly to a description of these semantics
> (thin ice here). No such mechanism for AFs, or rather
> the original mechanism used a PubId, but there's no
> standard documentation.


Not true.  The AF paradigm requires an "Architecture
Definition Document (ADD)", and both of the syntaxes
for declaring base architectures provide places for
pointers to ADDs.  There are also separate places for
pointing to the DTD.


> * A key premise of AFs is that the GI is only one
> property of the element that could be used to direct
> processing. An (architectural) attribute is an
> equally valid dispatch mechanism. This view allows an
> element to have multiple types (i.e. be mapped to
> elements in multiple architectures).  This is in some
> way counter to XML/Namespaces where the GI is the
> type of the element. The mid-ground seems to be that
> the GI defines the primary relationship, and that one
> concedes that (other) attributes can be used to
> dispatch processing. (cf: role attribute pattern).


Cf: XLink.  As far as XLink is concerned, the meta-GIs
are provided by certain standard attributes.  I think
you're trying to make a distinction without a
difference, here.  If the actual GI doesn't *also*
dispatch (or at least affect) at least *some*
processing in some way, what's the point of uttering it
at all?
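The "multiple types" idea can be sketched as a small dispatcher in Python. Everything here is hypothetical: two architectures, `biblio` and `linking`, whose form attributes are simply named after them, and two stand-in functions playing the role of plug-in, architecture-specific engines. One element is handed to both engines. Its own GI (`cite`) is still available to any local processing, which is the point made in the reply above.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A processor receives the element and its architectural form name.
Processor = Callable[[ET.Element, str], str]


def dispatch(root: ET.Element, processors: Dict[str, Processor]) -> List[str]:
    """Offer each element to every architecture-specific processor.

    An element inherits an architecture when it carries that architecture's
    form attribute (here simply named after the architecture), so a single
    element may be processed by several engines.
    """
    results = []
    for elem in root.iter():
        for arch, process in processors.items():
            form = elem.get(arch)
            if form is not None:
                results.append(process(elem, form))
    return results


doc = ET.fromstring(
    '<paper>'
    '<cite biblio="reference" linking="simple-link" href="#r1">Jones 1999</cite>'
    '</paper>'
)
processors = {
    "biblio": lambda e, form: f"biblio engine: <{e.tag}> as {form}",
    "linking": lambda e, form: f"linking engine: <{e.tag}> as {form} -> {e.get('href')}",
}
for line in dispatch(doc, processors):
    print(line)
```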


> The real message here is that when we exchange data
> we agree on how to process it. Using element names is
> one way. Keying of attributes is another, and also
> allows me to have separate agreements with another
> party, but use the same data.


I think the real message of AFs is that with them we
can verify whether a document that conforms to a local
variation of a "standard" (agreed-upon, community-wide)
DTD also, when understood in terms of the mapping that
the instance itself describes, conforms to the
syntactic constraints imposed on instances of the
"standard" DTD.  This is what makes it possible,
economically speaking, to enjoy the advantages of
re-usable, modular, architecture-specific semantic
processing engines.  The basic point is that, when
information is not properly understood on arrival, the
finger of blame can be pointed at the non-conforming
party.  Was the document at fault, or the system that
tried to understand it?
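A rough sketch of that verification step, in Python: the local instance is read in architectural terms, and each architectural element's children are checked against the standard's content model. The assumptions are that the form attribute is the hypothetical `myarch`, and that the "standard" DTD is reduced to one regular expression per element type over its architectural children. DTD content models are regular, so this is a fair toy. Character data and attributes are not checked.

```python
import re
import xml.etree.ElementTree as ET

FORM_ATTR = "myarch"  # hypothetical architectural form attribute

# Hypothetical "standard" DTD: each content model is a regular expression over
# the architectural children's names, each name followed by one space.
STANDARD = {
    "note": r"(recipient )+(body )?",
    "recipient": r"",
    "body": r"",
}


def arch_children(elem):
    """Nearest architectural descendants of elem, in document order.
    Non-architectural wrappers (local additions) are looked through."""
    found = []
    for child in elem:
        if child.get(FORM_ATTR) is not None:
            found.append(child)
        else:
            found.extend(arch_children(child))
    return found


def conformance_errors(elem):
    """Check elem's architectural view against STANDARD; return a list of problems."""
    form = elem.get(FORM_ATTR)
    if form not in STANDARD:
        return [f"<{elem.tag}>: unknown architectural form {form!r}"]
    kids = arch_children(elem)
    seq = "".join(kid.get(FORM_ATTR) + " " for kid in kids)
    errors = []
    if re.fullmatch(STANDARD[form], seq) is None:
        errors.append(f"<{elem.tag}> as {form}: children '{seq.strip()}' violate the model")
    for kid in kids:
        errors.extend(conformance_errors(kid))
    return errors


# A local variation: <hdr> is a local wrapper; <memo>, <to>, <text> map to the standard.
good = ET.fromstring('<memo myarch="note"><hdr><to myarch="recipient">Joe</to></hdr>'
                     '<text myarch="body">Hi</text></memo>')
# Same local vocabulary, but the required recipient is missing.
bad = ET.fromstring('<memo myarch="note"><text myarch="body">Hi</text></memo>')

print(conformance_errors(good))
print(conformance_errors(bad))
```

The first call should report no problems. The second reports that `<memo>`, read as `note`, lacks a `recipient`. That is the "finger of blame" in miniature: the fault is located in the document, not in the engine that received it.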

The AF paradigm may, in some cases, also allow the same
data to be processed by different architecture-specific
engines.  This is where "multiple inheritance" for
individual elements is important.  The effect is to
allow, in some cases, a single dataset to describe
itself in such a way as to allow it to be understood in
multiple different vendor-specific processing contexts.

Personally, I would like to see the AF idea developed
in such a way as to make it an even more powerful
data-self-description paradigm, so that there would be
no limits on the ability of data to describe its own
transformations for use in various processing contexts.
The result would be that information owners could serve
many markets with a single product, and everyone would
be able to tell who was responsible for any failure to
interchange information.  AFs, as we know them, aren't
quite up to that particular challenge.  Yet.

-- Steve

-------------------------------------------------

Perhaps Steven could mention what the difference between an "architecture" and
an "architectural form" is. 

And Leigh's article could say:
   XHTML is an architecture
   XLink is an architecture
   RDDL is another architecture (using XHTML and XLink both in particular ways)

If that is so...

Cheers
Rick Jelliffe

-----------------------------------------------------

"Rick Jelliffe" <ricko@allette.com.au> writes:

> And Leigh's article could say:
>    XHTML is an architecture

Yes.

>    XLink is an architecture

Yes, but it's a slight stretch, since the root element
type is not explicitly provided by XLink.  Since the
root element type, if it had been provided, wouldn't
have imposed any constraints, I'm not too worried about
calling XLink an architecture.

Of course, you need to use inclusion exceptions in
order to be able to specify a non-constraining content
model for the root element type of an architecture.
That's a serious problem for the use of architectural
forms in XML, because XML-conforming DTDs can't support
inclusion exceptions.  However, it's possible to adopt
a convention for XML that says, "If no root element
type is provided, architecture-specific processors
should assume that the elements whose contexts are not
constrained by the architecture can appear anywhere in
the instance."  Implicitly, that's exactly what XLink
did.  As yet, however, no such corresponding adjustment
has ever been made to the ISO standard, nor to the only
parser that handles architectural forms correctly (SP).

>    RDDL is another architecture (using XHTML and
>    XLink both in particular ways)

Yes, with the same proviso.

> Perhaps Steven could mention what the difference
> between an "architecture" and an "architectural form"
> is.

An architecture has: 

* a schema (some set of syntactic constraints) *and*

* specific, explicitly-defined semantics for each
  syntactic construct defined in the schema.

An architectural form is a syntactic construct defined
in the schema.  There are two kinds:

1.  Element-type architectural forms.  These define
    semantics and syntactic constraints on the content
    of the element type and on its attributes.  Since
    the root element type is always declared by an
    architecture, and since the root element type
    is required to be the ancestor of all of the contained
    architectural elements, it is also correct (but a
    bit confusing) to say, informally, that a whole
    architecture is (or is invoked by) a single
    architectural form.  Because of this ambiguity, the
    following two statements are effectively
    equivalent:

    (1) I'm using the HyTime architecture.

    (2) I'm using the HyDoc architectural form.

    If the name of the architecture happens to be the
    same as the GI of its root element, it's even more
    confusing.  The following two statements are
    effectively equivalent:

    (1) I'm using the XHTML architecture.

    (2) I'm using the xhtml architectural form.

2.  Attribute-list-type architectural forms.  These are
    lists of one or more attributes that may appear on
    instances of some set of architectural elements.
    The ones that are recognized on *all* architectural
    elements are called "common attributes".  (The
    ability to recycle attribute lists explicitly along
    with their semantics was the main enhancement of
    the DTD formalism introduced by the "SGML Extended
    Facilities" annex of the HyTime standard.  It's not
    a *necessary* enhancement -- you can accomplish the
    same things less conveniently without using it --
    but it sure is nice to have.)

-- Steve

Prepared by Robin Cover for The XML Cover Pages archive. See: "Architectural Forms and SGML/XML Architectures."