Question about Architectures and Versioning
From owner-xml-dev@ic.ac.uk Fri Jun 12 18:10:52 1998
Date: Fri, 12 Jun 1998 18:00:49 -0500
Message-Id: <199806122300.SAA01648@bruno.techno.com>
From: "Steven R. Newcomb" <srn@techno.com>
To: andrewl@microsoft.com
CC: xml-dev@ic.ac.uk
Subject: Re: Question about Architectures and Versioning
-------------------------------------------------------------------
> From: Andrew Layman <andrewl@microsoft.com>
>
> How does one go about using Architectures to solve the following problem.
>
> Suppose in version one of my documents, I have instances that look like
>
> <Book>Gone With the Wind</Book>
>
> In version 2, I have instances that look like
>
> <Book>
> <Title>Gone With the Wind</Title>
> <Author>
> <Person>
> <Firstname>Margaret</Firstname>
> <Lastname>Mitchell</Lastname>
> </Person>
> </Author>
> </Book>
>
> How do I write my architectures so that the V2 instance is mapped to
> the V1 architecture?
Andrew --
You've asked a good question. I think it has a good answer. In order
to explain this, I have to define the V1 and V2 architectures, and
turn your example fragments into complete documents. Then I'll
discuss what problems arise, and what to do about them.
*************************
** The V1 Architecture **
*************************
<!-- the V1 architecture -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>
*************************
** The V2 Architecture **
*************************
<!-- the V2 architecture -->
<?IS10744:arch
name="V1"
dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>
<!-- note: auto name mapping is on, so elements of the above type
will be regarded as conforming to the V1 <Book> architectural
form -->
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>
*****************
** Instance I1 **
*****************
<!-- instance #I1 -->
<Mydoc>
<Book>Gone With the Wind</Book>
</Mydoc>
*****************
** Instance I2 **
*****************
<!-- instance #I2 -->
<Mydoc>
<Book>
<Title>Gone With the Wind</Title>
<Author>
<Person>
<Firstname>Margaret</Firstname>
<Lastname>Mitchell</Lastname>
</Person>
</Author>
</Book>
</Mydoc>
***************************
** Parsing I1 against V1 **
***************************
If we parse I1 against V1, we get a grove that, if it were
re-expressed in XML, would look like this:
<V1>
<Book>Gone With the Wind</Book>
</V1>
I.e., no problem. (And no surprise.) Note that the
document element has automatically become the document
element of the architecture.
***************************
** Parsing I2 against V2 **
***************************
If we parse I2 against V2, we get:
<V2>
<Book>
<Title>Gone With the Wind</Title>
<Author>
<Person>
<Firstname>Margaret</Firstname>
<Lastname>Mitchell</Lastname>
</Person>
</Author>
</Book>
</V2>
I.e., again, no problem. (And, again, no surprise.)
***************************
** Parsing I2 against V1 **
***************************
If we parse I2 against V1, taking no other measures, we get:
<V1>
<Book>Gone With the WindMargaretMitchell</Book>
</V1>
Clearly, this is a mess, but it illustrates the principle that, by
default, markup that does not belong in a given architecture simply
disappears, from the perspective of that architecture. What to do
about the mess, though?
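To make the "markup disappears, data stays" principle concrete, here is a toy Python model. It is not an AF engine and uses no AF library. It assumes that the architecture reduces to a set of element type names matched by automatic name mapping, that V1's only non-document element (<Book>) is a leaf that accepts #PCDATA, and that whitespace-only text can be dropped as a rough stand-in for SGML's record-end handling.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the V1 meta-DTD (assumption): the element types V1
# defines besides its document element, matched by automatic name mapping.
V1_ELEMENTS = {"Book"}

I2 = """<Mydoc>
<Book>
<Title>Gone With the Wind</Title>
<Author>
<Person>
<Firstname>Margaret</Firstname>
<Lastname>Mitchell</Lastname>
</Person>
</Author>
</Book>
</Mydoc>"""


def data_of(elem):
    """Concatenate all character data under elem, skipping
    whitespace-only text nodes (assumption, see lead-in)."""
    return "".join(t.strip() for t in elem.itertext() if t.strip())


def project(root, arch_root, arch_elements):
    """Project an instance onto a flat architecture: the document element
    becomes arch_root, elements of architectural types are kept, and all
    other markup vanishes while its data stays with the nearest
    architectural ancestor (the cArcIgnD default, since <Book> accepts
    #PCDATA)."""
    out = ET.Element(arch_root)
    for elem in root.iter():
        if elem is not root and elem.tag in arch_elements:
            ET.SubElement(out, elem.tag).text = data_of(elem)
    return out


v1_view = project(ET.fromstring(I2), "V1", V1_ELEMENTS)
print(ET.tostring(v1_view, encoding="unicode"))
# <V1><Book>Gone With the WindMargaretMitchell</Book></V1>
```

Run on instance #I1 instead, the same function yields <V1><Book>Gone With the Wind</Book></V1>.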
It's reasonable to assume that the person who writes the V2
architecture intends for V2 documents to be usable with V1 browsers
(or other applications equipped with V1 engines). In other words, we
want the title of the book to become the content of the <Book>
element, as was the case in the V1 architecture, and we want Margaret
Mitchell's name to disappear, since the V1 architecture made no
provision for an author's name. This can be done as follows:
<!-- the V2 architecture, as amended -->
<?IS10744:arch
name="V1"
dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
ignore-data-att="V1IgnoreData"
>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ATTLIST Author
V1IgnoreData CDATA "ArcIgnD"
>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>
Note that we have declared that the name of the "Architecture Ignore
Data Attribute" for the V1 architecture is "V1IgnoreData". When this
attribute appears on an element instance, its value controls whether
the ultimate data content of the element will be regarded as part of
the document, from the perspective of this architecture. We have also
declared, above, that the V1IgnoreData attribute has a default value
of "ArcIgnD" on instances of the <Author> element. This means that,
from the perspective of the V1 architecture, the data content of
the <Author> element, and the data contents of all of the elements
that it contains, will be ignored (will disappear).
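The effect of the defaulted V1IgnoreData attribute can be modelled by pruning every subtree whose value is ArcIgnD before collecting data. The Python sketch below is a hypothetical illustration: the dictionary V1_IGNORE_DEFAULTS stands in for the <!ATTLIST Author ...> declaration, an explicit attribute on the start-tag overrides it, and whitespace-only text is dropped.

```python
import xml.etree.ElementTree as ET

# Stand-in (assumption) for the amended V2 meta-DTD's attribute default:
# <!ATTLIST Author V1IgnoreData CDATA "ArcIgnD">
V1_IGNORE_DEFAULTS = {"Author": "ArcIgnD"}


def ignore_value(elem):
    """Explicit V1IgnoreData attribute if present, else the declared default."""
    return elem.get("V1IgnoreData", V1_IGNORE_DEFAULTS.get(elem.tag))


def collect_data(elem, parts):
    """Append the non-whitespace data under elem to parts, skipping the
    whole subtree of any element whose V1IgnoreData value is ArcIgnD."""
    if ignore_value(elem) == "ArcIgnD":
        return
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        collect_data(child, parts)
        # A child's tail belongs to the parent's content, so it is kept
        # even when the child itself was ignored.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())


def v1_book_data(instance):
    book = ET.fromstring(instance).find("Book")
    parts = []
    collect_data(book, parts)
    return "".join(parts)


I2 = """<Mydoc><Book><Title>Gone With the Wind</Title><Author><Person>
<Firstname>Margaret</Firstname><Lastname>Mitchell</Lastname>
</Person></Author></Book></Mydoc>"""

print(v1_book_data(I2))  # Gone With the Wind
```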
Digression: The possible values of any "architecture ignore data
attribute" are:
ArcIgnD : Data is always ignored.
nArcIgnD : Data is not ignored, and it is an error if
data occurs where the architecture does not
allow it.
cArcIgnD : Data is conditionally ignored (data will be
ignored only when it occurs where the
architecture does not allow it.)
The default value is taken to be cArcIgnD.
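The three values amount to a decision over two inputs: the ignore-data value in effect, and whether the architecture's content model allows data at that point. A minimal Python sketch of that table follows; the names Outcome and data_outcome are illustrative only.

```python
from enum import Enum


class Outcome(Enum):
    KEEP = "kept"
    IGNORE = "ignored"
    ERROR = "kept, but a validation error"


def data_outcome(ignore_value, allowed):
    """Classify a run of data under the three ignore-data values.

    ignore_value: "ArcIgnD", "nArcIgnD", "cArcIgnD", or None (unspecified).
    allowed: whether the architecture's content model admits data here.
    """
    value = ignore_value or "cArcIgnD"  # unspecified means cArcIgnD
    if value == "ArcIgnD":
        return Outcome.IGNORE
    if value == "nArcIgnD":
        return Outcome.KEEP if allowed else Outcome.ERROR
    if value == "cArcIgnD":
        return Outcome.KEEP if allowed else Outcome.IGNORE
    raise ValueError(f"not an ignore-data value: {value!r}")


# Data inside an element-only element, with no value specified:
print(data_outcome(None, allowed=False))  # Outcome.IGNORE
```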
**************************************
** Parsing I2 against V1 as amended **
**************************************
If we parse I2 against V1, as amended, we get:
<V1>
<Book>Gone With the Wind</Book>
</V1>
Q.E.D., right?
***************************
** Parsing I1 against V2 **
***************************
If we parse I1 against V2, taking no other measures, we get:
<V2>
<Book></Book>
</V2>
What happened to the title of the book? It disappeared because the
default value of the architecture ignore data attribute is "cArcIgnD",
which means that data will be ignored wherever the architecture does
not allow it. The V2 architecture does not permit #PCDATA in the
content of <Book> elements, so the data "Gone With the Wind"
disappeared automatically. If we don't want the data to be ignored, we
can force it to appear by declaring an ignore-data attribute for the
V2 architecture (say, "V2IgnoreData") and setting it to "nArcIgnD".
However, making data appear where it's not allowed will cause a
validation error, so if we really needed to use the same meta-DTD for
both V1 and V2 documents (we don't), this solution would not be very
good.
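The same outcome for I1 can be reproduced with a small Python model. This is a sketch under stated assumptions, not how a real AF engine works: the V2 content models are reduced to a set of element types that accept #PCDATA, only data directly inside <Book> is considered, and V2IgnoreData is a hypothetical attribute name.

```python
import xml.etree.ElementTree as ET

# Toy reduction (assumption) of the unamended V2 meta-DTD: element types
# whose content model admits #PCDATA. V2's <Book> is (Title?, Author?).
V2_PCDATA_ALLOWED = {"Title", "Firstname", "Lastname"}


class ArchValidationError(Exception):
    """Data was forced (nArcIgnD) into content that does not allow it."""


def v2_view(instance, v2_ignore_data=None):
    """Build the V2 view of an instance's <Book>, looking only at data
    directly inside <Book>. v2_ignore_data is the value of a hypothetical
    V2IgnoreData attribute; None means the cArcIgnD default."""
    src = ET.fromstring(instance).find("Book")
    root = ET.Element("V2")
    book = ET.SubElement(root, "Book")
    data = (src.text or "").strip()
    mode = v2_ignore_data or "cArcIgnD"
    if data and mode != "ArcIgnD":
        if "Book" in V2_PCDATA_ALLOWED:
            book.text = data
        elif mode == "nArcIgnD":
            raise ArchValidationError("#PCDATA is not allowed in V2 <Book>")
        # Otherwise (cArcIgnD, data not allowed): the data is dropped.
    return root


I1 = "<Mydoc><Book>Gone With the Wind</Book></Mydoc>"
print(ET.tostring(v2_view(I1), encoding="unicode", short_empty_elements=False))
# <V2><Book></Book></V2>
```

Calling v2_view(I1, "nArcIgnD") raises ArchValidationError instead, which is the validation error the text warns about.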
If we must use the same meta-DTD for both older V1 documents and newer
V2 documents, then, to keep the older V1 documents upward-compatible,
it would be best to anticipate this problem when creating the V2
architecture, as follows:
(1) Allow #PCDATA in the content of V2 <Book> elements, in addition to
the <Title> and <Author> elements, and
(2) Provide instructions to V2 application developers (in the V2
Architecture Definition Document [ADD]) indicating that V2
application engines must expect #PCDATA in <Book> instances, and
that they must treat such data content as if it were in a V2
<Title> element. The ADD might also advise that V2 systems should
not create documents that put #PCDATA in the content of <Book>
elements, even though it's allowed there, and that book titles
should always appear in <Title> elements.
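A V2 engine following instruction (2) might normalize old-style <Book> elements before doing anything else. The Python function below is a hypothetical sketch of that rule, not something defined by the ADD or by any library; it only examines data before <Book>'s first child element, and it leaves a <Book> that already has a <Title> unchanged.

```python
import xml.etree.ElementTree as ET


def normalize_book(book):
    """Hypothetical V2-engine step: #PCDATA found directly in <Book> is
    treated as if it were in a <Title> element. Only data before the first
    child is examined, and an existing <Title> is left alone (both are
    simplifying assumptions)."""
    data = (book.text or "").strip()
    if data and book.find("Title") is None:
        title = ET.Element("Title")
        title.text = data
        book.text = None
        book.insert(0, title)
    return book


old_style = ET.fromstring("<Book>Gone With the Wind</Book>")
print(ET.tostring(normalize_book(old_style), encoding="unicode"))
# <Book><Title>Gone With the Wind</Title></Book>
```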
*******************************************************************************
But how can we do all this without a meta-DTD of any kind?
Well, first, a caveat: you can't check an instance for conformance to
a model unless you have both the instance and the model. So
validation of instances by means of a general-purpose parser is not
possible unless you have a meta-DTD.
And a second caveat: you can't create an application with an
information-interchange feature unless you have a model for the
information to be interchanged. So, at some level, there's no such
thing as an architecture without some sort of model, somewhere.
Even if there's no meta-DTD available, however, you can still enjoy
essentially all of the virtues of AFs, assuming you have an engine
capable of recognizing the architectural forms that pertain to it, and
capable of performing the processing required by those architectural
forms. (Such an engine would probably incorporate at least some of
the logic necessary to validate the forms that it recognizes, in any
case.) The only really noticeable disadvantage of not having the
meta-DTD handy is that you don't get the markup minimization you can
get from DTDs and meta-DTDs. This disadvantage would not affect our
instance #I1 at all:
<!-- instance #I1; no change -->
<?IS10744:arch name="V1">
<Mydoc>
<Book>Gone With the Wind</Book>
</Mydoc>
But it would affect instance #I2 to the extent that we'd have to make
the use of the "Architecture Ignore Data Attribute" explicit in order
for #I2 to be usefully parsable against Architecture V1:
<!-- instance #I2 without meta-DTDs -->
<?IS10744:arch
name="V1"
public-id="-//Andrew Layman//ADD Andrew Layman's V1 Architecture Definition Document//EN"
ignore-data-att="V1IgnoreData"
>
<?IS10744:arch
name="V2"
public-id="-//Andrew Layman//ADD Andrew Layman's V2 Architecture Definition Document//EN"
>
<Mydoc>
<Book>
<Title>Gone With the Wind</Title>
<Author V1IgnoreData="ArcIgnD">
<Person>
<Firstname>Margaret</Firstname>
<Lastname>Mitchell</Lastname>
</Person>
</Author>
</Book>
</Mydoc>
Note: Just for fun, I used the "public-id" pseudo-attribute to give
the formal public identifiers of the Architecture Definition Documents
(ADDs) of the V1 and V2 architectures. These documents are not
meta-DTDs (although they may include meta-DTDs) and they are not
directly machine-processable; they are just explanations of the
architectures, probably written in some natural language (these are
declared to be in English: "//EN"). The purpose of declaring them is
merely to disambiguate the architectures we're declaring from any
others that might be called "V1" or "V2".
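Without a meta-DTD, an engine still has to find these declarations. The IS10744:arch PIs above use SGML syntax (they end in ">" rather than XML's "?>"), so an XML parser will not accept them as written. The Python sketch below pulls out the pseudo-attributes with regular expressions instead; it assumes every value is double-quoted and is not a complete SGML PI parser.

```python
import re

# SGML-style IS10744:arch PI ending in ">"; a ">" inside a quoted value
# does not end the match.
ARCH_PI = re.compile(r'<\?IS10744:arch\s+((?:[^">]|"[^"]*")*)>')
# name="value" pairs; only double-quoted values are handled (assumption).
PSEUDO_ATT = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*"([^"]*)"')


def arch_declarations(text):
    """Return one dict of pseudo-attributes per IS10744:arch PI, in order."""
    return [dict(PSEUDO_ATT.findall(m.group(1))) for m in ARCH_PI.finditer(text)]


I2_HEAD = """<?IS10744:arch
name="V1"
public-id="-//Andrew Layman//ADD Andrew Layman's V1 Architecture Definition Document//EN"
ignore-data-att="V1IgnoreData"
>
<?IS10744:arch
name="V2"
public-id="-//Andrew Layman//ADD Andrew Layman's V2 Architecture Definition Document//EN"
>
<Mydoc>"""

for decl in arch_declarations(I2_HEAD):
    print(decl["name"], decl["public-id"])
```

Keying the result on both "name" and "public-id" is what lets an engine tell this "V1" apart from any other architecture that happens to share the name.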
Final note #1: With AFs, even when we mix many kinds of semantics and
vocabularies into our documents, we can still have the ability to
verify, simply and directly, that any newly created document that uses
an architecture will be reliably processable by any application of
that architecture. By the same token, anyone creating an application
of that architecture will not face an indefinitely-long list of
possible configurations of the information.
Final, final note: AFs are an elegant general solution to the problem
of recognizing, processing, and mixing all of the semantic facilities
of XML into arbitrary XML documents, including both RDF and XLink, to
name two, with minimal or no cost to the flexibility of other document
architectures. They also have the effect of giving people other than
the W3C the ability to create similar but totally arbitrary
metastructures of arbitrary complexity, and to use them for reliable
and robust information interchange. I remain utterly and passionately
convinced that it's MUCH better to have one, strong, general way of
mixing common semantic constructs into structured documents, than to
have several dissimilar ways of doing so.
-Steve
--
Steven R. Newcomb, President, TechnoTeacher, Inc.
Email: srn@techno.com
WWW: http://www.techno.com
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
3615 Tanner Lane
Richardson, Texas 75082-2618 USA