[This local archive copy is from the official and canonical URL, http://www.ornl.gov/sgml/wg8/document/1998.htm; please refer to the canonical source document if possible.]

ISO/IEC JTC 1/SC34 N1998

ISO/IEC JTC 1/SC34

Information Technology ---

Document Description and Processing Languages

TITLE:

Schloss/Newcomb Correspondence on Metadata

SOURCE:

Steve Newcomb, with comments from Robert Schloss

PROJECT:

Metadata Workshop, Paris

PROJECT EDITOR:

 

STATUS:

Summary of e-mail conversations

ACTION:

For information

DATE:

29 June 1998

DISTRIBUTION:

SC34 and Liaisons

REFER TO:

 

REPLY TO:

Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Lockheed Martin Energy Systems
Information Management Services
1060 Commerce Park, M.S. 6480
Oak Ridge, TN 37831-6480 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-0004
Network: masonjd@ornl.gov
http://www.ornl.gov/sgml/wg4/
ftp://ftp.ornl.gov/pub/sgml/wg4/

Date: Fri, 26 Jun 1998 14:44:33 -0500
From: "Steven R. Newcomb" <srn@techno.com>
Subject: Schloss/Newcomb correspondence
To: metadata@gca.org
Message-id: <199806261944.OAA03963@bruno.techno.com>
 
 
Dear Paris Metadata Summit Participants,
 
Once this list was set up (sorry about the long delay), I wrote to Bob
and asked whether I should send the list the fruit of our efforts to
understand each other.  Bob responded:
 
> I am now convinced that there are situations where AFs should be
> used and others where namespace prefixes are better.  I was hoping
> to write all this out to share with you and others, but that has not
> happened.  And in 30 minutes I disappear for one month of vacation
> in Israel with my wife and son.  I know that when I come back, I
> won't have a lot of time to pursue this until mid-August at the
> earliest.
 
> If you wish to post our correspondence, that is okay, but you should
> probably add a note that says "Bob has done some additional thinking
> but was unable to continue the discussion before he left to go out
> of town until the end of July".
 
So, please consider the above notes added.  Maybe Bob will find the
time to share his further thoughts with us in the relatively near future.
 
-Steve
 
--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com
 
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax:   +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
 
3615 Tanner Lane
Richardson, Texas 75082-2618 USA
 
 
 
********************************************************************************
This first installment is what I was trying to articulate at the end
of the meeting in Paris.  It has been edited according to Bob's
instructions.  (I'm sparing you all the correspondence that went into
preparing this.)
 
There is more correspondence between Bob and me to share with you, but
I don't want to send it unless there is some expression of interest in
it.  The rest of it is pretty much devoted to an explanation of how
architectural forms work, in the form of answers to Bob's pointed
questions about them.  --SRN
********************************************************************************
 
Some Observations Apropos the Metadata Summit in Paris, May 22, 1998
 
Steven R. Newcomb
 
As currently drafted, RDF uses a single standard algorithm to convert
metadata represented in an XML document (using a vocabulary from a
number of declared namespaces) into a queryable resource (tuples/property
graphs).  The fact that there is a single algorithm for generating, in
effect, an API to the metadata objects imposes many constraints on
both the interchange architecture of the metadata (the DTDs or other
schema representations of the structure of metadata information in the
form in which it is normally interchanged, as XML instances) and the
API to the information being interchanged by that architecture.
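The following minimal Python sketch makes the idea of "a single
standard algorithm" concrete.  It is a generic, vocabulary-blind
conversion of nested metadata elements into (subject, property, value)
3-tuples.  It illustrates the idea under stated assumptions and does
not follow the RDF draft's actual syntax rules:

- element names stand in for property names;
- namespace prefixes are left out;
- the blank-node labels ("_:n0", "_:n1", ...) are invented for the
  example.

```python
import xml.etree.ElementTree as ET
from itertools import count


def to_triples(xml_text):
    """Convert nested elements into (subject, property, value) 3-tuples.

    One rule serves every vocabulary: each child element becomes a
    property of its parent; a leaf child's text becomes a literal value;
    a child with children of its own becomes a new (blank) node.
    """
    root = ET.fromstring(xml_text)
    labels = count()          # source of hypothetical blank-node labels
    triples = []

    def convert(elem):
        subject = f"_:n{next(labels)}"
        for child in elem:
            if len(child):    # compound value: recurse and link to it
                triples.append((subject, child.tag, convert(child)))
            else:             # simple value: a literal string
                triples.append((subject, child.tag, (child.text or "").strip()))
        return subject

    convert(root)
    return triples


sample = """<Description>
  <Creator>
    <Name>Bob Schloss</Name>
    <Email>schloss@watson.ibm.com</Email>
  </Creator>
</Description>"""
for t in to_triples(sample):
    print(t)
```

Nothing in to_triples knows what "Creator" means.  That ignorance is
the convenience of a single algorithm.  It is also the source of the
constraints described above: any structure the one rule cannot express
has to be reshaped until it fits.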
 
The conversion of XML instances into what are effectively APIs to
their information content is something that every XML application must
do, in one way or another.  However, because there is no limit to the
variability of the kinds of information conveyed in XML documents,
there can never be a single algorithm that will convert instances
conforming to every interchange architecture (DTD) into the most
useful and minimal API to the meaning of instances conforming to that
architecture.  As an example, consider the XLink interchange
architecture.  Any really useful API to the meaning of an XLink would
be able to provide, among other things, reports as to the anchor
status of information nodes that might not even be in the same
document as the XLinks.  Therefore, a special API, written to provide
useful access to the meaning of XLinks, is needed.  It is hard to
imagine how any algorithm could generate such an API, given only the
schema of the XLink interchange architecture or an XLink element type
definition.  (Masatomo Goto of Fujitsu Labs developed a Property Set
for XLink in order to build the XLink engine he demonstrated at SGML
Europe '98.  A diagram of that Property Set exists; see
"References", below.)
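A minimal sketch of such a special API follows.  The class indexes
extended links by their anchors, so it can answer whether a node is an
anchor even when the node lives in a different document from the
links.  The LinkIndex name and the (document, node id) locator pairs
are hypothetical simplifications.  They are neither XLink syntax nor
Goto's Property Set.  The answer depends on every link instance the
engine has seen, which is why it cannot be generated from the XLink
element type definition alone.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical link engine answering anchor-status queries."""

    def __init__(self):
        # (document, node_id) -> names of the links anchored there
        self._anchors = defaultdict(list)

    def add_link(self, link_name, locators):
        """Register an extended link given its (document, node_id) anchors."""
        for locator in locators:
            self._anchors[locator].append(link_name)

    def is_anchor(self, document, node_id):
        # Membership tests on a defaultdict do not create entries.
        return (document, node_id) in self._anchors

    def links_at(self, document, node_id):
        return list(self._anchors.get((document, node_id), []))


index = LinkIndex()
index.add_link("see-also", [("report.xml", "sec1"), ("figures.cgm", "fig3")])
print(index.is_anchor("figures.cgm", "fig3"))
print(index.links_at("report.xml", "sec1"))
```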
 
At this time, RDF's designers are working to the requirements of a few
popular metadata architectures.  It is expected that these metadata
architectures can be constrained in such a way that a single algorithm
can generate a useful API directly from the architectures.
 
Although there is nothing fundamentally wrong with the current RDF
approach, given its limited requirements, that approach is profoundly
suboptimal when considered in the larger context.  To
understand the larger context, we must recognize that practically
everything that will be done with XML, including all of its draft and
proposed semantic enhancements (including XLink and XPointers), is
best realized as a pair of distinct formal expressions:
 
(1) the document type definition (DTD) or other schema that is the
    formal expression of the *interchange syntax* of the architecture,
    and
 
(2) the Property Set or other schema that is the formal expression of
    the *abstract API to the information conveyed by instances that
    conform to the architecture*.  (If you can imagine having the
    ability to add a module to the DOM for each interchange
    architecture, so that there are now additional objects that
    reflect the semantic phenomena expressible in that interchange
    architecture, you can understand what a Property Set is.
    Unfortunately, the current draft of the DOM is not set up to
    support this, but it could be, while still meeting or exceeding
    all of its current objectives and requirements.  First of all,
    there needs to be a Property Set for XML itself, and, not
    surprisingly, such a Property Set is now being developed.  But
    that's another story.)
 
In general, interchange architectures for conveying rich semantics
really need both a DTD and a Property Set, because in the general case
it is not possible to generate useful Property Sets from DTDs using
any single algorithm.  In the case of RDF, the limitations of the
algorithm used (to implicitly generate what amounts to a Property Set)
impose constraints on the complexity of the metadata information that
can be interchanged, and they also impose inconvenience on developers
of software intended to make use of the information thus conveyed.
 
Here are the implications of adopting the twin notions of Architectural
Forms and Property Sets as the new basis for RDF:
 
* The structure and complexity of interchange architectures used for
  metadata will no longer necessarily be constrained.  (I, Steve,
  think this flexibility is goodness -- that there should not be any
  constraints except for those which were consciously and voluntarily
  designed into each architecture to meet its own interchange,
  software reuse, learnability, reliability, and other requirements.
  Some of the RDF folks think that the blanket constraints on metadata
  structure now provided by RDF will maximize software reuse (at
  search engines, for example) and learning among Web users,
  programmers, and content developers.  This idea certainly has merit.
  However, I would prefer simply to allow information architects the
  flexibility to maximize the naturalness of the expression of
  metadata.  For me, naturalness is simplicity.  Blanket structural
  constraints usually have the effect of requiring architects to
  employ greater complexity in order to express the same information,
  and this added complexity often decreases learnability and increases
  the difficulty of implementation.)
 
* The complexity of the semantics of supportable metadata will no
  longer necessarily be constrained.  (I, Steve, think this
  flexibility is goodness -- that the full unboundedness of the set of
  possible metadata semantics should be supportable at some level.
  Some of the RDF folks think constraining metadata semantics will
  maximize software reuse (at search engines, for example) and
  learning among Web users, programmers, and content developers.
  These desirable goals seem to me better served by limiting the scope
  of RDF to a certain list of metadata semantics.  Among other things,
  such a list could be an invaluable resource for implementers by
  clarifying which RDF architectures (vocabularies) share which
  equivalent semantics.  A good way to express such a list of
  semantics is to create a property set for RDF.)
 
* Reusable software engines for the semantic processing of instances
  conforming to particular interchange architectures become practical
  and extremely cost-effective.  Each such engine is responsible for
  processing the interchangeable XML form of the information in such a
  way as to generate a "grove" (an object graph whose schema is the
  relevant Property Set) from any XML instance that conforms to the
  architecture.  The fact that the engine is reusable means that it
  can mature and offer reliable semantic services in a variety of
  application contexts.  The cost of developing applications is
  reduced, as is their time-to-market, and their reliability improves.
 
* The design of any given metadata architecture will require more work
  and more careful thought.  (I, Steve, think this is goodness.  The
  W3C people believe that the RDF data model will require slightly
  less work and less thought when a new metadata schema is defined,
  and this reduction in effort is beneficial.  I, by contrast, believe
  that each interchange architecture should maximize the
  appropriateness of its design to the nature of the information it
  models, and that each Property Set should maximize the convenience
  of applications developers.  This way, the semantic processing that
  is common to all applications of a given architecture is supportable
  by a reusable engine.  I believe the distinct formal expressions
  (both schemas/DTDs and Property Sets) that result from the added
  design effort will pay handsome rewards in terms of increased
  reliability of applications and decreased cost of information
  interchange.)
 
* RDF's supporting formalisms and application integration mechanisms
  need not differ from those used to support any other information
  interchange architecture, for any purpose.  Less is more.
 
* The overhead of supporting any given application's use of any
  metadata architecture need never be more than it would have been
  under the current RDF proposal.  It may often turn out to be less,
  because the popularity of certain interchange architectures may
  encourage the development of highly specialized and/or optimized
  engines for supporting them.
 
* Software vendors will be able to demonstrate conformance to the
  semantics of metadata architectures, and purchasers will be able to
  verify that conformance.
 
* "Namespaces" are entirely replaced by the use of architectural
  forms.  (An "architectural form" is an element type definition in a
  DTD, where the DTD plays the role of what RDF now calls a "namespace"
  or maybe a "vocabulary resource".  In ISO jargon, an element that
  conforms to an architectural form is said to be a "client element" of
  the referenced DTD resource or "namespace".)
 
* At least some known problems with the current design of RDF will be
  resolved.  One of these is that when an application expects that the
  value of a particular property is a simple string, but the metadata
  instance received actually has a compound expression using tags from
  another vocabulary, RDF is as yet unclear how the compound
  expression will be manipulated in order to supply a simple string.
  Please see the attached discussion entitled:  "A Known Problem with
  RDF, Resolved by Architectural Forms"
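The reusable-engine point above can be sketched in a few lines.  The
example assumes a hypothetical toy "contact" architecture.  Its
Property Set declares a single node class, "person", with "name" and
"emails" properties.  The engine is written once against that Property
Set and can then serve any application.  Unlike a generic conversion,
it can give applications exactly the API they need; here, the e-mail
addresses arrive as a list.  For brevity, the toy engine recognizes
elements by generic identifier rather than by architectural form
attribute.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical Property Set: node class -> properties its nodes may have.
PROPERTY_SET = {"person": {"name", "emails"}}


@dataclass
class GroveNode:
    """A grove node whose class and properties must be declared."""
    node_class: str
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        undeclared = set(self.props) - PROPERTY_SET[self.node_class]
        if undeclared:
            raise ValueError(f"undeclared properties: {sorted(undeclared)}")


def build_grove(xml_text):
    """Reusable engine: build grove nodes from any conforming instance."""
    root = ET.fromstring(xml_text)
    return [
        GroveNode("person", {
            "name": (p.findtext("name") or "").strip(),
            "emails": [(e.text or "").strip() for e in p.findall("email")],
        })
        for p in root.iter("person")
    ]


doc = """<contacts><person><name>Bob Schloss</name>
<email>schloss@watson.ibm.com</email><email>rschloss@us.ibm.com</email>
</person></contacts>"""
print(build_grove(doc)[0].props["emails"])
```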
 
 
In the larger context, assuming that architectural forms and property
sets are widely used with XML, there will be the following additional
consequences:
 
* Metadata queries can occur inside any other kind of query, and any
  other kind of query can occur inside a metadata query.  There is
  already a query language, SDQL (Standard Document Query Language),
  that will work for all architectures, and not just metadata
  architectures.  Since it conceptually queries groves, and since
  groves can be generated from any notation for which there is a
  property set, the same query language can be used to provide
  addressing and linking services to non-XML notations.  In other
  words, by making everything appear to conform to the grove/property
  set object model, everything becomes addressable in its own most
  convenient terms, including the things that were only implicit in
  their interchangeable forms.  [Note: The primary significance of
  Eliot Kimber's "PHyLIS" demo during the meeting was that this idea
  of groves and property sets actually works.  In that demo, we saw a
  totally grove-based integration of XML documents and CGM documents,
  with XLink-style extended links providing traversal services between
  the objects in the groves of both kinds of documents.]
 
* XML documents will be able to contain elements that can be processed
  in accordance with several interchange architectures simultaneously.
  Such elements can be said to exhibit the semantic equivalent of
  multiple inheritance.  Information interchange architectures that
  overlap semantically can nonetheless be harmonized in an instance
  that uses all of them, without repeating any information, even if
  they use conflicting element type names and attribute names.  The
  implications of this harmonizability are enormously beneficial for
  E-commerce, among other things.
 
 
********************************************************************************
Some notes:
********************************************************************************
 
What ISO's SGML Extended Facilities calls...
 
  a "base architecture", or
 
  an "(information-interchange-)enabling architecture", or
 
  (when referring to the formal machine-processable model of the
  interchange syntax of a base architecture) a "meta-DTD",
 
  ...is meant to fulfill the same roles and requirements (and more) as
what the RDF draft calls...
 
  a "namespace", or
 
  a "vocabulary", or
 
  a "Scheme".
 
Similarly, what ISO calls...
 
  a "property set"
 
...the RDF draft calls...
 
  a "Scheme".
 
What ISO calls...
 
  a "grove" (acronym: Graph Representation Of property ValuEs),
                      -     -              -           -   -
...the RDF draft calls...
 
  a set of "3-tuples", or
 
  a graph.
 
Within the conceptual frameworks of ISO "groves" and RDF "graphs," the
terms "node", "property", and "arc" appear to have the same meanings
in both the SGML Extended Facilities and in the RDF draft, at least
for purposes of this discussion.
 
RDF has no *general* element subtyping or "semantic load inheritance"
facility, but RDF *does* provide a facility called "namespaces" which
allows a (metadata) element to declare that it should be considered to
convey the same kind of information as an element of a certain type in
one of several popular DTDs (or other schema-like things) for metadata
documents.  The sets of names that can be referenced in this way (the
names of the inherited element types in those schema-like things) are
called "vocabularies".  (A vocabulary need not be a DTD, because there
is no actual architectural subtyping or checking.  In the current
draft of RDF, a tag set is really all that is required.)
 
 
*******************************************************************
"A Known Problem with RDF, Resolved by Architectural Forms"
*******************************************************************
 
At least some known problems with the current design of RDF are
readily resolved by the architectural forms paradigm.  One of these is
that when an application expects that the value of a particular
property is a simple string, but the metadata instance received
actually has a compound expression using tags from another vocabulary,
RDF is as yet unclear how the compound expression will be manipulated
in order to supply a simple string.
 
For example, in the fragment below, if the content of
<RDF:Description> is supposed to be a simple string, what does that
string turn out to be?
 
  <DC:Creator>
    <RDF:Description>
      <IBMPerson:Name>Bob Schloss</IBMPerson:Name>
      <IBMPerson:Email>schloss@watson.ibm.com</IBMPerson:Email>
    </RDF:Description>
  </DC:Creator>
 
Another, perhaps more general, way to put the problem is this: "What
do we do about the content of an element whose semantic is borrowed
from one namespace, when its content's semantics are borrowed from one
or more other namespaces?"
 
In order to understand the several solutions that the "architectural
form paradigm" brings to the above puzzle, it is first necessary to
understand that a single instance of an element can conform to several
architectural forms.  Since an element instance can have only one
generic identifier, it is impractical to use the generic identifier to
specify all the architectures (such as DC, RDF and IBMPerson) to which
that single element conforms.
 
The biggest syntactic difference between architectural forms and
namespaces is in their use of the generic identifier (the "generic
identifier" is the name of the element type that always appears as the
first string found in any element instance's start tag).  Namespaces
use the generic identifier to specify both the architecture and a
particular semantic-laden name within the architecture.  Because there
can be only one generic identifier in any element instance, the syntax
of namespaces effectively prohibits a single element instance from
declaring its author's intention that it be processable in terms of
more than one namespace.  By contrast, the syntax of architectural
forms does not constrain the generic identifier in any way; indeed,
the generic identifier is pretty much ignored, for purposes of
architectural processing.  As far as architectural processing is
concerned, the main purpose of the generic identifier is to provide a
hook for markup minimization.  The generic identifier is relegated to
a role in which it serves as a kind of macro call: it brings in the
default values of all the attributes declared in the DTD, if any, for
that element type, as we'll see shortly.
 
The syntax of architectural forms is actually simpler than the syntax
of namespaces; there is no new syntactic separator (":") required, 
generic identifiers are not split up into fields, and there are no new
constraints on generic identifiers at all.  Each architecture
is referenced by means of an attribute name, and the value of that
attribute is the name of the element type within the architecture to
which the element is claiming both syntactic conformance and semantic
equivalence.  In other words, what in namespace syntax would be
expressed as:

   <DC:Creator>...</DC:Creator>
 
might become:
 
   <foo DC="Creator">...</foo>
  
Therefore, it becomes possible for a single element to claim
conformance with more than one architecture:
 
   <foo DC="Creator" LCCC="Author">...</foo>
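The following minimal Python sketch shows how a processor might read
such markup.  It assumes the processor has already been told which
attribute names designate architectures.  How that is declared is
outside the sketch, so the names are simply passed in.  The generic
identifier never enters into it: <foo> and <bar> below make the same
claim about DC.

```python
import xml.etree.ElementTree as ET


def claimed_forms(elem, architectures):
    """Map each architecture to the form this element claims in it.

    Only attributes named after architectures are consulted; the
    generic identifier (elem.tag) is deliberately ignored.
    """
    return {arch: elem.get(arch)
            for arch in architectures
            if elem.get(arch) is not None}


ARCHITECTURES = ["DC", "LCCC", "RDF"]   # assumed to be known in advance
foo = ET.fromstring('<foo DC="Creator" LCCC="Author">...</foo>')
bar = ET.fromstring('<bar DC="Creator">...</bar>')
print(claimed_forms(foo, ARCHITECTURES))
print(claimed_forms(bar, ARCHITECTURES))
```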
 
The following is a digression (but nonetheless a significant
digression) about markup minimization: the above looks pretty verbose,
and, given the reasonable expectation of decentralized control over
metadata architectures, and the increasing need for documents to be
useful in a variety of contexts, verbosity may get a lot worse.  For
example:
  
   <foo DC="Creator" LCCC="Author" DEA="Officer" NAWCAD="TextAuth"
   USGS="Surveyor" Ford="ietmAuthor" Paramount="Creator">...</foo>
  
We can completely conquer this verbosity by using a DTD to cause all
the architectural form attributes to be present and to have the
necessary values by default, for all instances of the element type
"foo":
 
  <!ELEMENT foo - - ( whatever )>
  <!ATTLIST foo
     DC         NAME   "Creator"
     LCCC       NAME   "Author"
     DEA        NAME   "Officer"
     NAWCAD     NAME   "TextAuth"
     USGS       NAME   "Surveyor"
     Ford       NAME   "ietmAuthor"
     Paramount  NAME   "Creator"
  >
 
Now the same element instance can be expressed as:
 
  <foo>...</foo>
 
and still be processed in terms of all those different architectures
in exactly the same way, because all the architectural form attributes
are still implicitly present, and they will be reported by the parser
as if they were explicit.
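The effect can be sketched in Python, with a plain dictionary standing
in for the ATTLIST declaration above.  The sketch illustrates the
defaulting rule, not any particular parser's DTD handling.  Declared
defaults are filled in first, and any value given explicitly in the
start tag overrides its (non-#FIXED) default.

```python
import xml.etree.ElementTree as ET

# Stand-in for the ATTLIST declaration for element type "foo" above.
ATTLIST_DEFAULTS = {
    "foo": {"DC": "Creator", "LCCC": "Author", "DEA": "Officer",
            "NAWCAD": "TextAuth", "USGS": "Surveyor",
            "Ford": "ietmAuthor", "Paramount": "Creator"},
}


def reported_attributes(elem):
    """Attributes as a DTD-aware parser would report them."""
    attrs = dict(ATTLIST_DEFAULTS.get(elem.tag, {}))  # defaults first
    attrs.update(elem.attrib)                         # explicit values win
    return attrs


terse = ET.fromstring("<foo>...</foo>")
print(reported_attributes(terse)["Paramount"])
```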
 
   (Note: XML documents that do not have DTDs cannot take advantage of
   this technique, but they can still take full advantage of the
   architectural form paradigm.  The only difference is that such
   documents must specify, in each element instance, all the
   architectural form attributes needed to process that element in
   terms of all the desired architectures.  As we have just seen,
   doing without a DTD can make documents that use architectural forms
   extremely verbose.  It's exactly like the question of whether
 
   (a) to store a PostScript document with fonts that describe each
       glyph's curve set, and then reference the glyphs whenever they
       are to be used, or
 
   (b) to store each glyph as an explicit set of curves.  
 
   If the document contains only a dozen characters, it may be more
   sensible not to include the font(s) from which they were selected,
   and simply to be explicit about the curves that make up each glyph.
   If the document contains many characters, a huge efficiency
   advantage is gained by including the font and referencing the
   glyphs in the font by means of the characters.  Similarly, if we
   include a DTD with our document, we can, in effect, reference any
   number of attributes and their default values simply by uttering an
   element's generic identifier (<foo>, in the example above).  If we
   have a lot of elements in our document, using a DTD offers a big
   efficiency advantage.  But it's not strictly necessary to use a
   DTD.
 
   It should also be noted that it's not strictly necessary to include
   a DTD with every document, even if you wish to use one.  It's only
   necessary that the recipient of your document also have a copy of
   the same DTD (or something with equivalent ability to drive the
   parsing process) that you intend the document to be used with.
   Again, it's exactly like the situation with fonts in PostScript:
   you don't have to include the font in a PostScript document if you
   know that the recipient's printer has that font already inside it
   (or can load it).)
 
It is also not always necessary to be explicit, even in a DTD, about
all the architectures to which an element conforms, if one
architectural form is already a subtype of another.  For example, we
can take advantage of the fact that, in the NAWCAD architecture, the
"TextAuth" architectural form (remember that "architectural form" ==
"element type") is declared in the NAWCAD architecture as a subtype of
the "Creator" architectural form in the "DC" architecture:
 
    Assuming that in the NAWCAD architecture's DTD:
 
    <!ELEMENT TextAuth - - (whatever)>
    <!ATTLIST TextAuth
       DC  NAME  #FIXED  "Creator"
    >
 
...then every NAWCAD <TextAuth> is by definition also a DC <Creator>.
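Because conformance follows declared supertypes, a processor can
compute every form an element implies.  Here is a minimal sketch.  It
assumes the #FIXED architectural attributes of each architecture's
meta-DTD have already been collected into a table; the table format is
hypothetical.

```python
# Hypothetical table of #FIXED architectural attributes, gathered from
# each architecture's own meta-DTD: (architecture, form) -> {base: form}.
FIXED_ARCH_ATTS = {
    ("NAWCAD", "TextAuth"): {"DC": "Creator"},
}


def implied_forms(arch, form, fixed=FIXED_ARCH_ATTS):
    """Every (architecture, form) pair implied by conforming to `form`.

    Supertype declarations are followed transitively; the `seen` set
    also guards against cyclic declarations.
    """
    seen = set()
    pending = [(arch, form)]
    while pending:
        pair = pending.pop()
        if pair in seen:
            continue
        seen.add(pair)
        pending.extend(fixed.get(pair, {}).items())
    return seen


print(sorted(implied_forms("NAWCAD", "TextAuth")))
```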
 
********************************************************************
********************************************************************
**  In the architectural form paradigm, the rule is: An instance  **
**  of an element that claims conformance to any architectural    **
**  form may not violate any of the constraints on the            **
**  architectural form to which it presumably conforms.           **
********************************************************************
********************************************************************
 
   (Note: This simple rule helps to dramatize the differences between
   W3C Namespaces, as presently constituted, and the architectural
   forms paradigm.
 
   * The rule applies to an element's *context*, in that no element
     can appear where its architectural context (the architectural
     forms of its surrounding elements) would not allow it to appear.
     By contrast, the W3C Namespace paradigm does not constrain a
     namespace-referencing element's context to make sense in terms of
     the architecture's (or, rather, the Namespace's) constraints.
 
   * The rule applies to an element's *content,* in that no
     architectural elements can appear inside it unless those
     architectural elements are permitted by the architecture.  Again,
     in the Namespace paradigm, no such constraints are placed on
     namespace-referencing element types.
 
     (Note: the above are two aspects of the same idea: that an
     element's content must be consistent with all of the
     architectural forms to which it declares conformance.  If you have
     guessed by now that the document element must always conform to
     the architectural forms of the document elements of all the
     architectures used in the document, you guessed correctly.)
 
   * The rule also applies to the element's *attributes,* in that any
     attributes that are required by the architecture must be present
     in the element instance, and if they are not required and not
     present, they are assumed to have their architecturally-defined
     default values and/or #IMPLIED effects on applications of that
     architecture.  If there are attributes present that do not appear
     in the architecture, they are ignored.  The presence of such
     non-architecturally-defined attributes is regarded as implying
     additional constraints, but not as violating any existing
     constraints.  No architecture has the authority to prevent
     additional, non-architectural attributes from appearing on
     elements.  From each architecture's perspective, the attributes
     that are present but not defined by the given architecture are
     invisible.
 
   * Finally, the rule applies to any *other constraints* on element
     content and attributes, even if they cannot necessarily be
     detected by a generic parser.  These are detectable by any
     validating semantic processor engine for that architecture.  For
     example, the HyTime varlink architecture (from which XLink was
     derived) does not allow the number of anchors to exceed 2 unless
     the "manyanch" option is supported and is specified with no value
     or a value greater than "2".  No generic parser can check the
     conformance of an element to this constraint, but a validating
     XLink or varlink architecture processing engine can.  When we
     consider the boundless variety of architectures, we must admit
     that there is probably a boundless variety of such constraints,
     and the best way to handle them is to relegate all
     architecture-specific constraint checking to a re-usable engine
     for that architecture.)
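The attribute rule can be sketched as follows.  The declaration format
is hypothetical.  Each attribute an architecture declares maps to one
of three things:

- the REQUIRED marker;
- None, standing for #IMPLIED;
- a default value.

Attributes the architecture does not declare are simply not consulted,
which is what makes them invisible to it.

```python
REQUIRED = object()   # hypothetical marker for a #REQUIRED attribute


def architectural_attributes(element_attrs, attdefs):
    """Return the attributes one architecture sees on an element."""
    view = {}
    for name, decl in attdefs.items():
        if name in element_attrs:
            view[name] = element_attrs[name]
        elif decl is REQUIRED:
            raise ValueError(f"required attribute {name!r} is missing")
        elif decl is not None:
            view[name] = decl      # architecturally-defined default
        # absent #IMPLIED attributes are left for applications to interpret
    return view                    # undeclared attributes never appear


attdefs = {"id": REQUIRED, "lang": None, "status": "draft"}
print(architectural_attributes({"id": "a1", "color": "red"}, attdefs))
```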
 
Since the above NAWCAD DTD fragment constrains all NAWCAD <TextAuth>
elements to conform to all the constraints and requirements of DC
<Creator> elements, it is therefore unnecessary to mention the "DC"
architectural form attribute in the <foo> element, because it is
already there!  By definition, a subtype always conforms to the
constraints and requirements of its supertype(s).
 
In a NAWCAD-oriented application, <foo>'s "NAWCAD" attribute means not
only that our <foo> element can be extracted into a valid NAWCAD
document as a valid <TextAuth> element, but also that it can be
extracted into a valid DC document as a valid <Creator> element.  (In
the jargon of the SGML Extended Facilities, we say that the output of
the parser, conceptually speaking, includes a "grove" -- a parse tree
-- for each of the architectures used by the document.  There is no
requirement that any application actually produce groves; groves are a
concept developed to explain, in abstract terms, the effects of
parsing, processing, and component addressing.)
 
Now, having accumulated the necessary background information, let's go
back to the original question that necessitated all the above
explanation: "What do we do about the content of an element whose
semantic is borrowed from one namespace, when its content's semantics
are borrowed from one or more other namespaces?"  In the architectural
forms paradigm, this question really should become, "What is the
containing element's role in the contained elements' architecture,
and/or what are the contained elements' roles, if any, in the
containing element's architecture?"  In the architectural forms
paradigm, any element instance can play several distinct and
unambiguous roles in as many distinct architectures, so it becomes
possible for the contained elements not only to have IBMPerson-defined
semantics, but also RDF semantics, and DC semantics, too.  In fact,
all the elements can have a role to play in every architecture,
provided that when, conceptually speaking, each architectural instance
is extracted from the document, it meets the structural and semantic
constraints imposed by its architecture.
 
There is more than one way to handle the puzzle, but first, let's see
what happens if we don't take advantage of any of the special
facilities of architectural forms.  In the following example:
 
  <auth DC="Creator">
    <authInfo RDF="Description">
      <persName IBMPerson="Name">Bob Schloss</persName>
      <email IBMPerson="Email">schloss@watson.ibm.com</email>
    </authInfo>
  </auth>
 
the <persName> and <email> elements are not architectural with respect
to the RDF architecture.  From the RDF architecture's perspective,
therefore, the <authInfo> element looks like this:
 
    <Description>Bob Schlossschloss@watson.ibm.com</Description>
 
In other words, the markup of the contained non-architectural elements
has been deleted altogether, leaving Bob Schloss with a very strange
surname, indeed.
 
   (Digression: Why does it work that way?  It's because, in the case
   of mixed content (which is not the situation in our puzzle
   example), the deletion of non-architectural markup still leaves the
   data in pretty good shape.  For example:
 
    <authInfo RDF="Description">Bob Schloss's e-mail address is
    <email>schloss@watson.ibm.com</email>, but you can also use
    <email>rschloss@us.ibm.com</email>.</authInfo>
 
   becomes, from RDF's perspective:
 
    <Description>Bob Schloss's e-mail address is
    schloss@watson.ibm.com, but you can also use
    rschloss@us.ibm.com.</Description>
 
   To handle cases other than mixed content, such as our puzzling
   example, there is no one algorithm that can be automatically
   applied in such a way as to give universally acceptable results.
   In any case, no such algorithms are built into the SGML Extended
   Facilities.)
 
Probably the best way to handle the puzzle of how to make the
<RDF:Description> element get back a simple string is *not* to give it
a simple string, but instead to make the contained elements meaningful
in RDF terms, as well as IBMPerson terms.  For example:
 
<auth DC="Creator">
   <authInfo RDF="Description">
     <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
     <email IBMPerson="Email" RDF="PersonEmail">schloss@watson.ibm.com</email>
   </authInfo>
</auth>
 
Note that in the above example, I've taken the liberty of equipping
the RDF architecture with the architectural forms <PersonName> and
<PersonEmail>.  Obviously, very few people will have the authority to
do any such thing, so I'm assuming that the creators of RDF
anticipated this particular need and provided these architectural
forms, and all I needed to do was reference them.  I can do that
without affecting the usefulness of my references to the <Name> and
<Email> forms of the IBMPerson architecture; again, in the
architectural forms paradigm, any element instance can conform
explicitly to architectural forms in more than one architecture.
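To make the "more than one architecture" point concrete, here is a minimal Python sketch of my own that reads, for a single element, the form it claims in each architecture.  The FORM_ATTS mapping is a hypothetical stand-in for what the document's architecture declarations would actually supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the form attribute name used for each architecture,
# matching the names assumed in the examples above.
FORM_ATTS = {"Dublin Core": "DC", "RDF": "RDF", "IBMPerson": "IBMPerson"}

def forms_of(elem):
    """Map architecture name -> form name for every architecture elem conforms to."""
    return {arch: elem.get(att)
            for arch, att in FORM_ATTS.items()
            if elem.get(att) is not None}

persName = ET.fromstring(
    '<persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>')
print(forms_of(persName))  # {'RDF': 'PersonName', 'IBMPerson': 'Name'}
```

Each architecture looks only at its own attribute, so adding the RDF form leaves the IBMPerson reading untouched.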
 
Now let's imagine that the RDF architecture provides a <PersonName>
architectural form, but not a <PersonEmail> form.  We're still OK,
because now, from an RDF architectural perspective:
 
      <authInfo RDF="Description">
        <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
        <email IBMPerson="Email">schloss@watson.ibm.com</email>
      </authInfo>
  
becomes:
 
    <Description><PersonName>Bob Schloss</PersonName>
    schloss@watson.ibm.com</Description>
 
... and this leaves our RDF engine in a position to at least
distinguish between some well-understood data and some raw data, in
mixed content.  At the very least, the boundary between the data
contents of the two contained elements has been preserved.
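Running the same simplified projection sketch on this variant shows the boundary being preserved.  As before, this is my own illustration: elements are assumed to be architectural exactly when they carry the RDF form attribute, and the sketch does not reproduce the line break shown in the text.

```python
import xml.etree.ElementTree as ET

def project(elem, form_att):
    """Architectural view of elem: non-architectural tags vanish, data stays.

    Simplified sketch; whitespace-only text is treated as insignificant.
    """
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text)
    for child in elem:
        parts.append(project(child, form_att))
        # A child's tail is data belonging to the parent element.
        if child.tail and child.tail.strip():
            parts.append(child.tail)
    inner = "".join(parts)
    form = elem.get(form_att)
    return f"<{form}>{inner}</{form}>" if form else inner

authInfo = ET.fromstring("""
<authInfo RDF="Description">
  <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
  <email IBMPerson="Email">schloss@watson.ibm.com</email>
</authInfo>""")

print(project(authInfo, "RDF"))
# <Description><PersonName>Bob Schloss</PersonName>schloss@watson.ibm.com</Description>
```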
 
Now let's imagine that there is neither a <PersonName> nor a
<PersonEmail> in the RDF architecture, and that the string
 
    Bob Schlossschloss@watson.ibm.com
 
is unacceptably Delphic as the content of an RDF <Description>.  What
can we do?
 
One way to handle the problem is to ignore, from an RDF perspective,
the data content of all but one of the contained elements.  For
this, we must turn to one of the deeper facilities of the AFDR: the
"architecture ignore data" architectural control attribute, whose
value "ArcIgnD" prevents the data content of an element (i.e., the
data consisting of all of its leaves in the parse tree) from being
considered part of the document, from the perspective of any
particular architecture.  If, for example, we wanted to ignore the
<persName> element's content for all purposes of RDF processing, we
could say:
 
  <authInfo RDF="Description">
    <persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>
    <email IBMPerson="Email">schloss@watson.ibm.com</email>
  </authInfo>
 
From an RDF perspective, the above looks like this:
 
  <Description>schloss@watson.ibm.com</Description>
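The effect of "ArcIgnD" can be added to the projection sketch.  This is my own simplification, not the AFDR's full processing model: once an element's ignore-data attribute says "ArcIgnD", all data beneath it is dropped for that architecture, while other architectures still see it.  "RDFIgDat" follows the example; "IBMIgDat" is a hypothetical name that no element here uses.

```python
import xml.etree.ElementTree as ET

def project(elem, form_att, ignore_att, ignoring=False):
    """Architectural view of elem, honoring an "ArcIgnD" ignore-data value.

    Simplified sketch: once an element says ArcIgnD, all of the data beneath
    it (its leaves) is dropped; non-architectural tags vanish as before, and
    whitespace-only text is treated as insignificant.
    """
    ignoring = ignoring or elem.get(ignore_att) == "ArcIgnD"
    parts = []
    if not ignoring and elem.text and elem.text.strip():
        parts.append(elem.text)
    for child in elem:
        parts.append(project(child, form_att, ignore_att, ignoring))
        # The child's tail is the parent's data, so the parent's flag governs it.
        if not ignoring and child.tail and child.tail.strip():
            parts.append(child.tail)
    inner = "".join(parts)
    form = elem.get(form_att)
    return f"<{form}>{inner}</{form}>" if form else inner

authInfo = ET.fromstring("""
<authInfo RDF="Description">
  <persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>
  <email IBMPerson="Email">schloss@watson.ibm.com</email>
</authInfo>""")

print(project(authInfo, "RDF", "RDFIgDat"))
# <Description>schloss@watson.ibm.com</Description>
print(project(authInfo, "IBMPerson", "IBMIgDat"))  # hypothetical attribute name
# <Name>Bob Schloss</Name><Email>schloss@watson.ibm.com</Email>
```

The IBMPerson view still sees Bob's name, because the ignore-data attribute belongs to the RDF architecture alone.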
 
To explain the above example, here is a digression about
"architectural control attributes" and how they are used in it.
 
The names of all "architectural control attributes" used to control
architectural processing in any document instance are declared in
certain special processing instructions (see "References" below).
There is one processing instruction per architecture.  Each such
processing instruction identifies the architecture, and provides,
among other things, the names of the architectural control attributes
whose values will control the architectural processing of each
element.  The most basic attribute is the "Architectural Form
Attribute", examples of which have appeared in most of the above
examples (as the "DC", "RDF" and "IBMPerson" attributes).  We have
been assuming, in the above examples, that in our document, the RDF
architecture's architectural form attribute's name is "RDF".
However, it could have been any XML name.  Similarly, we have been
assuming that the Dublin Core architecture's architectural form
attribute name is "DC", and the IBMPerson architecture's is
"IBMPerson".
 
Another architectural control attribute that can be declared in the
same processing instruction is the "Architecture Ignore Data"
attribute.  In our above example, we are assuming that for the RDF
architecture, in this document, the name of the "Architecture Ignore
Data" attribute has been declared in the relevant processing
instruction to be "RDFIgDat".  In the above example, the value
"ArcIgnD" is an ISO-defined string that means "data is always
ignored."
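The point that the control attribute names are declared per document, rather than fixed, can be sketched as follows.  ArchDecl is a hypothetical record standing in for what one architecture's processing instruction declares; the sketch does not parse the actual processing instructions, and it models only two of the control attributes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ArchDecl:
    """Hypothetical record of what one architecture's processing
    instruction declares (only two of its control attribute names)."""
    name: str
    form_att: str                          # Architectural Form Attribute
    ignore_data_att: Optional[str] = None  # Architecture Ignore Data attribute

def controls(elem, decl):
    """Return (form, ignore-data value) for elem, as seen by one architecture."""
    form = elem.get(decl.form_att)
    ignore = elem.get(decl.ignore_data_att) if decl.ignore_data_att else None
    return form, ignore

# The names are whatever each document declares; these mirror the examples.
rdf = ArchDecl("RDF", form_att="RDF", ignore_data_att="RDFIgDat")
ibm = ArchDecl("IBMPerson", form_att="IBMPerson")

persName = ET.fromstring(
    '<persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>')
print(controls(persName, rdf))  # (None, 'ArcIgnD')
print(controls(persName, ibm))  # ('Name', None)

# Another document could declare different names for the same architecture.
rdf_alt = ArchDecl("RDF", form_att="rdf-form", ignore_data_att="rdf-ign")
other = ET.fromstring('<x rdf-form="Description" rdf-ign="cArcIgnD"/>')
print(controls(other, rdf_alt))  # ('Description', 'cArcIgnD')
```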
 
   (Note: The other possibilities are:
 
    "nArcIgnD", which means that data is not ignored, and it is an
                error if data occurs where the architecture does not
                allow it, and
 
    "cArcIgnD", which means data is conditionally ignored (data will
                be ignored only when it occurs where the architecture
                does not allow it).)
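The three values can be summarized as a small decision function.  This is a sketch under one loud assumption: some hypothetical architectural validator has already decided whether the architecture allows data at this point, and it passes that decision in as allowed_here.

```python
from enum import Enum

class IgnoreData(Enum):
    ALWAYS = "ArcIgnD"        # data is always ignored
    NEVER = "nArcIgnD"        # data is kept; error where not allowed
    CONDITIONAL = "cArcIgnD"  # data is ignored only where not allowed

class ArchitecturalError(Exception):
    pass

def keep_data(value, allowed_here):
    """Decide whether data survives into the architectural view.

    allowed_here is assumed to come from a (hypothetical) validator that
    knows whether the architecture permits data at this point.
    """
    mode = IgnoreData(value)
    if mode is IgnoreData.ALWAYS:
        return False
    if mode is IgnoreData.CONDITIONAL:
        return allowed_here
    # NEVER: the data is kept, so it must be somewhere the architecture allows.
    if not allowed_here:
        raise ArchitecturalError("data occurs where the architecture forbids it")
    return True
```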
 
If all this seems rather complex, please remember that the problem of
reliably and smoothly meshing the semantics of multiple namespaces in
a single document is a complex one.  Indeed, it is a problem which the
present simplicity of namespaces is unable to cope with, at least in
the general case.  There is no requirement that anyone use the
"architecture ignore data" attribute, but it's nice that it's there
when it's really needed and nothing less will do.
 
There are other architectural control attributes, and there are still
other things that can be declared in the processing instructions that
define architectural control attributes.
 
(Here ends the digression about architectural control attributes.)
 
 
********************************************************************************
Some references
********************************************************************************
 
Architectural Forms / (Multiple) Inheritance ("Architectural Form
   Definition Requirements" or "AFDR"):
   http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.3.html
   This standard is being amended to provide for XML's use of
   architectural forms by means of processing instructions (which XML
   supports) instead of #NOTATION attributes (which XML does not
   support).  See http://www.ornl.gov/sgml/wg8/document/1957.htm for
   the details of this amendment.
 
Property Sets ("Property Set Definition Requirements" or "PSDR"):
   http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.html
 
HyTime Property Set (just a good example of a full-featured property
   set):
   http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-B.html
 
A Property Set for XLink (in the form of a diagram in PostScript)
   ftp://ftp.techno.com/TechnoTeacher/MISC/xllprops2.ps
 
 
********************************************************************************
Acknowledgement
********************************************************************************
 
As the reader may have guessed, this paper would not have been
possible without the patient substantive help of Robert J. "Bob"
Schloss of IBM's Thomas J. Watson Research Center.
 
 
- 30 -