[This local archive copy is from the official and canonical URL, http://www.ornl.gov/sgml/wg8/document/1998.htm; please refer to the canonical source document if possible.]
Schloss/Newcomb Correspondence on Metadata

SOURCE: Steve Newcomb, with comments from Robert Schloss
PROJECT: Metadata Workshop, Paris
PROJECT EDITOR:
STATUS: Summary of e-mail conversations
ACTION: For information
DATE: 29 June 1998
DISTRIBUTION: SC34 and Liaisons
REFER TO:
REPLY TO: Dr. James David Mason
Date: Fri, 26 Jun 1998 14:44:33 -0500
From: "Steven R. Newcomb" <srn@techno.com>
Subject: Schloss/Newcomb correspondence
To: metadata@gca.org
Message-id: <199806261944.OAA03963@bruno.techno.com>

Dear Paris Metadata Summit Participants,

Once this list was set up (sorry about the long delay), I wrote to Bob and asked whether I should send the fruit of our labor to understand each other to the list. Bob responded:

> I am now convinced that there are situations where AFs should be
> used and others where namespace prefixes are better. I was hoping
> to write all this out to share with you and others, but that has not
> happened. And in 30 minutes I disappear for one month of vacation
> in Israel with my wife and son. I know that when I come back, I
> won't have a lot of time to pursue this until mid-August at the
> earliest.
>
> If you wish to post our correspondence, that is okay, but you should
> probably add a note that says "Bob has done some additional thinking
> but was unable to continue the discussion before he left to go out
> of town until the end of July".

So, please consider the above notes added. Maybe Bob will find the time to share his further thoughts with us in the relatively near future.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com
voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax: +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

********************************************************************************

This first installment is what I was trying to articulate at the end of the meeting in Paris. It has been edited according to Bob's instructions. (I'm sparing you all the correspondence that went into preparing this.)

There is more correspondence between Bob and me to share with you, but I don't want to send it unless there is some expression of interest in it.
The rest of it is pretty much devoted to an explanation of how architectural forms work, in the form of answers to Bob's pointed questions about them. --SRN

********************************************************************************

Some Observations Apropos the Metadata Summit in Paris, May 22, 1998

Steven R. Newcomb

As currently drafted, RDF uses a single standard algorithm to convert metadata represented in an XML document (using a vocabulary from a number of declared namespaces) into a queryable resource (tuples/property graphs). The fact that there is a single algorithm for generating, in effect, an API to the metadata objects imposes many constraints both on the interchange architecture of the metadata and on the API to the information being interchanged by that architecture. Here the interchange architecture means the DTDs or other schema representations of the structure of metadata information in the form in which it is normally interchanged, as XML instances.

The conversion of XML instances into what are effectively APIs to their information content is something that every XML application must do, in one way or another. However, there is no limit to the variability of the kinds of information conveyed in XML documents. Therefore there can never be a single algorithm that converts instances of every interchange architecture (DTD) into the most useful and minimal API to the meaning of instances conforming to that architecture.

As an example, consider the XLink interchange architecture. Any really useful API to the meaning of an XLink would be able to provide, among other things, reports as to the anchor status of information nodes that might not even be in the same document as the XLinks. Therefore, a special API, written to provide useful access to the meaning of XLinks, is needed. It is hard to imagine how any algorithm could generate such an API, given only the schema of the XLink interchange architecture or an XLink element type definition.
(Masatomo Goto of Fujitsu Labs developed a Property Set for XLink in order to build the XLink engine he demonstrated at SGML Europe '98. A diagram of that Property Set exists; see "References", below.)

At this time, RDF's designers are working to the requirements of a few popular metadata architectures. It is expected that these metadata architectures can be constrained in such a way that a single algorithm can generate a useful API directly from the architectures. Given those limited requirements, there is nothing fundamentally wrong with the current RDF approach. In the larger context, however, it is profoundly suboptimal.

To understand the larger context, we must recognize that practically everything that will be done with XML, including all of its draft and proposed semantic enhancements (such as XLink and XPointers), is best realized as a pair of distinct formal expressions:

(1) the document type definition (DTD) or other schema that is the formal expression of the *interchange syntax* of the architecture, and

(2) the Property Set or other schema that is the formal expression of the *abstract API to the information conveyed by instances that conform to the architecture*.

(If you can imagine having the ability to add a module to the DOM for each interchange architecture, so that there are now additional objects that reflect the semantic phenomena expressible in that interchange architecture, you can understand what a Property Set is. Unfortunately, the current draft of the DOM is not set up to support this. It could be, while still meeting or exceeding all of its current objectives and requirements. First of all, there needs to be a Property Set for XML itself, and, not surprisingly, such a Property Set is now being developed. But that's another story.)
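The contrast between one schema-blind algorithm and an architecture-specific API can be sketched in Python. This is a minimal illustration only, assuming the standard library's xml.etree.ElementTree.

* `generic_triples` is a hypothetical stand-in for "a single algorithm". It is not the algorithm in the RDF draft.
* `person_engine` and `Person` are hypothetical names. They represent an engine, and the object that a property set for a made-up "person" vocabulary might define.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# A made-up metadata fragment; the element names are illustrative only.
DOC = """<creator>
  <person><name>Bob Schloss</name><email>schloss@watson.ibm.com</email></person>
</creator>"""


def generic_triples(root):
    """One schema-blind algorithm for every document.

    Each child element becomes an arc from its parent's node, pointing
    either at a literal (a leaf's text) or at a new anonymous node.
    """
    triples = []
    counter = 0

    def walk(elem):
        nonlocal counter
        subject = f"_:n{counter}"
        counter += 1
        for child in elem:
            if len(child) == 0:
                triples.append((subject, child.tag, (child.text or "").strip()))
            else:
                triples.append((subject, child.tag, walk(child)))
        return subject

    walk(root)
    return triples


@dataclass
class Person:
    """What a hypothetical property set for <person> might expose."""
    name: str
    email: str


def person_engine(root):
    """Architecture-specific engine: it knows what a <person> means."""
    return [Person(p.findtext("name", default=""), p.findtext("email", default=""))
            for p in root.iter("person")]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for triple in generic_triples(root):
        print(triple)
    print(person_engine(root))
```

Both functions read the same document. The triples are complete, but every application still has to rediscover that a "person" arc leads to a node with "name" and "email" arcs. The engine records that knowledge once, where it can be reused.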
In general, interchange architectures for conveying rich semantics really need both a DTD and a Property Set, because in the general case it is not possible to generate useful Property Sets from DTDs using any single algorithm. RDF's algorithm implicitly generates what amounts to a Property Set. Its limitations constrain the complexity of the metadata information that can be interchanged. They also inconvenience developers of software intended to make use of the information thus conveyed.

Here are the implications of adopting the twin notions of Architectural Forms and Property Sets as the new basis for RDF:

* The structure and complexity of interchange architectures used for metadata will no longer necessarily be constrained. (I, Steve, think this flexibility is goodness -- that there should not be any constraints except for those which were consciously and voluntarily designed into each architecture to meet its own interchange, software reuse, learnability, reliability, and other requirements. Some of the RDF folks think that the blanket constraints on metadata structure now provided by RDF will maximize software reuse (at search engines, for example) and learning among Web users, programmers, and content developers. This idea certainly has merit. However, I would prefer simply to allow information architects the flexibility to maximize the naturalness of the expression of metadata. For me, naturalness is simplicity. Blanket structural constraints usually have the effect of requiring architects to employ greater complexity in order to express the same information. This added complexity often decreases learnability and increases the difficulty of implementation.)

* The complexity of the semantics of supportable metadata will no longer necessarily be constrained.
(I, Steve, think this flexibility is goodness -- that the full unboundedness of the set of possible metadata semantics should be supportable at some level. Some of the RDF folks think constraining metadata semantics will maximize software reuse (at search engines, for example) and learning among Web users, programmers, and content developers. These desirable goals seem to me better served by limiting the scope of RDF to a certain list of metadata semantics. Among other things, such a list could be an invaluable resource for implementers, because it would clarify which RDF architectures (vocabularies) share which equivalent semantics. A good way to express such a list of semantics is to create a property set for RDF.)

* Reusable software engines for the semantic processing of instances conforming to particular interchange architectures become practical and extremely cost-effective. Each such engine processes the interchangeable XML form of the information. From any XML instance that conforms to the architecture, it generates a "grove" (an object graph whose schema is the relevant Property Set). The fact that the engine is reusable means that it can mature and offer reliable semantic services in a variety of application contexts. The cost of developing applications is reduced, as is their time-to-market, and their reliability improves.

* The design of any given metadata architecture will require more work and more careful thought. (I, Steve, think this is goodness. The W3C people believe that the RDF data model will require slightly less work and less thought when a new metadata schema is defined, and that this reduction in effort is beneficial. I, by contrast, believe that each interchange architecture should maximize the appropriateness of its design to the nature of the information it models, and that each Property Set should maximize the convenience of application developers.
This way, the semantic processing that is common to all applications of a given architecture is supportable by a reusable engine. The added design effort yields distinct formal expressions (both schemas/DTDs and Property Sets). I believe these will pay handsome rewards in terms of increased reliability of applications and decreased cost of information interchange.)

* RDF's supporting formalisms and application integration mechanisms need not differ from those used to support any other information interchange architecture, for any purpose. Less is more.

* The overhead of supporting any given application's use of any metadata architecture need never be more than it would have been under the current RDF proposal. It may often turn out to be less, because the popularity of certain interchange architectures may encourage the development of highly specialized and/or optimized engines for supporting them.

* Software vendors will be able to demonstrate conformance to the semantics of metadata architectures, and purchasers will be able to verify that conformance.

* "Namespaces" are entirely replaced by the use of architectural forms. (An "architectural form" is an element type definition in a DTD, where that DTD plays the role of what RDF now calls a "namespace" or maybe a "vocabulary resource". In ISO jargon, an element that conforms to an architectural form is said to be a "client element" of the referenced DTD resource or "namespace".)

* At least some known problems with the current design of RDF will be resolved. One of these arises when an application expects the value of a particular property to be a simple string, but the metadata instance it receives has a compound expression using tags from another vocabulary. RDF does not yet make clear how that compound expression is to be manipulated in order to supply a simple string.
Please see the attached discussion entitled "A Known Problem with RDF, Resolved by Architectural Forms".

In the larger context, assuming that architectural forms and property sets are widely used with XML, there will be the following additional consequences:

* Metadata queries can occur inside any other kind of query, and any other kind of query can occur inside a metadata query. There is already a query language, SDQL (Standard Document Query Language), that works for all architectures, not just metadata architectures. SDQL conceptually queries groves, and groves can be generated from any notation for which there is a property set. The same query language can therefore be used to provide addressing and linking services to non-XML notations. In other words, by making everything appear to conform to the grove/property set object model, everything becomes addressable in its own most convenient terms, including the things that were only implicit in their interchangeable forms.

[Note: The primary significance of Eliot Kimber's "PHyLIS" demo during the meeting was that this idea of groves and property sets actually works. In that demo, we saw a totally grove-based integration of XML documents and CGM documents, with XLink-style extended links providing traversal services between the objects in the groves of both kinds of documents.]

* XML documents will be able to contain elements that can be processed in accordance with several interchange architectures simultaneously. Such elements can be said to exhibit the semantic equivalent of multiple inheritance. Information interchange architectures that overlap semantically can nonetheless be harmonized in an instance that uses all of them, without repeating any information, even if they use conflicting element type names and attribute names. The implications of this harmonizability are enormously beneficial for E-commerce, among other things.
********************************************************************************
Some notes:
********************************************************************************

What ISO's SGML Extended Facilities calls...

    a "base architecture", or
    an "(information-interchange-)enabling architecture", or
    (when referring to the formal machine-processable model of the interchange syntax of a base architecture) a "meta-DTD",

...is meant to fulfill the same roles and requirements (and more) as what the RDF draft calls...

    a "namespace", or
    a "vocabulary", or
    a "Scheme".

Similarly, what ISO calls...

    a "property set"

...the RDF draft calls...

    a "Scheme".

What ISO calls...

    a "grove" (acronym: Graph Representation Of property ValuEs),

...the RDF draft calls...

    a set of "3-tuples", or a graph.

Within the conceptual frameworks of ISO "groves" and RDF "graphs", the terms "node", "property", and "arc" appear to have the same meanings in both the SGML Extended Facilities and the RDF draft, at least for purposes of this discussion.

RDF has no *general* element subtyping or "semantic load inheritance" facility. RDF *does* provide a facility called "namespaces", which allows a (metadata) element to declare that it conveys the same kind of information as an element of a certain type in one of several popular metadata DTDs (or other schema-like things). The sets of referenceable names in those schema-like things, which contain the names of the inherited element types, are called "vocabularies". (A vocabulary need not be a DTD, because there is no actual architectural subtyping or checking. In the current draft of RDF, a tag set is really all that is required.)
*******************************************************************
"A Known Problem with RDF, Resolved by Architectural Forms"
*******************************************************************

At least some known problems with the current design of RDF are readily resolved by the architectural forms paradigm. One of these arises when an application expects the value of a particular property to be a simple string, but the metadata instance it receives has a compound expression using tags from another vocabulary. RDF does not yet make clear how that compound expression is to be manipulated in order to supply a simple string. For example, in the fragment below, if the content of <RDF:Description> is supposed to be a simple string, what does that string turn out to be?

    <DC:Creator>
      <RDF:Description>
        <IBMPerson:Name>Bob Schloss</IBMPerson:Name>
        <IBMPerson:Email>schloss@watson.ibm.com</IBMPerson:Email>
      </RDF:Description>
    </DC:Creator>

Another, perhaps more general, way to put the problem is this: "What do we do about the content of an element whose semantic is borrowed from one namespace, when its content's semantics are borrowed from one or more other namespaces?"

The "architectural form paradigm" brings several solutions to the above puzzle. To understand them, it is first necessary to understand that a single instance of an element can conform to several architectural forms. An element instance can have only one generic identifier. It is therefore impractical to use the generic identifier to specify all the architectures (such as DC, RDF and IBMPerson) to which that single element conforms.

The biggest syntactic difference between architectural forms and namespaces is in their use of the generic identifier. (The "generic identifier" is the name of the element type; it always appears as the first string in any element instance's start tag.)
Namespaces use the generic identifier to specify both the architecture and a particular semantic-laden name within the architecture. Because there can be only one generic identifier in any element instance, the syntax of namespaces effectively prohibits a single element instance from declaring its author's intention that it be processable in terms of more than one namespace.

By contrast, the syntax of architectural forms does not constrain the generic identifier in any way; indeed, the generic identifier is pretty much ignored for purposes of architectural processing. As far as architectural processing is concerned, the main purpose of the generic identifier is to provide a hook for markup minimization. The generic identifier is relegated to a role in which it serves as a kind of macro call: it brings in the default values of all the attributes declared in the DTD, if any, for that element type, as we'll see shortly.

The syntax of architectural forms is actually simpler than the syntax of namespaces. No new syntactic separator (":") is required, generic identifiers are not split up into fields, and there are no new constraints on generic identifiers at all. Each architecture is referenced by means of an attribute name. The value of that attribute is the name of the element type, within the architecture, to which the element claims both syntactic conformance and semantic equivalence.
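The difference in how the two syntaxes carry conformance claims can be sketched as follows. This is a simplified Python illustration using ElementTree, with these assumptions:

* The function names are hypothetical.
* `form_atts` stands in for the set of architectural form attribute names that a document would declare.
* `namespace_claims` takes the tag as a plain string, because an XML parser would reject an undeclared prefix.

A prefixed generic identifier yields at most one (vocabulary, name) claim. Architectural form attributes yield one claim per architecture and leave the generic identifier out of it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def namespace_claims(gi: str) -> list[tuple[str, str]]:
    """Namespace style: a prefixed generic identifier names one vocabulary."""
    prefix, sep, local = gi.partition(":")
    return [(prefix, local)] if sep else []


def af_claims(elem: ET.Element, form_atts: set[str]) -> list[tuple[str, str]]:
    """Architectural-form style: the generic identifier is ignored, and each
    architecture's form attribute names the element type claimed in it."""
    return [(att, elem.get(att)) for att in sorted(form_atts)
            if elem.get(att) is not None]


if __name__ == "__main__":
    print(namespace_claims("DC:Creator"))  # [('DC', 'Creator')]: one claim at most
    foo = ET.fromstring('<foo DC="Creator" LCCC="Author">...</foo>')
    print(af_claims(foo, {"DC", "LCCC", "RDF"}))  # no RDF attribute, so no RDF claim
```

Renaming `<foo>` to anything else leaves `af_claims` unchanged, which mirrors the point that the generic identifier plays no part in architectural processing.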
In other words, what in namespace syntax would be expressed as:

    <DC:Creator>...</DC:Creator>

might become:

    <foo DC="Creator">...</foo>

Therefore, it becomes possible for a single element to claim conformance with more than one architecture:

    <foo DC="Creator" LCCC="Author">...</foo>

The following is a digression (but nonetheless a significant digression) about markup minimization. The above looks pretty verbose, and verbosity may get a lot worse. We can reasonably expect decentralized control over metadata architectures, and documents increasingly need to be useful in a variety of contexts. For example:

    <foo DC="Creator" LCCC="Author" DEA="Officer" NAWCAD="TextAuth"
         USGS="Surveyor" Ford="ietmAuthor" Paramount="Creator">...</foo>

We can completely conquer this verbosity by using a DTD to cause all the architectural form attributes to be present and to have the necessary values by default, for all instances of the element type "foo":

    <!ELEMENT foo - - ( whatever )>
    <!ATTLIST foo
              DC        NAME "Creator"
              LCCC      NAME "Author"
              DEA       NAME "Officer"
              NAWCAD    NAME "TextAuth"
              USGS      NAME "Surveyor"
              Ford      NAME "ietmAuthor"
              Paramount NAME "Creator"
    >

Now the same element instance can be expressed as:

    <foo>...</foo>

and still be processed in terms of all those different architectures in exactly the same way. All the architectural form attributes are still implicitly present, and the parser will report them as if they were explicit.

(Note: XML documents that do not have DTDs cannot take advantage of this technique, but they can still take full advantage of the architectural form paradigm. The only difference is that such documents must specify, in each element instance, all the architectural form attributes needed to process that element in terms of all the desired architectures. As we have just seen, doing without a DTD can make documents that use architectural forms extremely verbose.
It's exactly like the question of whether (a) to store a PostScript document with fonts that describe each glyph's curve set, and then reference the glyphs whenever they are to be used, or (b) to store each glyph as an explicit set of curves. If the document contains only a dozen characters, it may be more sensible not to include the font(s) from which they were selected, and simply to be explicit about the curves that make up each glyph. If the document contains many characters, a huge efficiency advantage is gained by including the font and referencing the glyphs in the font by means of the characters. Similarly, if we include a DTD with our document, we can, in effect, reference any number of attributes and their default values simply by uttering an element's generic identifier (<foo>, in the example above). If we have a lot of elements in our document, using a DTD offers a big efficiency advantage. But it's not strictly necessary to use a DTD.

It should also be noted that it's not strictly necessary to include a DTD with every document, even if you wish to use one. The recipient of your document only needs a copy of the same DTD that you intend the document to be used with, or something with equivalent ability to drive the parsing process. Again, it's exactly like the situation with fonts in PostScript: you don't have to include the font in a PostScript document if you know that the recipient's printer has that font already inside it (or can load it).)

It is also not always necessary to be explicit, even in a DTD, about all the architectures to which an element conforms, if one architectural form is already a subtype of another.
For example, in the NAWCAD architecture, the "TextAuth" architectural form (remember that "architectural form" == "element type") is declared as a subtype of the "Creator" architectural form in the "DC" architecture, and we can take advantage of that. Assuming that the NAWCAD architecture's DTD contains:

    <!ELEMENT TextAuth - - (whatever)>
    <!ATTLIST TextAuth
              DC NAME #FIXED "Creator"
    >

...then every NAWCAD <TextAuth> is by definition also a DC <Creator>.

********************************************************************
********************************************************************
** In the architectural form paradigm, the rule is: An instance   **
** of an element that claims conformance to any architectural     **
** form may not violate any of the constraints on the             **
** architectural form to which it presumably conforms.            **
********************************************************************
********************************************************************

(Note: This simple rule helps to dramatize the differences between W3C Namespaces, as presently constituted, and the architectural forms paradigm.

* The rule applies to an element's *context*: no element can appear where its architectural context (the architectural forms of its surrounding elements) would not allow it to appear. By contrast, the W3C Namespace paradigm does not require a namespace-referencing element's context to make sense in terms of the architecture's (or, rather, the Namespace's) constraints.

* The rule applies to an element's *content*: no architectural elements can appear inside it unless the architecture permits them there. Again, in the Namespace paradigm, no such constraints are placed on namespace-referencing element types.

(Note: the above are two aspects of the same idea: that an element's content must be consistent with all of the architectural forms to which it declares conformance.
If you have guessed by now that the document element must always conform to the architectural forms of the document elements of all the architectures used in the document, you guessed correctly.)

* The rule also applies to the element's *attributes*. Any attributes that are required by the architecture must be present in the element instance. Attributes that are neither required nor present are assumed to have their architecturally-defined default values and/or #IMPLIED effects on applications of that architecture. If there are attributes present that do not appear in the architecture, they are ignored. Such non-architecturally-defined attributes are regarded as implying additional constraints, but not as violating any existing constraints. No architecture has the authority to prevent additional, non-architectural attributes from appearing on elements. From each architecture's perspective, the attributes that are present but not defined by the given architecture are invisible.

* Finally, the rule applies to any *other constraints* on element content and attributes, even if they cannot necessarily be detected by a generic parser. Any validating semantic processing engine for that architecture can detect them. For example, the HyTime varlink architecture (from which XLink was derived) limits the number of anchors to 2. More are allowed only if the "manyanch" option is supported and is specified with no value or a value greater than "2". No generic parser can check the conformance of an element to this constraint, but a validating XLink or varlink architecture processing engine can. Architectures come in boundless variety, so there is probably a boundless variety of such constraints. The best way to handle them is to relegate all architecture-specific constraint checking to a reusable engine for that architecture.)
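The attribute half of the rule above can be sketched as a small function. This is a hypothetical Python illustration, not an AFDR engine. It makes these assumptions:

* An architectural form's attribute declarations are modelled as a dict. Each value is `REQUIRED`, a default string, or `None` for an #IMPLIED attribute.
* The element's attributes already include any defaults supplied by the client DTD.
* Content-model checks and architecture-specific checks (such as the varlink anchor limit) are left out.

```python
from __future__ import annotations

REQUIRED = object()  # marker: the architecture requires this attribute


class ArchitecturalError(ValueError):
    """Raised when an element violates its architectural form."""


def attributes_seen_by(form: dict[str, object], attrs: dict[str, str]) -> dict[str, str]:
    """Return an element's attributes as one architectural form sees them.

    form maps each attribute the form declares to REQUIRED, to a default
    value (a str), or to None for an #IMPLIED attribute with no default.
    attrs is assumed to include any defaults supplied by the client DTD.
    """
    seen = {}
    for name, decl in form.items():
        if name in attrs:
            seen[name] = attrs[name]
        elif decl is REQUIRED:
            raise ArchitecturalError(f"required attribute {name!r} is missing")
        elif decl is not None:
            seen[name] = decl  # architecturally defined default
        # else: #IMPLIED and absent, so its effect is left to the application
    # Attributes the form does not declare are never copied: they are invisible.
    return seen


# Hypothetical form declaration, for illustration only.
CREATOR_FORM = {"name": REQUIRED, "scheme": "free-text", "lang": None}

if __name__ == "__main__":
    print(attributes_seen_by(CREATOR_FORM, {"name": "Bob Schloss", "color": "blue"}))
    # {'name': 'Bob Schloss', 'scheme': 'free-text'}
    try:
        attributes_seen_by(CREATOR_FORM, {"color": "blue"})
    except ArchitecturalError as err:
        print("invalid:", err)
        # invalid: required attribute 'name' is missing
```

The function returns each architecture's own view rather than rejecting unknown attributes. That is how each architecture can see the same element differently without any of them forbidding the others' attributes.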
The above NAWCAD DTD fragment constrains all NAWCAD <TextAuth> elements to conform to all the constraints and requirements of DC <Creator> elements. It is therefore unnecessary to mention the "DC" architectural form attribute in the <foo> element, because it is already there! By definition, a subtype always conforms to the constraints and requirements of its supertype(s). In a NAWCAD-oriented application, <foo>'s "NAWCAD" attribute means two things. Our <foo> element can be extracted into a valid NAWCAD document as a valid <TextAuth> element, and it can also be extracted into a valid DC document as a valid <Creator> element.

(In the jargon of the SGML Extended Facilities, we say that the output of the parser, conceptually speaking, includes a "grove" -- a parse tree -- for each of the architectures used by the document. There is no requirement that any application actually produce groves; groves are a concept developed to explain, in abstract terms, the effects of parsing, processing, and component addressing.)

Now, having accumulated the necessary background information, let's go back to the original question that necessitated all the above explanation:

    "What do we do about the content of an element whose semantic is borrowed from one namespace, when its content's semantics are borrowed from one or more other namespaces?"

In the architectural forms paradigm, this question really should become: "What is the containing element's role in the contained elements' architecture, and/or what are the contained elements' roles, if any, in the containing element's architecture?"

In the architectural forms paradigm, any element instance can play several distinct and unambiguous roles in as many distinct architectures. The contained elements can therefore have not only IBMPerson-defined semantics, but also RDF semantics, and DC semantics, too.
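The DTD defaulting and the NAWCAD subtyping described above can be combined in one sketch. It shows how an engine might work out every role that a bare `<foo>` plays. This is a hypothetical Python illustration with these assumptions:

* `CLIENT_DEFAULTS` stands in for the client DTD's `<!ATTLIST foo ...>` defaults. It is trimmed to two architectures, with DC deliberately omitted.
* `FIXED_SUPERTYPES` stands in for the `#FIXED` DC attribute in the NAWCAD architecture's DTD.
* A real engine would read both from the DTDs.

```python
from __future__ import annotations

# Hypothetical client DTD defaults, cf.
#   <!ATTLIST foo NAWCAD NAME "TextAuth" LCCC NAME "Author">
CLIENT_DEFAULTS = {"foo": {"NAWCAD": "TextAuth", "LCCC": "Author"}}

# Hypothetical meta-DTD fact: NAWCAD declares TextAuth with
#   DC NAME #FIXED "Creator", so every NAWCAD TextAuth is also a DC Creator.
FIXED_SUPERTYPES = {("NAWCAD", "TextAuth"): [("DC", "Creator")]}


def effective_roles(gi: str, explicit: dict[str, str],
                    form_atts: set[str]) -> set[tuple[str, str]]:
    """Every (architecture, form) pair an element conforms to."""
    attrs = {**CLIENT_DEFAULTS.get(gi, {}), **explicit}  # explicit values win
    roles = {(att, attrs[att]) for att in form_atts if att in attrs}
    pending = list(roles)
    while pending:  # follow #FIXED supertypes transitively
        for sup in FIXED_SUPERTYPES.get(pending.pop(), []):
            if sup not in roles:
                roles.add(sup)
                pending.append(sup)
    return roles


if __name__ == "__main__":
    # A bare <foo> with no attributes typed by the author.
    print(sorted(effective_roles("foo", {}, {"NAWCAD", "LCCC"})))
    # [('DC', 'Creator'), ('LCCC', 'Author'), ('NAWCAD', 'TextAuth')]
```

The DC role is written neither in the instance nor in the client defaults. It arrives only through NAWCAD's TextAuth declaration. The sketch flattens this into one lookup table, whereas architectural processing would reach DC by way of the NAWCAD architecture itself.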
In fact, all the elements can have a role to play in every architecture. The only condition is that each architectural instance, when (conceptually speaking) it is extracted from the document, meets the structural and semantic constraints imposed by its architecture.

There is more than one way to handle the puzzle, but first, let's see what happens if we don't take advantage of any of the special facilities of architectural forms. In the following example:

    <auth DC="Creator">
      <authInfo RDF="Description">
        <persName IBMPerson="Name">Bob Schloss</persName>
        <email IBMPerson="Email">schloss@watson.ibm.com</email>
      </authInfo>
    </auth>

the <persName> and <email> elements are not architectural with respect to the RDF architecture. From the RDF architecture's perspective, therefore, the <authInfo> element looks like this:

    <Description>Bob Schlossschloss@watson.ibm.com</Description>

In other words, the markup of the contained non-architectural elements has been deleted altogether, leaving Bob Schloss with a very strange surname, indeed.

(Digression: Why does it work that way? In the case of mixed content (which is not the situation in our puzzle example), deleting non-architectural markup still leaves the data in pretty good shape. For example:

    <authInfo RDF="Description">Bob Schloss's e-mail address is <email>schloss@watson.ibm.com</email>, but you can also use <email>rschloss@us.ibm.com</email>.</authInfo>

becomes, from RDF's perspective:

    <Description>Bob Schloss's e-mail address is schloss@watson.ibm.com, but you can also use rschloss@us.ibm.com.</Description>

For cases other than mixed content, such as our puzzling example, no single algorithm can be applied automatically with universally acceptable results. In any case, no such algorithms are built into the SGML Extended Facilities.)
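The deletion of non-architectural markup can be sketched in Python with ElementTree. This is a simplified illustration, not the SGML Extended Facilities' processing model. It makes these assumptions:

* `arch_view` and its helpers are hypothetical functions.
* The architecture's form attribute name is passed in directly.
* Whitespace-only text between elements is treated as insignificant.
* The attributes of architectural elements are not carried into the view.

Elements carrying the form attribute are renamed to their form. Every other element disappears, leaving its data and any architectural descendants in place.

```python
import xml.etree.ElementTree as ET


def _has_data(s):
    # Whitespace-only text between elements is treated as insignificant.
    return bool(s and s.strip())


def _content(elem, arch):
    """What `arch` sees inside elem: a list of data strings and new Elements."""
    out = [elem.text] if _has_data(elem.text) else []
    for child in elem:
        out.extend(arch_nodes(child, arch))
        if _has_data(child.tail):
            out.append(child.tail)  # a child's tail is data of elem itself
    return out


def arch_nodes(elem, arch):
    """Replace elem by what `arch` sees in its place.

    An element carrying the architecture's form attribute becomes an element
    named after that form; any other element vanishes, but its data and its
    architectural descendants stay where they were.
    """
    content = _content(elem, arch)
    form = elem.get(arch)
    if form is None:
        return content
    new = ET.Element(form)
    last = None
    for item in content:
        if isinstance(item, str):
            if last is None:
                new.text = (new.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            new.append(item)
            last = item
    return [new]


def arch_view(xml_text, arch):
    nodes = arch_nodes(ET.fromstring(xml_text), arch)
    return "".join(n if isinstance(n, str) else ET.tostring(n, encoding="unicode")
                   for n in nodes)


PUZZLE = """<auth DC="Creator">
  <authInfo RDF="Description">
    <persName IBMPerson="Name">Bob Schloss</persName>
    <email IBMPerson="Email">schloss@watson.ibm.com</email>
  </authInfo>
</auth>"""

MIXED = ('<authInfo RDF="Description">Bob Schloss\'s e-mail address is '
         '<email>schloss@watson.ibm.com</email>, but you can also use '
         '<email>rschloss@us.ibm.com</email>.</authInfo>')

if __name__ == "__main__":
    print(arch_view(PUZZLE, "RDF"))
    # <Description>Bob Schlossschloss@watson.ibm.com</Description>
    print(arch_view(MIXED, "RDF"))
```

A contained element that carries its own RDF form attribute is kept as a renamed element rather than flattened. Giving `<persName>` the attribute `RDF="PersonName"` therefore yields `<Description><PersonName>Bob Schloss</PersonName>schloss@watson.ibm.com</Description>`. The space is missing because whitespace between elements is dropped.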
Probably the best way to handle the puzzle of how to make the <RDF:Description> element get back a simple string is *not* to give it a simple string, but instead to make the contained elements meaningful in RDF terms, as well as IBMPerson terms. For example:

    <auth DC="Creator">
      <authInfo RDF="Description">
        <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
        <email IBMPerson="Email" RDF="PersonEmail">schloss@watson.ibm.com</email>
      </authInfo>
    </auth>

Note that in the above example, I've taken the liberty of equipping the RDF architecture with the architectural forms <PersonName> and <PersonEmail>. Obviously, very few people will have the authority to do any such thing, so I'm assuming that the creators of RDF anticipated this particular need and provided these architectural forms, and all I needed to do was reference them. I can do that without affecting the usefulness of my references to the <Name> and <Email> forms of the IBMPerson architecture; again, in the architectural forms paradigm, any element instance can conform explicitly to architectural forms in more than one architecture.

Now let's imagine that the RDF architecture provides a <PersonName> architectural form, but not a <PersonEmail> form. We're still ok, because now, from an RDF architectural perspective:

    <authInfo RDF="Description">
      <persName IBMPerson="Name" RDF="PersonName">Bob Schloss</persName>
      <email IBMPerson="Email">schloss@watson.ibm.com</email>
    </authInfo>

becomes:

    <Description><PersonName>Bob Schloss</PersonName> schloss@watson.ibm.com</Description>

... and this leaves our RDF engine in a position to at least distinguish between some well-understood data and some raw data, in mixed content. At the very least, the boundary between the data contents of the two contained elements has been preserved.
Now let's imagine that there is neither a <PersonName> nor a <PersonEmail> in the RDF architecture, and that the string "Bob Schlossschloss@watson.ibm.com" is unacceptably Delphic as the content of an RDF <Description>. What can we do?

One way to handle the problem is to ignore, from an RDF perspective, the data content of all but one of the contained elements. For this, we must turn to one of the deeper facilities of the AFDR: the "architecture ignore data" architectural control attribute. Given the value "ArcIgnD", it allows us to prevent the data content of an element (i.e., the data consisting of all of its leaves in the parse tree) from being considered part of the document, from the perspective of any particular architecture. If, for example, we wanted to ignore the <persName> element's content for all purposes of RDF processing, we could say:

    <authInfo RDF="Description">
      <persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>
      <email IBMPerson="Email">schloss@watson.ibm.com</email>
    </authInfo>

From an RDF perspective, the above looks like this:

    <Description>schloss@watson.ibm.com</Description>

To explain the above example, the following is a digression about "architectural control attributes", and how they are being used in the above example. The names of all "architectural control attributes" used to control architectural processing in any document instance are declared in certain special processing instructions (see "References" below). There is one processing instruction per architecture. Each such processing instruction identifies the architecture. Among other things, it provides the names of the architectural control attributes whose values control the architectural processing of each element. The most basic attribute is the "Architectural Form Attribute", examples of which have appeared in most of the above examples (as the "DC", "RDF" and "IBMPerson" attributes).
We have been assuming, in the above examples, that in our document the name of the RDF architecture's architectural form attribute is "RDF". However, it could have been any XML name. Similarly, we have been assuming that the Dublin Core architecture's architectural form attribute name is "DC", and that the IBMPerson architecture's is "IBMPerson".

Another architectural control attribute that can be declared in the same processing instruction is the "Architecture Ignore Data" attribute. In the above example, we are assuming something about the RDF architecture in this document. Specifically, its processing instruction declares the name of the "Architecture Ignore Data" attribute to be "RDFIgDat". The value "ArcIgnD" is an ISO-defined string that means "data is always ignored." (Note: There are two other possible values. "nArcIgnD" means that data is not ignored, and that it is an error if data occurs where the architecture does not allow it. "cArcIgnD" means that data is conditionally ignored, i.e., it is ignored only when it occurs where the architecture does not allow it.)

If all this seems rather complex, please remember that reliably and smoothly meshing the semantics of multiple namespaces in a single document is a complex problem. Indeed, namespaces in their present simple form cannot cope with it, at least in the general case. There is no requirement that anyone use the "architecture ignore data" attribute, but it's nice that it's there when it's really needed and nothing less will do. There are other architectural control attributes, and there are still other things that can be declared in the processing instructions that declare architectural control attributes.

(Here ends the digression about architectural control attributes.)
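The three ignore-data values can be modeled in a similar Python toy sketch. The hypothetical `ArchDecl` record stands in for what one architecture's processing instruction declares. That includes its form attribute name and its ignore-data attribute name, either of which could be any XML name. `data_forms` is an assumed stand-in for the architecture's rules about where data is allowed. The sketch also makes two assumptions that are not taken from the AFDR text. First, an ignore-data value covers all of an element's descendant data unless a descendant specifies its own. Second, the default, when nothing is specified, is "cArcIgnD". Data is judged against the nearest enclosing architectural element, since unmarked elements vanish from the view.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


class ArchError(Exception):
    """Data occurs where the architecture forbids it under nArcIgnD."""


@dataclass(frozen=True)
class ArchDecl:
    # Hypothetical stand-in for one architecture's processing instruction.
    form_att: str                         # form attribute name, e.g. "RDF"
    ignore_data_att: str                  # ignore-data attribute name, e.g. "RDFIgDat"
    data_forms: frozenset = frozenset()   # assumed: forms whose content allows data


def keep_data(mode, form, decl):
    """Decide whether a run of data inside architectural element `form` survives."""
    allowed = form in decl.data_forms
    if mode == "ArcIgnD":       # data is always ignored
        return False
    if mode == "cArcIgnD":      # ignored only where the architecture disallows it
        return allowed
    if mode == "nArcIgnD":      # never ignored; disallowed data is an error
        if not allowed:
            raise ArchError(f"data not allowed in {form!r}")
        return True
    raise ValueError(f"unknown ignore-data value {mode!r}")


def project(elem, decl, mode="cArcIgnD", context=None):
    """Return the architectural view of `elem` as a list of str/Element nodes."""
    mode = elem.get(decl.ignore_data_att, mode)    # assumed inherited by descendants
    form = elem.get(decl.form_att)
    context = form if form is not None else context
    nodes = []

    def add_data(text):
        # Whitespace-only runs are treated as insignificant (a simplification).
        if text and text.strip() and keep_data(mode, context, decl):
            nodes.append(text)

    add_data(elem.text)
    for child in elem:
        nodes.extend(project(child, decl, mode, context))
        add_data(child.tail)                       # tail data belongs to `elem`
    if form is None:
        return nodes                               # tags vanish, kept data stays
    out, last = ET.Element(form), None
    for node in nodes:
        if isinstance(node, str):
            if last is None:
                out.text = (out.text or "") + node
            else:
                last.tail = (last.tail or "") + node
        else:
            out.append(node)
            last = node
    return [out]


def view(xml_text, decl):
    nodes = project(ET.fromstring(xml_text), decl)
    return "".join(n if isinstance(n, str) else ET.tostring(n, encoding="unicode")
                   for n in nodes)


rdf = ArchDecl("RDF", "RDFIgDat", frozenset({"Description", "PersonName"}))
doc = """<authInfo RDF="Description">
  <persName IBMPerson="Name" RDFIgDat="ArcIgnD">Bob Schloss</persName>
  <email IBMPerson="Email">schloss@watson.ibm.com</email>
</authInfo>"""
print(view(doc, rdf))  # <Description>schloss@watson.ibm.com</Description>
```

The printed view matches the RDF perspective given above. Now suppose a declaration in which `Description` does not allow data. Under "cArcIgnD", the stray e-mail data would then be silently dropped. Under "nArcIgnD", the same data would raise `ArchError`.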
********************************************************************************
Some references
********************************************************************************

Architectural Forms / (Multiple) Inheritance ("Architectural Form Definition Requirements" or "AFDR"):
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.3.html

This standard is being amended to let XML documents use architectural forms by means of processing instructions, which XML supports, instead of #NOTATION attributes, which XML does not support. See http://www.ornl.gov/sgml/wg8/document/1957.htm for the details of this amendment.

Property Sets ("Property Set Definition Requirements" or "PSDR"):
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.html

HyTime Property Set (just a good example of a full-featured property set):
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-B.html

A Property Set for XLink (in the form of a diagram in PostScript):
ftp://ftp.techno.com/TechnoTeacher/MISC/xllprops2.ps

********************************************************************************
Acknowledgement
********************************************************************************

As the reader may have guessed, this paper would not have been possible without the patient, substantive help of Robert J. "Bob" Schloss of IBM's Thomas J. Watson Research Center.

- 30 -