Annex C

(informative)

Extensible Markup Language (XML)

C.1 Introduction

The purpose of this Annex is to give a short introduction to XML and the rationale behind the design of this standard. This standard uses XML in a way inspired by OMG's XML Metadata Interchange (XMI) specification.

XML is an open, platform-independent and vendor-independent standard. It supports the international character set standards of ISO 10646 and Unicode. The XML standard is programming-language neutral and API neutral. A range of XML APIs is available, giving the programmer a choice of access methods to create, view, and integrate XML information. The cost of entry for XML information providers is low. XML's tag structure and textual syntax make it as easy to read as HTML, and it is better suited to conveying structured information. The cost of entry for automatic XML document producers and consumers is also low. A growing set of tools is available for XML development.

XMI is an XML-based standard for the exchange of object-oriented metadata models. The purpose of XMI is to allow the exchange of UML models between modelling tools in a vendor-neutral way. It is based on OMG's Meta Object Facility and on CORBA data types. XMI can in theory be used to exchange data based on UML models, but it is not primarily designed for this purpose. This standard is therefore based on the principles of XMI, but simplified and adapted to suit the needs of this family of standards, and thus more specialised to allow the direct exchange of data based on UML.

Clauses C.2 and C.3 give introductions to XML and XMI, respectively. Clause C.4 outlines some of the differences between XMI and this standard, and clause C.5 gives some references for further reading.

C.2 Extensible Markup Language

C.2.1 General

The Extensible Markup Language [XML] is a subset of SGML (ISO 8879:1986). XML has been designed for ease of implementation and for interoperability with both SGML and HTML. It is a new format designed to bring structured information to the Web. It is in effect a Web-based language for electronic data interchange. XML is an open technology of the World Wide Web Consortium [W3C].

XML defines a class of data objects called XML documents. A software module called an XML processor is used to read XML documents and provide access to their content and structure. XML documents contain structures of matched tag pairs (also called markup) containing nested tags and data. In combination with its advanced linking capabilities, XML can encode a wide variety of information structures. The rules which specify how the tags are structured are called a Document Type Definition or DTD. In that respect XML is similar to UML: it has a declarative model (the DTD) and an instance model (the XML document).

XML documents come in two flavours: well-formed XML documents and valid XML documents. An XML document is well-formed if it conforms to the XML standard, and if it contains exactly one root element and any number of content elements where the elements, delimited by start- and end-tags, nest properly within each other. An XML document is valid if it is well-formed and if it conforms to its DTD. An XML processor can be non-validating or validating. A non-validating XML processor only checks whether the XML document is well-formed. A validating XML processor also checks whether the XML document conforms to its DTD.
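The well-formedness rules above can be tried out with a short Python sketch. It assumes the non-validating parser from Python's standard library (xml.etree.ElementTree), so it checks well-formedness only and never validity against a DTD. The helper name is_well_formed and the sample strings are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element, properly nested children.
good = "<Road><Name>Route 66</Name></Road>"
# End tags in the wrong order: the elements do not nest properly.
bad_nesting = "<Road><Name>Route 66</Road></Name>"
# Two root elements instead of exactly one.
two_roots = "<Road/><Road/>"
```

Only the first string is accepted; the other two are rejected before any DTD would come into play.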

An XML document can consist of one or more storage units which make up the document's physical structure. These storage units are called entities. Each XML entity has a name and a content. The general idea is that if you quote the name of the entity in the logical structure of the XML document, you receive the corresponding content. This quoting mechanism is called an entity reference. There are two types of entities: text and binary. A text entity contains text data that is considered to form part of the XML document. An XML document is considered to be a text entity. A binary entity is basically anything that is not to be treated as though it is XML-encoded. Each binary entity needs to have a notation associated with it to indicate the type of binary encoding used, for example GIF, plain text or PDF.

C.2.2 XML element

The logical structure of an XML document comprises properly nested XML elements and entity references. An XML element may have attributes and a content. An element always has a start tag, which may include the element's attributes, a content and an end tag. The allowed content of an element is described by its content model. This can be empty, a sequence of elements, one of an alternative list of elements, repetitions of elements, plain text or mixed elements and text data. An example of a simple XML element called Road with no attributes and plain text as content is:

<Road>Route 66</Road>

Here we see that "<Road>" is the start tag, "Route 66" is the content of the element, and "</Road>" is the closing tag of the element. A more complex XML element is an element with attributes and two elements as content:

<Person id="convenor1">
  <FirstName>Olaf</FirstName>
  <LastName>&Oslash;stensen</LastName>
</Person>

Here we have an XML element called Person with an attribute id with value "convenor1". The content of the Person element is two XML elements called FirstName and LastName, which have plain text as their content model. The entity reference "&Oslash;" refers to an XML text entity whose content is the Latin 1 character "Ø".
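As an illustration of how an XML processor exposes this structure, the following Python sketch reads the Person element with the standard library parser. It is a minimal sketch under one assumption: the last name is written with the literal character "Ø", because the entity used in the example would first have to be declared in a DTD.

```python
import xml.etree.ElementTree as ET

# The Person element from the example, with the character written literally.
person = ET.fromstring(
    '<Person id="convenor1">'
    "<FirstName>Olaf</FirstName>"
    "<LastName>Østensen</LastName>"
    "</Person>"
)

# Attributes are read by name; child elements appear in document order.
person_id = person.get("id")
child_names = [child.tag for child in person]
full_name = person.findtext("FirstName") + " " + person.findtext("LastName")
```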

An XML element can refer to another element within the XML document or to external resources by using special purpose XML attributes. These XML elements are called linking elements. Linking elements point to a target resource through a Uniform Resource Identifier (URI) reference. Declaring XML elements with these special purpose XML attributes indicates their behaviour to the XML processor. See the next clause for an overview of some of the special purpose XML attributes. Here is an example of an XML element that points to another XML document:

<Reference xml:link="simple" href="http://www.example.org/AnnexA.xml">See Annex A</Reference>

Here the special purpose attribute xml:link indicates that this is a linking element and the value of the attribute href is a URI pointing to another XML document.

C.2.3 Document Type Definition (DTD)

A Document Type Definition (DTD) declares the valid XML elements, their structure, and the XML entities that can be used by a class of XML documents. An XML document can have an external DTD subset defined in a separate file and/or an internal DTD subset defined as a part of the header information of the XML document. If an XML document contains both an external subset and an internal subset, the internal subset is considered to occur before the external subset. In the following, DTD declarations are given together with examples of how the corresponding instances look in an XML document.

An XML element is declared using the special DTD start tag "<!ELEMENT":

<!ELEMENT Road (#PCDATA)>

<Road>Route 66</Road>

Here we define an XML element named Road with text as the content model. This is indicated by the XML reserved keyword "#PCDATA", which is short for parsed character data. The content model can be empty, a sequence or a choice of specific child elements, any combination of elements, or a mixture of text and specific child elements. A multiplicity operator can be used to specify the allowed occurrences of the child elements. The multiplicity operators are (?) for zero-or-one, (+) for one-or-more and (*) for zero-or-more occurrences. Absence of the multiplicity operator means that the child element must appear exactly once. Here is an example of the use of multiplicity operators:

<!ELEMENT parent ( c1, c2?, c3+, c4* )>

<parent>
  <c1> ... </c1>
  <c3> ... </c3>
  <c3> ... </c3>
  <c4> ... </c4>
</parent>

The parent XML element shall contain the four child XML elements c1, c2, c3 and c4, in that particular order. The element c1 must always occur, c2 is optional, there can be one-or-more c3 elements and zero-or-more c4 elements. The example shows one c1, no c2, two c3 and one c4 elements.
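The effect of the multiplicity operators can be sketched in Python by translating the content model into a regular expression over the sequence of child element names. This is only an illustration of the rules stated above, not how a validating XML processor is implemented; the function name matches_parent_model is hypothetical.

```python
import re

# Content model from the DTD: ( c1, c2?, c3+, c4* )
# Each child name is followed by one space so that names cannot run together.
CONTENT_MODEL = re.compile(r"c1 (c2 )?(c3 )+(c4 )*")


def matches_parent_model(child_names):
    """Check a list of child element names against the content model."""
    sequence = "".join(name + " " for name in child_names)
    return CONTENT_MODEL.fullmatch(sequence) is not None
```

The instance shown above, with children c1, c3, c3 and c4, is accepted, whereas a parent without any c3 element, or with the children in another order, is rejected.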

An XML element with a choice of child elements is declared as follows:

<!ELEMENT car_part (door | wheel | engine)>

<car_part>
  <wheel> ... </wheel>
</car_part>

Here a car_part element may contain a door, a wheel or an engine, but not more than one child element.

A mixed content model allows a mixture of text and specific child elements:

<!ELEMENT mp (#PCDATA | Road)*>

<mp>Here we can mix text and Road elements. <Road>E6</Road> See!</mp>

The ANY keyword specifies that the content model of an XML element can be anything:

<!ELEMENT p ANY>

<p>Anything goes <Road>E6</Road>here<tag/>!</p>

If an XML element is declared to be empty it has no content and is written as an empty-element tag without a separate closing tag. This is indicated by a slash at the end of the tag.

<!ELEMENT tag EMPTY>

<tag/>

An XML element can have attributes. The attributes are declared in an XML attribute list statement. XML has three groups of attribute types: character data, tokenised types and enumerated types. An attribute can be either mandatory or optional, indicated by the #REQUIRED and #IMPLIED keywords in the attribute list declaration. It is also possible to give default values for attributes. An attribute is declared in an attribute list construct:

<!ELEMENT point EMPTY>
<!ATTLIST point
  id  ID                     #REQUIRED
  dim (oneD | twoD | threeD) "twoD"
  x   CDATA                  #REQUIRED
  y   CDATA                  #IMPLIED
  z   CDATA                  #IMPLIED>

Here we have defined an XML element named point. The point has an attribute list of five attributes. The attribute called id has the tokenised type ID and it is required. The dim attribute has an enumerated type that takes one out of three valid values. This attribute is optional and has a default value of "twoD"; a default value is given in place of the #REQUIRED or #IMPLIED keyword. The x, y and z attributes are all of type "CDATA", which is short for character data. Since XML does not have any data types for numeric values, the coordinate values must be converted to strings. Notice that of the coordinates only x is required. Two instances of this XML element are:

For the first instance, with id equal to "i01", we can rely on the default value of dim to indicate that this point is two-dimensional and do not have to give it explicitly. The second instance has all attributes filled in.
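How a default value reaches the application can be sketched with Python's standard expat binding, which reads the internal DTD subset and reports defaulted attributes together with the specified ones. The coordinate values, the wrapper element points and the identifier "i02" are hypothetical. The default for dim is declared as a plain default value without a preceding keyword, which is the form XML 1.0 accepts.

```python
from xml.parsers import expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE points [
<!ELEMENT points (point*)>
<!ELEMENT point EMPTY>
<!ATTLIST point
  id  ID                     #REQUIRED
  dim (oneD | twoD | threeD) "twoD"
  x   CDATA                  #REQUIRED
  y   CDATA                  #IMPLIED
  z   CDATA                  #IMPLIED>
]>
<points>
  <point id="i01" x="10.5" y="20.0"/>
  <point id="i02" dim="threeD" x="1.0" y="2.0" z="3.0"/>
</points>"""


def read_points(text):
    """Collect the attributes of every point element, keyed by its id."""
    points = {}

    def start_element(name, attrs):
        if name == "point":
            points[attrs["id"]] = dict(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)
    return points


points = read_points(DOCUMENT)
```

The first point is reported with dim equal to "twoD" although the attribute is absent from its start tag, while the optional z attribute, which has no default, is simply missing.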

The tokenised attribute types specified in XML are:

XML Type: Semantics

ID: An identifier for the element that, if specified, must be unique within the XML document. The value of the identifier must always start with a letter, '_' or ':'. An XML element can only have one attribute of type ID.

IDREF: A reference to an XML element in the XML document. The value must correspond to an attribute value of type ID in an existing XML element.

IDREFS: A reference to one or more XML elements. The values must be separated by spaces and must correspond to existing XML element IDs.

ENTITY: A reference to an external entity. The value must be a legal entity name.

ENTITIES: A reference to any number of entity names, where the entity names are separated by spaces.

NMTOKEN: A NMTOKEN (Name Token) is any mixture of name characters.

NMTOKENS: Any number of NMTOKENs separated by spaces.

Some attributes are specific for XML and have reserved names and well-defined semantics. The xml:lang attribute can be used to indicate the language used in XML elements and xml:space can be used to indicate how to handle whitespace within elements.

Attribute name (XML Type): Semantics

xml:lang (NMTOKEN): A special attribute that may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. It must, however, be defined in the attribute list specification of the actual element.

xml:space ((default | preserve)): A special attribute that signals an intention that in that element, white space should be preserved by applications.

C.2.4 Linking element

Linking elements are recognised based on the use of a designated attribute named xml:link and a set of accompanying attributes. This is described in the XML XLink specification [XLink]. A link is an explicit relationship between two or more data objects or portions of data objects. The content of the linking element is called the local resource and the target of the link is called the remote resource. A remote resource is identified by a text string called a locator. A locator value may contain either a Uniform Resource Identifier (URI) or a fragment identifier, or both. The syntax of a locator is first the URI, followed by a connector ("#" or "|") and a fragment identifier. The URI describes a remote resource, and the fragment identifier describes a sub-resource within that resource. A fragment identifier pointing into an XML document must be an XPointer [XPointer]. If the connector is "#", this signals that the remote resource is to be fetched as a whole, and that the XPointer processing to extract the sub-resource is to be performed on the client. If the connector is "|", no intent is signalled as to what processing model is to be used for accessing the designated resource.
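The locator syntax described above can be sketched as a small Python function that splits a locator into its URI, connector and fragment identifier. This is a simplified illustration that treats the first "#" or "|" as the connector; the function name split_locator is hypothetical and the sketch does not interpret the XPointer itself.

```python
def split_locator(locator):
    """Split a locator into (uri, connector, fragment).

    The connector and fragment are None when the locator holds only a URI.
    """
    positions = [i for i in (locator.find("#"), locator.find("|")) if i != -1]
    if not positions:
        return locator, None, None
    at = min(positions)
    return locator[:at], locator[at], locator[at + 1:]


whole = split_locator("http://www.example.org/data.xml|id(i005)")
client_side = split_locator("http://www.example.org/data.xml#i005")
uri_only = split_locator("http://www.example.org/data.xml")
fragment_only = split_locator("#i005")
```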

The following information can be associated with a link and its resources: one or more locators to identify the remote resources participating in the link (a locator is required for each remote resource), the semantics of the link, the semantics of the remote resources and the semantics of the local resource. An example of a linking element is:

<link xml:link="simple" href="http://www.example.org/data.xml|id(i005)" >This is the local resource</link>

Here we have a linking element called link. It has two attributes, xml:link and href. xml:link states that this is a simple linking element, whereas href holds the locator identifying the remote resource. The locator consists of a URI, which is "http://www.example.org/data.xml", a connector "|" and a fragment identifier "id(i005)" in XPointer syntax. We could also have used "i005" directly; it is defined as a shortcut in the XPointer specification. The content of the linking element is the local resource. Other combinations can be:

The XLink specification defines the following attributes:

Attribute name (XML Type): Semantics

xml:link (CDATA): This is a special reserved attribute that indicates that the element shall act as a linking element. Legal values are: simple, extended, locator, group or document. This standard uses only simple links.

href (CDATA): The value of the href attribute in linking elements shall always contain a locator which identifies a resource, e.g. by a URI reference or by an XPointer specification.

inline ((true | false)): The inline attribute specifies the first part of a link's semantics. A link is either inline or out-of-line. This attribute is used in connection with extended links.

role (CDATA): The role attribute also specifies a part of the link's semantics. The value of this attribute identifies to the application software the meaning of the link. This allows the application to show different symbols for the different kinds of links.

title (CDATA): This is the title shown to the user for the remote resource.

show ((embed | replace | new)): This attribute indicates the behaviour policies to use when the link is traversed for the purpose of display or processing. The embed value indicates that the designated resource should be embedded in the body of the resource and at the location where the traversal started. The replace value indicates that the designated resource should replace the resource where the traversal started. The new value indicates that the designated resource should be displayed or processed in a new context.

actuate ((auto | user)): The actuate attribute is used to express a policy as to when the traversal of a link should occur. The auto value indicates that the resource is automatically traversed. The user value indicates that the link is traversed only on the request of the user.

C.2.5 XML entity

XML entities are divided into text and binary entities. There is also a distinction between internal entities and external entities. An internal entity has a value associated with it as part of the entity declaration. An external entity associates a name with a physical storage unit (file name). In the following we define an XML entity named XML which is an internal text entity:

<!ENTITY XML 'Extensible Markup Language'>

<!ELEMENT p (#PCDATA) >

<p>This is written in XML (&XML;).</p>
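The substitution performed by the XML processor can be observed with a short Python sketch. It assumes the declarations above are placed in the internal DTD subset of the document, and it uses the standard library parser, which expands internal text entities while reading.

```python
import xml.etree.ElementTree as ET

# The entity and element declarations are placed in the internal DTD subset.
DOCUMENT = """<!DOCTYPE p [
<!ENTITY XML 'Extensible Markup Language'>
<!ELEMENT p (#PCDATA)>
]>
<p>This is written in XML (&XML;).</p>"""

paragraph = ET.fromstring(DOCUMENT)
# The entity reference has been replaced by the entity's content.
expanded_text = paragraph.text
```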

Only text entities that can be parsed as XML can be referenced directly in the XML document. Binary entities do not contain valid XML and must therefore be referenced in an XML element's attribute of the ENTITY type. Here we define an external binary entity named my.sign and refer to it in the src attribute of an image element.

<!ENTITY my.sign SYSTEM "image/signature.gif" NDATA GIF>
<!ELEMENT pi ANY>
<!ELEMENT image EMPTY>
<!ATTLIST image
  src ENTITY #REQUIRED >

<pi>This is my signature: <image src="my.sign"/></pi>

There is a special construct, called a parameter entity, that can only be used in the DTD. A parameter entity can be used as a shortcut for commonly occurring structures. A parameter entity declaration consists of an entity declaration with a "%" sign before the parameter entity name. The string within the apostrophes will be substituted by the XML processor when it reads the DTD. An example of a declaration of a parameter entity called GI.Boolean.att and its usage is as follows:

<!ENTITY % GI.Boolean.att '( true | false )' >

<!ELEMENT my_Boolean_element EMPTY>

<!ATTLIST my_Boolean_element %GI.Boolean.att; >

C.2.6 Character coding

Each XML text entity must declare which character encoding scheme it uses internally. External parsed entities in an XML document may use a different encoding scheme for their characters than the one used in the root document. All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646 as a minimum. Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with an encoding declaration, for example in the document entity:

<?xml version="1.0" encoding="ISO-10646-UCS-2" ?>

According to the XML recommendation the following values are allowed:

For ISO/IEC 10646 and UNICODE based encodings: "UTF-8", "UTF-16", "ISO-10646-UCS-2" and "ISO-10646-UCS-4" shall be used. If an XML text entity is encoded in UCS-2, it must start with an appropriate encoding signature, the Byte Order Mark, which is the character with hexadecimal value FEFF.

For ISO/IEC 8859: "ISO-8859-1", "ISO-8859-2", ... , "ISO-8859-10" can be used.

For various encodings of JIS X-0208-1997: "ISO-2022-JP", "Shift_JIS" and "EUC-JP" can be used.

For ISO 15046 we will restrict this to "UTF-8", "UTF-16", "ISO-10646-UCS-2" and "ISO-10646-UCS-4".
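The role of the encoding signature can be illustrated with a minimal Python sketch. It assumes UTF-16 as the encoding and a hypothetical one-element document; Python's "utf-16" codec prepends the Byte Order Mark, and the standard library parser uses it to detect the byte order.

```python
import codecs
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="UTF-16"?><Name>Østensen</Name>'

# Encoding with the "utf-16" codec prepends the Byte Order Mark (FEFF).
data = text.encode("utf-16")
has_bom = data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The parser detects the encoding from the signature and the declaration.
name = ET.fromstring(data).text
```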

Alternatively, one can reference any character in ISO/IEC 10646 by quoting its character number in a character reference, regardless of which encoding scheme is used, or one can declare a text entity that represents the character in question. There are two ways of referring to characters directly: by decimal representation or by hexadecimal representation. The hexadecimal reference for the less-than sign '<' is:

&#x3C;

And the decimal representation of the same sign is:

&#60;
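That both forms denote the same character can be checked with a short Python sketch using the standard library parser; the element name c is hypothetical.

```python
import xml.etree.ElementTree as ET

# One hexadecimal and one decimal character reference for the same sign.
element = ET.fromstring("<c>&#x3C; and &#60;</c>")
decoded = element.text

# On output the serialiser escapes the character again.
serialised = ET.tostring(element, encoding="unicode")
```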

C.2.7 XML document header

All XML documents must start with a processing instruction that specifies that this is an XML document and which version of the XML standard is used. Information about the character encoding, if it is not "UTF-8" or "UTF-16", shall also be included in the header. An example is as follows:

<?xml version="1.0" encoding="ISO-10646-UCS-2" ?>

The next element shall be the document type declaration:

<!DOCTYPE top SYSTEM "root.dtd" [
  <!ELEMENT bot EMPTY>
]>

Here we see a document type declaration that has both an external and an internal DTD subset. The root XML element is called top and it must be defined in the external DTD subset declared in the "root.dtd" file. The internal DTD subset declares a single XML element called bot.

C.2.8 Miscellaneous

An XML comment may appear anywhere in the document, but outside other markup. Comments are not part of the document's character data. A comment starts with the string '<!--' and ends with the string '-->'. An example of a comment is:

<!-- This is a comment -->

All of an XML document is case sensitive, both markup and text. This is different from SGML and HTML and it was introduced to allow markup in non-Latin alphabet characters and to avoid the problem with case folding. This means that element names, entity names and attribute names all are case sensitive. The following elements are different and thus allowed in an XML document:

<road>This is a road element</road>

<Road>This is a Road with a capital R</Road>
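Case sensitivity can be demonstrated with a minimal Python sketch using the standard library parser. The wrapper element doc and the helper name parses are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<road>This is a road element</road>"
    "<Road>This is a Road with a capital R</Road>"
    "</doc>"
)
# The two children keep distinct names.
tags = [child.tag for child in doc]


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

A start tag and an end tag that differ only in case do not match, so a string such as "<road>x</Road>" is rejected as not well-formed.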

C.2.9 Other XML standards

The base XML standard [XML] is associated with a number of supporting standards. The most relevant standards are briefly described in this clause.

A link is an explicit relationship between two or more data objects or portions of data objects. In addition to the IDREF mechanism in XML documents, a link can be constructed with attributes on XML elements using a combination of the XLink and XPointer specifications. The XML Linking Language [XLink] and the XML Pointer Language [XPointer] specify constructs to define links between both external and internal objects of XML documents. XLink is used to create simple unidirectional hyperlinks between objects in separate XML documents as well as more sophisticated links, whereas XPointer is used for addressing internal structures within XML documents, such as references to elements, character strings and other parts of XML documents, whether or not they have an explicit ID attribute.

The Namespaces in XML specification [Namespace] provides a simple method for qualifying element and attribute names used in XML documents by associating them with a namespace identified by a URI reference. An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. This allows XML documents to mix XML elements and attributes from more than one DTD that may have identical names and different semantics. The mechanism for achieving this is to use qualified names for both XML elements and attributes. A qualified name consists of a namespace prefix, a single colon as separator, followed by the local name. Here is an example of the use of the XML namespace mechanism:

<x xmlns:gis="http://www.example.org/shema/spatial.dtd">
  <point> ... </point>
  <gis:point> ... </gis:point>
</x>

Here we see that x contains two elements named point. The local point element is defined in the DTD of the XML document, but the gis:point element is defined in a DTD located at URI "http://www.example.org/shema/spatial.dtd". The special purpose attribute xmlns:gis in element x defines gis as a namespace prefix for the declarations defined in the target DTD.
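How a namespace-aware processor tells the two point elements apart can be sketched in Python with the standard library parser, which reports a qualified name as the namespace URI in braces followed by the local name. The element contents "local" and "spatial" are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

GIS = "http://www.example.org/shema/spatial.dtd"

x = ET.fromstring(
    f'<x xmlns:gis="{GIS}">'
    "<point>local</point>"
    "<gis:point>spatial</gis:point>"
    "</x>"
)

# The prefix is replaced by the namespace URI, so the names no longer clash.
tags = [child.tag for child in x]

# A prefix map lets the application look up the qualified element.
gis_point = x.find("gis:point", {"gis": GIS})
local_point = x.find("point")
```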

The Resource Description Framework (RDF) is a result of the W3C Metadata Activity. RDF is the foundation for processing and exchanging machine-understandable metadata on the Web using XML as the exchange format. It can be used in a variety of application areas: in resource discovery, in cataloguing for describing the content and content relationships available at a particular Web site, in content rating, in describing collections of pages that represent a single logical document, for describing intellectual property rights of Web pages, and for expressing the privacy preferences of a user as well as the privacy policies of a Web site.

The RDF Model and Syntax Specification [RDF] document defines a data model for representing named properties and property values. This data model consists of three object types: Resource, Property and Statement. Instances of the RDF data model are called an RDF Schema. RDF resources are things of interest, e.g. an entire Web page, an XML element or an entire Web site. An RDF property is a specific aspect, characteristic, attribute or relation used to describe a resource, whereas an RDF statement is a specific RDF resource in combination with a named RDF property for that resource. RDF can be used to express a wide variety of data models, e.g. Entity-Relationship models. The RDF Schema Specification [RDFSchema] document defines a schema specification language that provides a basic type system for use in RDF models. It defines resources and properties such as class and sub-class-of constructs that can be used in application specific schemas. RDF Schemas can be compared with XML DTDs, but unlike an XML DTD, which gives specific constraints on the structure of an XML document, an RDF Schema provides information about the interpretation of the statements given in an RDF data model.

The Document Content Description [DCD] document is a note to W3C which proposes a structural schema facility for XML, which may be used in the same way as the current XML DTD mechanism. DCD is designed for describing constraints to be applied to the structure and content of XML documents. It provides mechanisms for defining XML elements, attributes and their logical structure as well as document constraints and basic data types. DCD is an RDF vocabulary, which means that it is based on the above-mentioned RDF specifications.

The Schema for Object-oriented XML [SOX] is a note to W3C which proposes an alternative to XML DTDs. SOX provides basic data types, an extensible data typing mechanism, and content model and attribute interface inheritance. A SOX document, or schema, is a valid XML document that represents a complete XML DTD-like structure.

The Vector Markup Language [VML] is a note to W3C which defines a format for encoding of vector information together with additional markup to describe how that information may be displayed and edited. VML defines XML elements for vector graphic information and uses a stylesheet mechanism to determine the layout of the vector graphics. VML is supported by Microsoft, Autodesk, Visio and Hewlett-Packard and is implemented in Internet Explorer 5.0 beta 3.

The Extensible Stylesheet Language [XSL] is intended to control the appearance of XML documents. XSL is a language for expressing stylesheets. A stylesheet expresses rules for presenting a class of XML documents. Thus a stylesheet contains descriptions on how XML elements can be rendered by an XML browser. XSL views an XML document as a tree and uses a two-stage presentation process. First, the result tree is constructed from the source tree. Second, the result tree is interpreted to produce formatted output on display, on paper, in speech or onto other media.

The Document Object Model [DOM] specification defines a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of XML documents. The DOM provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them. This means that vendors can support DOM as an interface to their proprietary XML processors. The document defines three language bindings: IDL, Java and ECMAScript.
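The kind of access and update the DOM offers can be sketched with xml.dom.minidom, a DOM implementation in Python's standard library. Python is not one of the three bindings defined by the DOM document, so this is only an illustration of the interface style; the sample document and the new attribute value are hypothetical.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<Person id='convenor1'><FirstName>Olaf</FirstName></Person>"
)
person = document.documentElement

# Access: navigate from the root element to the text of a child element.
first_name = person.getElementsByTagName("FirstName")[0]
original = first_name.firstChild.data

# Update: create a new element, attach it, and change an attribute.
last_name = document.createElement("LastName")
last_name.appendChild(document.createTextNode("Østensen"))
person.appendChild(last_name)
person.setAttribute("id", "convenor2")

serialised = person.toxml()
```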

C.3 XML Metadata Interchange (XMI)

C.3.1 General

The Object Management Group [OMG] has created an XML-based specification for interchange of metadata, called XML Metadata Interchange [XMI]. The main purpose of XMI is to enable interchange of metadata between UML-based modelling tools and MOF-based metadata repositories. The specification is based on OMG's Meta Object Facility (MOF), which is an OMG metamodelling and metadata repository standard, and on OMG's Unified Modeling Language (UML), which is an OMG modelling standard. The XMI specification defines the design principles for generating XMI-based DTDs and XML documents. It consists of a set of XML DTD production rules for transforming MOF-based metamodels into XML DTDs, a set of XML document production rules for encoding and decoding MOF-based metadata, and concrete DTDs for UML and MOF. The concrete DTDs are generated based on the XML DTD and XML production rules.

The Meta Object Facility [MOF] is OMG's adopted technology for defining metadata and representing it as CORBA objects. In OMG's terminology metadata is a general term for data that in some sense describes information. The term model is generally used to denote a description of something, typically something in the real world. In the MOF context, the term model is any collection of metadata that is related. Metadata that describes metadata is called meta-metadata, and a model that consists of meta-metadata is called a metamodel.

The MOF Model is the MOF's built-in meta-metamodel. It is defined as the top layer of a four-layer metamodelling architecture, where each layer defines the language for specifying models at the layer below, see Table C.1. The top layer of this architecture is called the meta-metamodel. The next layer is called the metamodel and the UML metamodel is an example of a model at this layer. The metamodel layer defines the language for the next layer, which is called the model layer. An example of a model at the model layer is any UML model, e.g. the RoadMap application schema in Annex E and the Spatial schema. The model layer defines the language for the lowest layer, called the user object layer. Examples of models at this layer are datasets, which conform to the models at the next higher layer.

Table C.1 — OMG's four-layer metamodelling architecture

| Meta-layer | MOF terms | Examples |
|---|---|---|
| M3 | meta-metamodel | MOF Model |
| M2 | metamodel | UML metamodel |
| M1 | model | UML model (RoadMap) |
| M0 | user objects (data) | Instances |

The MOF Model and the UML metamodel are closely aligned in their modelling concepts. The main metadata modelling constructs are: Class, Association and Package.

Classes can have Attributes and Operations at both object and class layer. Attributes are used to represent metadata. Operations are provided to support metamodel specific functions on the metadata. Both Attributes and Operation Parameters may be defined as ordered, or as having structural constraints on their cardinality and uniqueness. Classes may multiply inherit from other Classes.

Associations support binary links between Class instances. Each Association has two AssociationEnds that specify ordering or aggregation semantics, and structural constraints on cardinality or uniqueness. When a Class is the type of an AssociationEnd, the Class may contain a reference that allows navigability of the Association's links from a Class instance.

Packages are collections of related Classes and Associations. Packages can be composed by importing other Packages or by inheriting from them. Packages can also be nested.

Other significant MOF Model constructs are DataType and Constraint. DataTypes allow the use of non-object types for Parameters or Attributes. In the OMG MOF specification, these must be data types or interface types expressible in CORBA IDL. Constraints are used to associate semantic restrictions with other elements in a MOF metamodel. This defines the well-formedness rules for the metadata described by a metamodel. Any language may be used to express Constraints, though there are obvious advantages in using a formal language like OCL.

C.3.2 DTD and XML document production

The XMI specification defines DTD and XML production rules that can be used to transfer any model described by a MOF metamodel, i.e. any metamodel that is defined in the abstract language of the MOF Model. This is illustrated in Table C.2. An M2 layer metamodel such as the UML metamodel can be encoded against the XML DTD for the M3 layer MOF Model, and an M1 layer model, a UML model, can be encoded against the XML DTD for the M2 layer UML metamodel.

Table C.2 — XMI and OMG's metadata architecture

| Meta-layer | Models | XML DTDs | XML documents |
|---|---|---|---|
| M3 | MOF Model | XMI based MOF DTD | |
| M2 | UML metamodel | XMI based UML DTD | XMI based MOF metamodel documents |
| M1 | UML models | | XMI based UML model documents |
| M0 | Instances | | |

XMI defines a number of XML elements that must be included in the DTDs generated. Some of these XML elements contain metadata about the metadata to be transferred, for example, the identity of the metamodel associated with the metadata, the time the metadata was generated, the tool that generated the metadata, whether the metadata has been verified, etc. All XML elements defined have the prefix "XMI." to avoid name conflicts with XML elements that would be a part of a metamodel. XMI does not make use of XML Namespaces, because this is not a W3C recommendation yet, but states that it may be possible to place all required XML elements in a namespace. Every XML element that corresponds to a metamodel class must have attributes that enable the XML element to act as a proxy for a local or remote XML element. These attributes are used to associate an XML element with another XML element. Most of the XML attributes defined have the prefix "xmi.".

Every metamodel class is represented in the DTD by an XML element whose name is the class name. The content model of the element lists the attributes of the class; references to association ends relating to the class; and the classes that this class contains, either explicitly or through composition associations. Every attribute of a metamodel is represented in the DTD by an XML element whose name is the attribute name. The attributes are listed in the content model of the XML element corresponding to the metamodel class in the order they are declared in the metamodel. Each association (both with and without containment) between metamodel classes is represented by two XML elements that represent the roles of the association ends. The multiplicities of the association ends are translated to the XML multiplicities that are valid for specifying the content models of XML elements. The content model of the XML element that represents the container class has an XML element with the name of the role at the association end, with the multiplicity defined for its association end. The XML element representing the role has a content model that allows XML elements representing the associated class and any of its subclasses to be included.
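
The production of element declarations for a single class can be sketched as follows. This is a simplified illustration in Python, not the normative XMI production rules: the function class_to_dtd, the multiplicity table and the class names (Road, Segment, Bridge) are assumptions made for the example, attribute values are assumed to be plain text, and the required XMI elements and attributes are left out.

```python
# Assumed mapping from UML-style multiplicities to DTD occurrence indicators.
OCCURRENCE = {"1": "", "0..1": "?", "0..*": "*", "1..*": "+"}

def class_to_dtd(name, attributes, roles):
    """Produce DTD element declarations for one metamodel class.

    attributes: attribute names in the order declared in the metamodel
    roles: list of (role name, multiplicity, [associated class and subclasses])
    """
    # Content model of the class: attributes first, then association roles.
    parts = [f"{name}.{attr}" for attr in attributes]
    parts += [f"{name}.{role}{OCCURRENCE[mult]}" for role, mult, _ in roles]
    content = f"({', '.join(parts)})" if parts else "EMPTY"
    lines = [f"<!ELEMENT {name} {content}>"]
    # One element per attribute, holding character data.
    for attr in attributes:
        lines.append(f"<!ELEMENT {name}.{attr} (#PCDATA)>")
    # One element per role, allowing the associated class or any subclass.
    for role, _, classes in roles:
        lines.append(f"<!ELEMENT {name}.{role} ({' | '.join(classes)})*>")
    return lines

declarations = class_to_dtd(
    "Road",
    ["name", "length"],
    [("segments", "1..*", ["Segment", "Bridge"])],
)
print("\n".join(declarations))
```

For the hypothetical Road class, the first declaration produced is <!ELEMENT Road (Road.name, Road.length, Road.segments+)>, which shows the attributes in declaration order followed by the role with its multiplicity.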

XMI defines a mechanism for extending a metamodel class. Any number of extension elements may be included in the content model of any class. These XMI extension elements have a content model of ANY, allowing considerable freedom in the nature of the extensions.

XMI provides mechanisms for specifying differences between documents so that an entire document does not need to be transmitted each time. The XMI specification does not specify an algorithm for computing the differences, only a form for transmitting them. Thus only the changes made to a model need to be transmitted. XMI defines the following three types of differences, and the changes they represent:

Delete: The delete operation refers to a particular element of the old model and specifies a deep removal of the referenced element and all of its contents.

Add: The add operation refers to a particular element of the old model and specifies a deep addition. The element and its contents are added. The contents of the new element are added at the optional position specified, the default being as the last element of the contents.

Replace: This operation deletes the old element, but not its contents. The new element and its contents are added at the position of the old element. The original contents of the old element are then added to the contents of the new element at the optional position specified, the default being at the end.
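
The effect of the three difference types on a model can be sketched as follows. This is a minimal illustration in Python that applies the operations to an in-memory element tree; it does not show the XMI form for transmitting differences. The function names and the small model are assumptions made for the example, and the root element cannot be deleted or replaced in this sketch.

```python
import xml.etree.ElementTree as ET

def find_parent(root, target):
    """Return the parent of `target` in the tree under `root`, or None."""
    for parent in root.iter():
        if any(child is target for child in parent):
            return parent
    return None

def delete(root, old):
    """Delete: deep removal of the element and all of its contents."""
    find_parent(root, old).remove(old)

def add(old, new, position=None):
    """Add: deep addition of `new` to the contents of `old` (default: last)."""
    if position is None:
        old.append(new)
    else:
        old.insert(position, new)

def replace(root, old, new, position=None):
    """Replace: `new` takes the place of `old`; the contents of `old` are kept."""
    parent = find_parent(root, old)
    index = list(parent).index(old)
    kept = list(old)                      # original contents of the old element
    parent.remove(old)
    parent.insert(index, new)
    start = len(new) if position is None else position   # default: at the end
    for offset, child in enumerate(kept):
        new.insert(start + offset, child)

model = ET.fromstring("<Model><A><x/></A><B/></Model>")
a, b = model.find("A"), model.find("B")
delete(model, b)                               # B and its contents are removed
add(a, ET.Element("y"))                        # y becomes the last child of A
replace(model, a, ET.fromstring("<C><z/></C>"))
print([child.tag for child in model.find("C")])   # ['z', 'x', 'y']
```

After the three operations the model contains a single element C, whose own content z is followed by the contents x and y carried over from the replaced element A.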

XMI can also be used to transmit incomplete models or model fragments. An incomplete model is a model that may be missing some information, while maintaining the same structure required for valid models.

Every XMI based DTD consists of the following declarations:

An XML version processing instruction and optional encoding declaration. Example: <?xml version="1.0" encoding="UCS-2" ?>

Any other valid XML processing instructions

The required XMI declarations

Declarations for a specific metamodel

Declarations for differences

Declarations for extensions

Every XMI based XML document consists of the following:

An XML version processing instruction with an optional encoding declaration.

Any other valid XML processing instruction

An optional external DTD declaration with an optional internal DTD declaration. Example: <!DOCTYPE XMI SYSTEM "http://www.xmi.org/xmi.dtd">

Any XML elements that conform to the XMI based DTD.
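
The order of the parts listed above can be illustrated with a minimal sketch in Python. The function build_document, the processing instruction example-tool and the element names XMI.header and XMI.content are assumptions made for the example. The encoding declaration is left out, and the final step checks well-formedness only, since the parser used here does not fetch the external DTD.

```python
import xml.etree.ElementTree as ET

def build_document(body, dtd_uri=None, extra_pis=()):
    """Assemble the parts of an XMI based XML document in the order listed."""
    parts = ['<?xml version="1.0"?>']       # XML version, no encoding declaration
    parts.extend(extra_pis)                 # any other processing instructions
    if dtd_uri is not None:                 # optional external DTD declaration
        parts.append(f'<!DOCTYPE XMI SYSTEM "{dtd_uri}">')
    parts.append(body)                      # elements conforming to the DTD
    return "\n".join(parts)

document = build_document(
    "<XMI><XMI.header/><XMI.content/></XMI>",
    dtd_uri="http://www.xmi.org/xmi.dtd",
    extra_pis=['<?example-tool mode="draft"?>'],
)
root = ET.fromstring(document)              # well-formedness check only
print(root.tag, [child.tag for child in root])
```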

C.4 Design differences

The DTD production rules and the XML document production rules defined in this standard are inspired by XMI, but are not dependent on XMI. In the following, the term GI refers to this encoding standard and the term XMI refers to the XMI specification.

The main difference lies in the purpose of the two standards. XMI is designed for interchange of metadata and GI is designed for interchange of data. XMI is defined based on the MOF Model and GI is based on a simplified version of the UML metamodel (the schema model). XMI can be used to encode models at different meta-model levels whereas GI has a fixed defined level. This does not exclude GI from being able to exchange schema information as well. GI can in many ways be thought of as "XMI Light".

XMI uses CORBA data types and GI defines a more conceptual set of data types. This means that GI is independent of CORBA.

XMI is more generic and has an extension mechanism that GI does not have. XMI allows interchange of incomplete models. GI does not, but is simpler and easier to implement.

XMI uses "XMI." as a prefix for its required XML elements and "xmi." for its required XML attributes, whereas GI uses "GI." and "gi.". The production rules for property elements, association elements and composition elements are similar, except that GI moves the multiplicity from the entity to the association and composition element declarations.

Both standards use qualified names for the production of element names. GI has an alternative mechanism for tag-name production of element names. This may facilitate language independent XML documents.

C.5 References

[FAQ] Frequently Asked Questions about the Extensible Markup Language, Version 1.41 (6 October 1998), http://www.ucc.ie/xml

[XML] Extensible Markup Language (XML) 1.0, W3C Recommendation 10-February-1998, http://www.w3.org/TR/REC-xml

[W3C] The World Wide Web Consortium (W3C). The standards group responsible for maintaining and advancing HTML and other Web related standards. http://www.w3.org/

[XLink] XML Linking Language (XLink), W3C Working Draft 03-March-1998, http://www.w3.org/TR/WD-xlink

[XPointer] XML Pointer Language (XPointer), W3C Working Draft 03-March-1998, http://www.w3.org/TR/WD-xptr

[Namespace] Namespaces in XML, W3C Working Draft, 17-November-1998, http://www.w3.org/TR/PR-xml-names

[RDF] Resource Description Framework (RDF) Model and Syntax Specification, W3C Proposed Recommendation, 05-January-1999, http://www.w3.org/TR/PR-rdf-syntax

[RDFSchema] Resource Description Framework (RDF) Schema Specification, W3C Working Draft 30-October-1998, http://www.w3.org/TR/WD-rdf-schema

[DOM] Document Object Model (DOM) Level 1 Specification, Version 1.0, W3C Recommendation 01-October-1998, http://www.w3.org/TR/REC-DOM-Level-1

[XSL] Extensible Stylesheet Language (XSL), Version 1.0, W3C Working Draft 16-December-1998, http://www.w3.org/TR/WD-xsl

[DCD] Document Content Description for XML, W3C NOTE, 31-July-1998, http://www.w3c.org/TR/NOTE-dcd

[SOX] Schema for Object-oriented XML (SOX), W3C NOTE, 30-September-1998, http://www.w3c.org/TR/NOTE-SOX

[VML] Vector Markup Language (VML), W3C NOTE, 13-May-1998, http://www.w3c.org/TR/NOTE-VML

[OMG] The Object Management Group, http://www.omg.org/

[XMI] XML Metadata Interchange (XMI), Proposal to the OMG Object Analysis & Design Task Force RFP 3: Stream-based Model Interchange Format (SMIF), Joint Submission, OMG Document ad/98-10-05, October 20, 1998, http://www.omg.org/techprocess/meetings/schedule/Stream-based_Model_Interchange.html

[MOF] OMG's Meta Object Facility (MOF) Specification, ad/97-08-14, http://www.omg.org/techprocess/meetings/schedule/Technology_Adoptions.html#MOF_Specification

XML Type	Semantics
ID	An identifier for the element that, if specified, must be unique within the XML document. The value of the identifier must always start with a letter, '_' or ':'. An XML element can only have one attribute of type ID.
IDREF	A reference to an XML element in the XML document. The value must correspond to an attribute value of type ID in an existing XML element.
IDREFS	A reference to one or more XML elements. The values must be separated by spaces and must correspond to existing XML element IDs.
ENTITY	A reference to an external entity. The value must be a legal entity name.
ENTITIES	A reference to any number of entity names, where the entity names are separated by spaces.
NMTOKEN	An NMTOKEN (Name Token) is any mixture of name characters (letters, digits, '.', '-', '_' and ':').
NMTOKENS	Any number of NMTOKENs separated by spaces.

Attribute name	XML Type	Semantics
xml:lang	NMTOKEN	A special attribute that may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. It must, however, be declared in the attribute list declaration of the element concerned.
xml:space	(default | preserve)	A special attribute that signals an intention that in that element, white space should be preserved by applications.

Meta-layer	MOF terms	Examples
M3	meta-metamodel	MOF Model
M2	metamodel	UML metamodel
M1	model	UML model (RoadMap)
M0	user objects (data)	Instances
