[This local archive copy mirrored from the canonical site: http://www.geocities.com/WallStreet/Floor/5815/guide.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]

Guidelines for using XML for Electronic Data Interchange

Version 0.04

23rd December 1997

Editor: Martin Bryan, The SGML Centre

Contributors: Members of the XML/EDI working group, including Benoît Marchal, Norbert H Mikula, Bruce Peat and David RR Webber.

XML/EDI Group Home Page URL: http://www.xmledi.net

Copyright © 1997. XML/EDI Group. All rights reserved, no part of this document may be commercially reproduced in part or in whole without consent and prior approval.

Changes made to this version

Minor clarifications to take into account comments received.


  1. Purpose & Goal of the XML/EDI Guidelines
  2. Definitions for XML/EDI
  3. Scope of XML/EDI
  4. Base Technologies of XML/EDI
  5. XML/EDI Components
  6. The Implementation Process

1. Purpose & Goal of the XML/EDI Guidelines

Put simply, the goal of XML/EDI is to deliver unambiguous and durable business transactions via electronic means.

Associated with this is a goal to establish a standard for commercial electronic data interchange that is open and accessible to all, and which delivers a broad spectrum of capabilities suitable to meet the full breadth of business needs.

To achieve this requires the use of a methodology that is not only extensible enough to meet future requirements but also adaptable enough to incorporate new technologies and requirements as they emerge. To ensure broad adoption the technology selected needs to be widely and freely available. The Extensible Markup Language (XML) developed by the World Wide Web Consortium (W3C) provides such a freely available, widely transportable methodology for well-controlled data interchange.

XML was designed principally for the exchange of information in the form of computer displayable "documents". Not all commercial data is interchanged in a displayable format. In particular data designed for electronic data interchange typically needs to be processed before it can be displayed. For this to be possible the data must be mapped, using some form of template, to a set of processing rules. These XML/EDI guidelines provide a standardized way in which such rule templates can be added to interchanged data.

These XML/EDI guidelines begin by formally defining the terms used in the text. This is followed by an impact statement that makes predictions from various viewpoints. The guidelines then give a background on the tools and standards on which XML/EDI is built.

Note: These guidelines form the basis for development work on XML/EDI. They form a precursor to a formal "Specification of an EDI Application for XML" which will be submitted to the W3C to be sanctioned as an industry standard. As a document designed to be a lightning rod for ideas, this working document has been, and will continue to be, released in draft form. Comments on this draft should be sent to the XML/EDI working group at xml-edi@riv.be.

2. Definitions for XML/EDI

Electronic commerce has been defined in the European Workshop on Open Systems' Technical Guide on Electronic Commerce (EWOS ETG 066) as "Electronic exchange of data to support business transactions, i.e. the exchange of value through the delivery of a product from a seller to a buyer". As such it encompasses much more than what has been possible using traditional methods of Electronic Data Interchange (EDI) such as EDIFACT. Electronic commerce is defined by EWOS as covering activities such as marketing, contract exchange, logistics support, settlement and interaction with administrative bodies (e.g. tax and customs data interchange). Electronic commerce covers all industrial and service operations, including services such as insurance, healthcare, travel and interactive home shopping.

Many people use the term EDI to refer to the set of messages developed for business-to-business communication as part of the United Nations Standard Messages Directory for Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT). EDIFACT messages are transmitted in compressed form, using predefined field identifiers, which must occur in a predefined sequence. While EDI is, strictly speaking, wider in scope than EDIFACT, for the purposes of these guidelines EDI will be used in this restricted sense when not otherwise qualified.

The basic unit of information in an EDI message is the data element. For an EDI invoice, each item being invoiced would be represented by a data element. Data elements can be grouped into compound data elements, and data elements and/or compound data elements may be grouped into data segments. Data segments can be grouped into loops; and loops and/or data segments form business documents.

The EDIFACT standards define whether data segments are mandatory, optional, or conditional, and indicate whether, how many times, and in what order a particular data segment can be repeated. For each EDI message, a field definition table exists. For each data segment, the field definition table includes a key field identifier string to indicate the data elements to be included in the data segment, the sequence of the elements, whether each element is mandatory, optional, or conditional, and the form of each element in terms of the number of characters and whether the characters are numeric or alphabetic. Similarly, field definition tables include data element identifier strings to describe individual data elements. Element identifier strings define an element's name, a reference designator, a data dictionary reference number specifying the location in a data dictionary where information on the data element can be found, a requirement designator (either mandatory, optional, or conditional), a type (such as numeric, decimal, or alphanumeric), and a length (minimum and maximum number of characters). A data element dictionary gives the content and meaning for each data element.
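
As a rough illustration of the kind of information a field definition table carries, the following Python sketch models a single data element definition and checks a value against it. The class name, field names, type codes and the sample entry are assumptions made for this example; they are not taken from any EDIFACT directory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ElementDefinition:
    """One row of a hypothetical field definition table."""
    name: str
    reference: str        # reference designator
    dictionary_ref: str   # data dictionary reference number
    requirement: str      # "mandatory", "optional" or "conditional"
    kind: str             # "numeric", "alphabetic" or "alphanumeric"
    min_length: int
    max_length: int


def check_value(definition: ElementDefinition, value: Optional[str]) -> List[str]:
    """Return the problems found; an empty list means the value is acceptable."""
    if value is None or value == "":
        if definition.requirement == "mandatory":
            return [f"{definition.name}: mandatory element is missing"]
        return []
    problems = []
    if not definition.min_length <= len(value) <= definition.max_length:
        problems.append(f"{definition.name}: length {len(value)} is outside "
                        f"{definition.min_length}..{definition.max_length}")
    if definition.kind == "numeric" and not value.isdigit():
        problems.append(f"{definition.name}: expected numeric characters")
    if definition.kind == "alphabetic" and not value.isalpha():
        problems.append(f"{definition.name}: expected alphabetic characters")
    return problems


# Hypothetical entry: a mandatory numeric quantity of one to six digits.
quantity = ElementDefinition("quantity", "QTY01", "0000",
                             "mandatory", "numeric", 1, 6)
```

Conditional elements are treated here like optional ones; a fuller implementation would need the conditions themselves, which the table summary above does not spell out.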

Originally, EDI translation software was developed to support a variety of private system formats. Most often, the sender and receiver were required to contract in advance for a tailored software program that would be dedicated to mapping between their two types of datasets. Each time a new sender or receiver was added to the client list, a new translation program would be needed by the new party to format their data to conform to the standards in use by the participants. Of course, this becomes expensive. Such static systems do not easily allow synchronization of business transactions in distributed business processes that involve global rules, but with participants and actions that are not predetermined. To solve these issues it is desirable to develop automated tools and techniques that are easy to use and allow decomposition of transactions into actions to be performed locally and mapping of local actions onto efficient protocol exchanges.

The concept of the Electronic Enterprise requires a transition away from paper form based EDI. Key concepts that are required are the encapsulation of agreed sets of business rules (in EDI parlance the Implementation Guidelines) and also mechanisms to handle state and flow control (such as those provided by hyperlink anchors in HTML files). Also message sets must be able to handle partial information, where the complete information is not yet available, or simply is not required for the particular business process. This allows different parts of an enterprise to selectively contribute only the information that is germane to their business functions.

XML is the Extensible Markup Language subset of ISO's Standard Generalized Markup Language (SGML) developed by the World Wide Web Consortium (W3C) SGML on the Web working party during the latter half of 1996 and early 1997. The formal recommendation was submitted for approval by W3C members on 8th December 1997.

On 10th September 1997 a proposal for a new form of XML Style Language (XSL), which incorporates the ECMAScript standardized variant of JavaScript, was published by a consortium led by Microsoft, ArborText and the Inso Corporation. This version of the XML/EDI specification uses the power provided by this new advanced language combination to show how control of XML/EDI document processes can be achieved in a distributed manner.

In October 1997 a specification for a formal Document Object Model (DOM) for XML documents was published by W3C. This model provides a standardized API for XML-based tools.

Combining XML and EDI to develop XML/EDI indicates that the main method of capturing and coding EDI information will be through XML-coded electronic forms. In addition the XML/EDI specification shows how EDIFACT messages can be generated from XML/EDI forms, and vice versa.
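
To make the idea of generating EDIFACT from XML more concrete, the sketch below serializes one XML-coded segment using the default EDIFACT service characters (+ between data elements, : between components, ? as release character and ' as segment terminator). The XML layout (segment, element and component tags) and the segment tag XYZ are purely hypothetical; the guidelines do not prescribe this mapping.

```python
import xml.etree.ElementTree as ET

# Default UN/EDIFACT service characters.
COMPONENT_SEP = ":"
ELEMENT_SEP = "+"
RELEASE = "?"
TERMINATOR = "'"


def escape(text: str) -> str:
    """Prefix any service character found in the data with the release character."""
    out = []
    for ch in text:
        if ch in (COMPONENT_SEP, ELEMENT_SEP, RELEASE, TERMINATOR):
            out.append(RELEASE)
        out.append(ch)
    return "".join(out)


def segment_to_edifact(segment: ET.Element) -> str:
    """Serialize a hypothetical <segment tag="..."> element as one EDIFACT segment."""
    parts = [segment.get("tag", "")]
    for element in segment.findall("element"):
        components = element.findall("component")
        if components:
            # A compound data element: join its components with ':'.
            parts.append(COMPONENT_SEP.join(escape(c.text or "") for c in components))
        else:
            parts.append(escape(element.text or ""))
    return ELEMENT_SEP.join(parts) + TERMINATOR


xml_form = """<segment tag="XYZ">
  <element>A1</element>
  <element><component>B+1</component><component>C</component></element>
</segment>"""
edifact = segment_to_edifact(ET.fromstring(xml_form))
```

The reverse direction (EDIFACT to XML) would split on the same characters while honouring the release character.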

XML/EDI isn't creating a new standard. XML/EDI is defining how companies can use current standards to solve their business problems.

3. Scope of XML/EDI

Details of the scope of XML/EDI, and the impact it is expected to have on business communities, are covered in Introducing XML/EDI.... To help readers of this document appreciate the differences in practice between traditional EDIFACT-based web transactions and XML/EDI, this section discusses some of the differences between traditional business-to-business electronic data interchange systems and the new breed of interactive electronic commerce tools being provided through the Internet.

Business-to-business Electronic Data Interchange

Electronic Data Interchange (EDI) has been used for business-to-business communication for almost a quarter of a century. Initial efforts involved inter-company agreements on how to exchange commercial data, initially as information stored on tape and later as messages sent over dedicated data lines. To avoid having to use different protocols to move data between different companies, various industry groups identified sets of data that could form the basis of individual agreements. The industry groups also sought to agree the format in which fields in such data sets were interchanged, so that a company only needed to develop one methodology for decoding information received without recourse to human intervention.

The Achilles Heel for this approach has always been twofold. Firstly, companies require flexibility in, and wish to deviate from, doctrinaire standards that do not fully meet their business needs. Secondly, because the standards are pre-ordained there is no mechanism provided to transfer processing rules and associated information. It is assumed that the data meets the defined constraints and if not, has been duly modified to conform. This means that companies must conduct exacting analysis to determine precisely how they are going to move their business data to and from the predefined EDI formats.

The cost of these constraints has been borne as excessively long and complex implementation cycles for traditional EDI systems.

The world has changed from thirty years ago, and now requires more dynamic and vibrant services that match the organized yet ad hoc nature presented by both modern business practice, and particularly its manifestations on the Internet. The Internet is re-writing the rules on how people interact, buy and sell, and exchange goods and services. In particular the Internet is showing us that EDI is not only relevant for business-to-business communications. The same concepts are also relevant for all consumer-to-supplier relationships, whether the consumer is an end-user, a manufacturer, a service organization such as a hospital or a hotel, a governmental organization or a virtual organization.

Interactive Electronic Commerce

With the arrival of the Internet in the last decade of the 20th century the pattern of electronic commerce has dramatically changed. In particular, the Internet has introduced many new ways of trading, allowing interaction between groups that previously could not economically afford to trade with one another.

Whereas previously commercial data interchange involved mainly the movement of data fields from one computer to another, without human intervention, the new model for electronic commerce introduced by the Internet is typically dependent on human interaction for the transaction to take place. The new model is based principally on the use of interactive selection of a set of options, and on the completion of "electronic forms", to specify user requirements.

As this new model develops there has been a fundamental shift in how data used for commerce should be processed. The original create-->transmit-->receive-->process cycle of information processing, using individual programs, is beginning to be replaced by the concept of active objects which have inherent processes associated with them, based on the class of information they contain. Today an invoice may no longer contain a copy of the information stored in the database it was generated from: instead it contains a pointer that says where it expects to get the data from, and this data will be fetched from its managed source each time the invoice is processed.

Such interactive programs require us to review the underlying philosophy of electronic commerce. What are the characteristics of a system designed for Interactive Electronic Commerce in an international marketplace?

To be truly interactive you need to be able to:

  1. Understand the business concepts represented in the interchanged data.
  2. Apply business-specific rules to the interchanged data to identify what class(es) of data it contains and formulate appropriate responses.

To do this you need to be able to:

Because these interactions can be complex, and potentially require specialized knowledge, the rule templates can be supplemented by XML/EDI data manipulation agents (DataBots) to ensure that users can express their requirements in high-level, natural language, terms. DataBots automatically create appropriate rule templates and XML syntax to match user requirements and broker the entire interchange.

When DataBots are being used XML/EDI is identified as being robot generated by adding an R to its name to become XML/EDI-R.

At this point in time the ECMAScript standardized variant of JavaScript provides the vehicle that permits the DataBots to be deployed and received along with XML/EDI messages.

4. Base Technologies of XML/EDI

XML/EDI is a synthesis of many concepts. XML/EDI:

Why use XML?

XML will be a native language for the next generation of most of the popular WWW browsers. XML/EDI seeks to leverage the work and support (technically and financially) which XML is receiving. With traditional EDI, the infrastructure was built from the ground up, without being able to share resources with other programs. This paradigm is no longer appropriate in today's world of shared software development. By adopting XML/EDI, the EDI community can share the cost of extension and future development.

In 1986 the International Organization for Standardization (ISO) published an international standard defining a Standard Generalized Markup Language (SGML) that allowed its users to:

SGML has formed the basis of many of the large, multinational, documentation projects that have developed in the decade since its publication. It also formed the basis for the formalization of the HyperText Markup Language (HTML) that led to the formation of the World Wide Web of documentation that has become available on the Internet.

Key to the success of HTML was the development of the concept of Uniform Resource Locators (URLs) that allow users to identify the source of each piece of shared data in a consistent manner. Whilst the original concept has limitations as to the granularity of data access, its universality has greatly improved computer-to-computer communications.

In July 1996 the World Wide Web Consortium (W3C) set up a working group to study how SGML could be simplified to allow for its efficient use over the Internet. The result was the development of an Extensible Markup Language (XML) that combined the expressive power of SGML with the Internet-aware functionality of HTML.

XML provides an ideal methodology for interactive electronic commerce because:

Integrating XML with EDI

XML can be integrated with existing EDI systems by:

XML can extend existing EDI applications by:

5. XML/EDI Components

Figure 1 illustrates the main layers of a fully integrated XML/EDI system.

Figure 1: The layers of an XML/EDI system

The XML/EDI specific components are built on top of existing standards for transmitting and processing XML-encoded data. These standards define shared features such as:

XML parsers, document browsers, page markup programs and related software functions are available off-the-shelf today. XML/EDI isn't, therefore, a new standard; it simply provides a framework for using existing standards to tackle existing problems in a new way.

XML/EDI specific components will either manifest themselves as built-in components into existing products, plug-in programs to existing tools or standalone applications. It is anticipated that new applications will be created from the spark of XML/EDI implementation.

Types of Applications

The following list of the types of facility that could be built into an XML/EDI implementation is not comprehensive, but provides a starting place for discussion:

Each of these options is explained in more detail in the following subsections.

Lexicon Repositories

A primary component of XML/EDI is its dynamic common language and syntax repository. The various types of repository include:

XML/EDI Data Manipulation Agents (DataBots)

The central goals behind the development of the concept of DataBots are:

All these goals are realizable using XML/EDI-R.

DataBots and their associated XSL scripts provide facilities that allow XML/EDI systems to:

It should also be noted that the template method that XML/EDI DataBots implement is extremely compact and concise. This means that it is a low-bandwidth, efficient protocol, which is required to meet high volume constraints in batch EDI delivery systems.

Some additional considerations that also need to be taken into account include Process Control and Object Oriented support. Process Control can be easily accommodated by following the trend towards the use of the Integrated Computer Aided Manufacturing (ICAM) Definition Language (IDEF) process modelling language. Developers can either assign XML tokens to IDEF entities and then add process control lines to the template format, or define IDEF as a notation that can be processed by an XML/EDI-aware browser. Object oriented support can be provided through W3C's Document Object Model (DOM), which provides a CORBA IDL definition for XML objects.

In summary, the optional DataBots component provides the agent that brokers, controls, corrects, directs and ensures that the XML/EDI-R method can progress information transfers correctly.

XML/EDI Business Objects

XML/EDI business objects will be available off-the-shelf, created by developers, with rule sequences devised by users. The usage of these objects can be defined by their sphere of influence. Business objects can be:

Business objects, in most but not all cases, will be invoked by the XML/EDI Data Manipulation Agents. It is anticipated that for efficiency these object manipulation DataBots will be written in Java, or using similarly distributed programming language tools. End-users will be supplied with tools that automatically generate the relevant agents from information provided about the application.

Below are just a few examples of the many possible classes of XML/EDI business objects:


Used for the interactive creation and completion of form-based EDI, XML/EDItors are predicted to become the front-end for business applications. XML/EDI editors will reference Lexicon Repositories to prompt users for appropriate data using XML parse trees to request related fields.

XML/EDI extensions for message stores

It is anticipated that message stores will require extensions to provide the types of complex workflow management needed to ensure the correct delivery and processing of XML/EDI messages. For example, a message store should not be able to acknowledge receipt of a message until its contents have been parsed by an XML parser to ensure that the unencrypted data stream still forms a valid message.
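
The acknowledgement rule described above can be sketched as a simple gate in front of the store's acknowledgement step. The function name is an assumption made for this example, and the check shown covers well-formedness only: Python's standard ElementTree parser does not validate against a DTD, so a production message store would need a validating parser at this point.

```python
import xml.etree.ElementTree as ET


def may_acknowledge(raw_message: bytes) -> bool:
    """Return True only if the (already decrypted) message parses as XML."""
    try:
        ET.fromstring(raw_message)
    except ET.ParseError:
        # Damaged or truncated message: withhold the acknowledgement.
        return False
    return True
```

A store built this way would acknowledge receipt only when may_acknowledge returns True, and would otherwise report a delivery failure to the sender.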

In time it is anticipated that message stores will mutate to use XML natively. This is not because of XML/EDI directly but because message stores that know how to identify, search for and process objects within multimedia streams or business messages will be required for a wide range of application scenarios.

Search Agents

Based on ad-hoc, learned or profiled information, search engines will recognize XML/EDI specific tagging and be able to reference suitable private and public message stores, using standard WWW interfacing, to extract data intelligently. This will allow for the best combination of free-text and fielded search. Catalogs and buyer agents will be among the first to use XML/EDI technology in this way.

Trading Partner Pages

XML/EDI will use a mix of today's X.500 technology, security certificates, "yellow pages", Email look-up, and verified characteristics of entities. This is a critical component of performing business, much more so when employing electronic means. Subsystems will undoubtedly develop along these lines: they will have to support XML/EDI interfacing of basic CRUD functions (Create, Read, Update, Delete) as a minimum. XML/EDI Data Manipulation Agents shall be able to draw upon these resources to validate transactions.

6. The Implementation Process

Using XML for Electronic Data Interchange

The following stages are involved in using XML for the interchange of commercial EDI messages:

An application does not need to use all of the levels of processing shown in Figure 1 and the above list: it can stop at whichever level in the hierarchy suits it. For example, an application can confine itself to checking incoming and outgoing EDI messages using a document object model that has been formally defined in an XML DTD.

Identifying data sets

Identification of data sets for interactive electronic commerce will often be the responsibility of industry associations and various standardization bodies such as UN/EDIFACT and EBES (the European Board for EDI standardization).

Whereas existing EDI definitions are primarily concerned with the way in which a set of fields forms a message, the concepts required for XML/EDI are based more on the definition of independent classes of information that can be combined together with other classes of information to form interchangeable messages. As such the concepts are more akin to the idea of a Basic Semantic Repository (BSR) being proposed by ISO, and of the Business Systems Interconnection (BSI) proposal from the University of Melbourne.

There is, however, one basic difference between using XML/EDI for defining data classes and using the BSR or BSI methodologies. In XML/EDI the order and number of subclasses of a data class can be altered by message creators without having to formally register that fact with any centralized organization. For example, if it was necessary for an application to separate building numbers or names from information about the street the building is located within, XML/EDI would allow system developers to define two new subclasses that would be combined to provide the information needed for an existing EDI address component.

One of the advantages that accrues from XML/EDI's ability to subclass fields is that such fields can be developed interactively using information supplied from more than one location. For example, telephone order processing systems in today's world of interactive electronic commerce often start by asking users for their postcode. This tells the system which region, town and street the user is located in, but not which building they are in. To find this out you need to ask the user for a number or name that uniquely identifies the building within the street identified by the postcode. Using these two related pieces of information it is possible to interactively complete a standardized class of information, an address, that can then be shared by an order, its delivery note, and the invoice required for settlement.

Once information has been captured, and used to create an instance of the relevant class of data, it should not be necessary to recreate the information each time it is required. All that should be needed is that processes that need this information reference the point at which the data was originally captured, e.g. the address associated with the order for the goods.

To ensure that users can guarantee the long-term maintenance of data set components, repositories of definitions will need to be created, and unique object identifiers will need to be assigned to each set of components. While initially testing can be done using system identifiers that resolve to Internet Uniform Resource Locators (URLs), in the longer term a mechanism for identifying shared data sets using formally registered SGML public identifiers associated with URLs will need to be developed. A system for resolving public identifiers to obtain copies of the registered definitions will also be required.

Developing DTDs

Messages that pass between systems will typically conform to a previously agreed XML document type definition (DTD) that formally describes, in terms interpretable by both humans and computers, an internationally accepted message type.

Note: The structure of XML DTDs and document instances is formally defined in Extensible Markup Language (XML). A brief introduction to the components of XML can be found in An Introduction to the Extensible Markup Language. More complete information on the structure of SGML DTDs, including those that implement the Web SGML extensions, can be found in Web SGML and HTML 4.0 Explained, which contains examples of the use of each of the constructs used in SGML and XML, and explains how these facilities are used within HTML.

Warning: For the time being the following text presumes some knowledge of SGML and/or XML.

XML DTDs can be developed by:

Declarations that form a standardized XML DTD will typically be stored in separate files, which can be referenced, as an XML external subset, by those wishing to use it through the Internet Uniform Resource Locator that its originator has assigned to a publicly available copy of the data. Alternatively, if public access is to be restricted, the document type definition can be stored as the internal subset within the document type definition sent with the message.

Where the document type definition is based on classes of information shared by more than one message, each class of information can be defined in a separate file, known in XML as an external entity, these files being referenced in a suitable sequence from within the external or internal subset of the XML DTD.

For example, an XML DTD could have the form:

<!ENTITY % address SYSTEM "http://www.myco.org/messages/XML/address.xml" >
<!ENTITY % items SYSTEM "http://www.edifact.org/messages/XML/items.xml">
<!ENTITY % data "(#PCDATA)">
<!ELEMENT order (order-no, deliver-to, invoice-to, item+) >
<!ELEMENT order-no %data; >
<!ELEMENT deliver-to (address) >
<!ELEMENT invoice-to (address) >
<!--Import standard address class-->
%address;
<!--Import standard item class-->
%items;

This DTD fragment defines two external and one internal parameter entity, four locally defined elements and contains two parameter entity references (%address; and %items;) that call in the contents of the external entities at appropriate points in the definition. Both of the parameter entity references are preceded by explanatory comments.

Note that the source of each class of information is identified not in the call to the class itself (%address;) but within a formal definition of the data storage entities required to process the class definition references (e.g. the first two lines of the DTD). This technique allows files to be moved without having to change the main definition of the DTD.
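
The indirection described above can be imitated with a few lines of Python. This is only a sketch of the idea, not of a real XML processor: it handles a single level of parameter entity references, and the LIBRARY dictionary is a hypothetical stand-in for fetching the external files over the network (the declarations stored in it are invented for the example).

```python
import re

# Hypothetical stand-in for network retrieval: system identifier -> file content.
LIBRARY = {
    "http://www.myco.org/messages/XML/address.xml":
        "<!ELEMENT address (#PCDATA)>",
    "http://www.edifact.org/messages/XML/items.xml":
        "<!ELEMENT item (#PCDATA)>",
}

# Matches external (SYSTEM "url") and internal ("value") parameter entities.
DECLARATION = re.compile(
    r'<!ENTITY\s+%\s+(?P<name>[\w.-]+)\s+'
    r'(?:SYSTEM\s+"(?P<url>[^"]*)"|"(?P<value>[^"]*)")\s*>')
REFERENCE = re.compile(r'%([\w.-]+);')


def expand(dtd: str, fetch=LIBRARY.get) -> str:
    """Replace each %name; reference with the text of the entity it names."""
    entities = {}
    for match in DECLARATION.finditer(dtd):
        name = match.group("name")
        if name in entities:
            continue  # the first declaration of an entity is the binding one
        if match.group("url") is not None:
            entities[name] = fetch(match.group("url"))
        else:
            entities[name] = match.group("value")
    body = DECLARATION.sub("", dtd)  # drop the declarations themselves
    return REFERENCE.sub(lambda m: entities[m.group(1)], body)


dtd = '''<!ENTITY % address SYSTEM "http://www.myco.org/messages/XML/address.xml">
<!ENTITY % data "(#PCDATA)">
<!ELEMENT order-no %data;>
%address;'''
expanded = expand(dtd)
```

If the address file moves, only the system identifier in the entity declaration changes; every %address; reference in the body of the DTD stays as it is.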

Typically the entity definitions will be stored outside the DTD, which will contain a reference to the URL of the point at which the latest details of library file locations can be found. For example:

<!ENTITY % library SYSTEM "http://www.myco.org/messages/XML/library.ent">
%library;
<!ELEMENT order (order-no, deliver-to, invoice-to, item+) >
<!ELEMENT order-no %data; >
<!ELEMENT deliver-to (address) >
<!ELEMENT invoice-to (address) >
<!--Import standard address class-->
%address;
<!--Import standard item class-->
%items;

where %library; references a file containing the entity definitions given at the start of the previous example.

XML provides (experimental) facilities for ensuring that data modules taken from libraries do not introduce name clashes in their elements. The names of elements within each module can be qualified by a module (namespace) identifier. Each namespace identifier can be associated with a URL that uniquely identifies where the module is formally defined. For example, the contents of the library file referenced above could be defined as:

<?xml-namespace href="http://www.ebes.org/XML/EDI-address.xml" as="address"?>
<?xml-namespace href="http://www.ean-fora.org/XML/order-items.xml" as="items"?>
<!ENTITY % data "(#PCDATA)">
<!ENTITY % address "
<!ELEMENT address (address:company, address:street, address:town,
                   address:region, address:postcode)             >
<!ATTLIST address id ID #IMPLIED >
<!ELEMENT address:company %data; >
<!ELEMENT address:street %data; >
<!ELEMENT address:town %data; >
<!ELEMENT address:region %data; >
<!ELEMENT address:postcode %data; >
<!ELEMENT same-as EMPTY>
<!ATTLIST same-as idref IDREF #REQUIRED >
">
<!ENTITY % items "
<!ELEMENT item (item:identifier, item:name, item:quantity)>
<!ELEMENT item:identifier %data; >
<!ELEMENT item:name %data; >
<!ELEMENT item:quantity %data; >
">

Application-specific extensions

XML permits entities and attributes that are defined in the external subset to be redefined in the internal subset. This facility allows XML/EDI users to develop locally significant subclasses. It can also be used to create subsets of messages by removing unused fields from the data model.

For example, the internal subset of a DTD based on the above standardized DTD could contain the following local redefinition for the %items; parameter entity:

<!ENTITY % items "
<!ELEMENT item (item:identifier, item:name, item:quantity)>
<!ELEMENT item:identifier (item:database-key?, item:EAN) >
<!ELEMENT item:database-key %data; >
<!ELEMENT item:EAN %data; >
<!ELEMENT item:name %data; >
<!ELEMENT item:quantity %data; >
">

In this case the optional item:database-key field could contain a direct pointer to the database entry from which the EAN and associated product name were obtained. A DataBot could use this key to process the item information directly, rather than running a slower database query based on the EAN normally provided by the identifier field.
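
Local redefinition works because, in XML, the first declaration of an entity is the binding one and the internal subset is read before the external subset. The Python sketch below mimics that rule for parameter entities declared with literal values; the function name and the shortened declarations are assumptions made for the example.

```python
import re

# Matches parameter entities declared with a literal value.
DECLARATION = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')


def bind_entities(internal_subset: str, external_subset: str) -> dict:
    """Collect parameter entity values, reading the internal subset first."""
    bound = {}
    for subset in (internal_subset, external_subset):
        for name, value in DECLARATION.findall(subset):
            bound.setdefault(name, value)  # later declarations are ignored
    return bound


external = '<!ENTITY % items "<!ELEMENT item:identifier (#PCDATA)>">'
internal = ('<!ENTITY % items '
            '"<!ELEMENT item:identifier (item:database-key?, item:EAN)>">')
bound = bind_entities(internal, external)
```

With both subsets present the local definition of %items; is the one that takes effect; with an empty internal subset the standardized definition is used unchanged.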

Creating message instances

An XML/EDI interactive electronic commerce message consists of a pointer to the document type definition, any definitions required in the internal subset of the DTD, and entries for each of the fields required for the message. For example, the following document type declaration could be used to extend the external DTD shown in the first of the examples shown above, which is identified by its Internet Uniform Resource Locator:

<!DOCTYPE order SYSTEM "http://www.myco.org/messages/XML/message1.xml" [
<!ENTITY % items "
<!ELEMENT item (item:identifier, item:name, item:quantity)>
<!ELEMENT item:identifier (item:database-key?, item:EAN) >
<!ELEMENT item:database-key %data; >
<!ELEMENT item:EAN %data; >
<!ELEMENT item:name %data; >
<!ELEMENT item:quantity %data; >
">
]>
<address id="SGML154">
<address:company>The SGML Centre</address:company>
<address:street>29 Oldbury Orchard</address:street>
<address:postcode>GL3 2PU</address:postcode>
<same-as idref="SGML154"/>
<item:name>Special Offer 16</item:name>

Note that, because SGML and XML give priority to local definitions, the definition of the %items; parameter entity provided in the internal subset takes precedence over the declaration of the same entity in the file referenced as the external subset.

Validating messages

XML/EDI messages can be validated by a validating XML document instance processor (known as an XML parser) to ensure they contain all required elements from the specified data set, and that the fields are in the required sequence. When the document is found to be valid the parser can generate a document tree that conforms to the rules laid down in the Document Object Model (DOM) specification that provides a standardized API between XML parsers and browsers and other forms of program.
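As a rough illustration of the sequence check described above, the following Python sketch verifies that each item element contains its fields in the required order. Python's standard library parser (expat) is non-validating, so the content model is hard-coded here rather than read from the DTD; a validating parser would derive it from the declaration itself. The function name and the sample messages in the usage note are assumptions.

```python
import xml.parsers.expat

# Required child sequence for <item>, taken from the content model
# (item:identifier, item:name, item:quantity) used in this section.
REQUIRED_ITEM_FIELDS = ["item:identifier", "item:name", "item:quantity"]


def item_field_errors(message):
    """Return a list of problems found in the <item> elements of message."""
    errors = []
    stack = []      # names of the currently open elements
    children = []   # direct children seen inside the current <item>

    def start(name, attrs):
        # Record the element if its parent is the <item> being checked.
        if stack and stack[-1] == "item":
            children.append(name)
        if name == "item":
            children.clear()
        stack.append(name)

    def end(name):
        stack.pop()
        if name == "item" and children != REQUIRED_ITEM_FIELDS:
            errors.append("item has fields %s, expected %s"
                          % (children, REQUIRED_ITEM_FIELDS))

    # No namespace separator is given, so names such as item:name are
    # treated as plain element names.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(message, True)
    return errors
```

Calling item_field_errors on a message whose item fields appear as identifier, name, quantity returns an empty list; swapping two of the fields yields one error entry.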

XML elements can be assigned attributes that point to processors that can undertake relevant data validity checks. This can be done either by associating notation processors with an element, or by associating an ECMAScript specification with the element as part of an XSL "action" associated with the specific element types used in specific contexts, or with particular attribute values.

Where the XML Style Language (XSL) is not being used (e.g. because the browser does not yet support it) the basic XML language allows user-defined notation processors to be used to validate the contents of specific XML elements. This is done by adding definitions of the following form to the external or internal subset of the DTD:

<!NOTATION EAN-validator SYSTEM "http://www.myco.org/messages/validate/EAN.cgi">
<!ATTLIST EAN check NOTATION (EAN-validator) #FIXED "EAN-validator">

The predefined check attribute of the EAN element will cause the contents of the element to be passed to the program identified by the declaration for the notation assigned the local name EAN-validator which is stored at the location indicated by the URL given in the notation declaration. This processor would typically pass back a message indicating whether or not the EAN is valid within the context of the relevant message.
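What the processor behind the EAN-validator notation actually checks is not specified here. As one plausible ingredient, the sketch below tests the standard EAN-13 check digit in Python. The function name is hypothetical, and a real validator would presumably also check the number against the context of the message, which this sketch does not attempt.

```python
import re


def ean13_is_valid(code):
    """Return True if code is 13 digits with a correct EAN-13 check digit."""
    if not re.fullmatch(r"[0-9]{13}", code):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of ten.
    return (10 - total % 10) % 10 == digits[12]
```

A validator of this kind could return a simple pass or fail message for the contents of the EAN element.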

XSL provides an alternative, and more generally applicable method that allows ECMAScript to be used to validate the contents of XML elements. Details of this method are given below under the heading "Processing messages".

Note: In December 1997 an extension to SGML allowed typed data attributes to be used in standard SGML files. As soon as this new functionality is absorbed into XML it will be possible to greatly simplify the validation of message contents.

Exchanging messages

Data captured in XML/EDI messages can be exchanged:

Where conversion into a known EDIFACT format is required the DTD can be extended to provide additional attributes that can guide the transformation process. For example, the following additional properties could be added to the list of attributes assigned to the EAN element:

<!ATTLIST EAN check NOTATION (EAN-validator) #FIXED "EAN-validator"
 EDI-prefix CDATA  #FIXED "LIN+1++"
 EDI-suffix CDATA  #FIXED ":EN'" >
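A minimal sketch of how such attributes might guide the transformation: the content of an element is wrapped in the fixed EDI-prefix and EDI-suffix values to form an EDIFACT segment. The function names are hypothetical, and the escaping step assumes the default EDIFACT service characters are in use.

```python
# Default EDIFACT service characters are assumed here: "+" and ":" as
# separators, "'" as segment terminator and "?" as release character.
SERVICE_CHARS = "+:'"
RELEASE_CHAR = "?"


def escape_edifact(text):
    """Prefix service characters occurring in data with the release character."""
    escaped = text.replace(RELEASE_CHAR, RELEASE_CHAR * 2)
    for ch in SERVICE_CHARS:
        escaped = escaped.replace(ch, RELEASE_CHAR + ch)
    return escaped


def to_edifact(content, attributes):
    """Wrap element content in its EDI-prefix and EDI-suffix attribute values."""
    prefix = attributes.get("EDI-prefix", "")
    suffix = attributes.get("EDI-suffix", "")
    return prefix + escape_edifact(content) + suffix


# Fixed attribute values as declared for the EAN element above.
EAN_ATTRIBUTES = {"EDI-prefix": "LIN+1++", "EDI-suffix": ":EN'"}

segment = to_edifact("4012345000094", EAN_ATTRIBUTES)
print(segment)  # LIN+1++4012345000094:EN'
```

Because the prefix and suffix are #FIXED in the DTD, the message instance itself never needs to carry them; the converter reads them from the attribute defaults.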

Messages exchanged as XML/EDI files can be re-validated on receipt by running them through an XML/EDI validating parser. Where messages have been converted into non-XML files prior to transmission the conversion should be reversed to allow re-validation of the received message.

During re-validation any linked parts of messages should be retrieved to ensure that the full contents of the message have been checked. When re-validation has been confirmed the Document Object Model created as part of the validation process can be used to create an auditable copy of the received message in a message store/database.

Processing messages

The way in which a received message would be processed would depend on which of the available methods for exchanging messages was chosen. If the message was received in a format that provided the XML/EDI message generated by the originator, the XML Style Language (XSL) can be used to associate different processes with individual element classes so that elements can be processed by one or more local processors.

XML/EDI message instances are specifically designed to make the selection of data fields and classes at the receiver as easy as possible. Each field starts with a "start-tag" that clearly identifies the class (element type in SGML/XML parlance) of the following data or embedded subelement set, and specifies any non-default properties to be associated with the data. The end of each data element is clearly identified by an "end-tag", which consists of the name of the element (class) preceded by a slash between a matched pair of outward pointing angle brackets. Fields that contain no data, and no embedded subelements, (e.g. fields that are only present to point to other data sources) have the slash indicating their end point immediately before the last angle bracket of the start-tag rather than immediately after the first one of the end-tag. (See the example for the <same-as/> element above.) Classes that contain subclasses of information have embedded elements between their start-tag and end-tag.
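The following Python sketch shows how a receiver might pick out fields using nothing more than the start-tags and end-tags described above. It collects every element that has no embedded subelements, together with its attributes and data. The function name is hypothetical, and the enclosing order and address tags in the sample message are assumptions added to make the fragment well-formed.

```python
import xml.parsers.expat


def collect_fields(message):
    """Return (name, attributes, data) for each element without subelements."""
    fields = []
    stack = []  # one [name, attrs, text parts, has_child] entry per open element

    def start(name, attrs):
        if stack:
            stack[-1][3] = True  # the parent is a container, not a field
        stack.append([name, attrs, [], False])

    def chars(data):
        if stack:
            stack[-1][2].append(data)

    def end(name):
        name, attrs, parts, has_child = stack.pop()
        if not has_child:
            fields.append((name, attrs, "".join(parts).strip()))

    parser = xml.parsers.expat.ParserCreate()  # colons are kept in names
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(message, True)
    return fields


message = """<order>
<address id="SGML154">
<address:company>The SGML Centre</address:company>
<address:street>29 Oldbury Orchard</address:street>
<address:postcode>GL3 2PU</address:postcode>
</address>
<same-as idref="SGML154"/>
</order>"""

fields = collect_fields(message)
```

The empty same-as element is reported with no data but with its idref attribute, which a local processor could use to look up the address it points to.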

XSL allows sets of actions to be associated with particular XML elements. Actions can be defined in terms of values to be assigned to a set of data presentation attributes (styles), or in terms of a data processing script that users can define using a define-script object. XSL scripts are defined using ECMAScript, the standardized form of the JavaScript scripting language.

Which actions are associated with which elements can be defined using XML element sets known as XSL rules. A simplified set of style-rules allows presentation properties to be applied to element classes. Rules can be associated with elements that have been assigned a unique identifier (id) attribute or that have been assigned a particular value for a class attribute.

Sets of rules and actions can be defined in macros. Macros can be associated with style processing attributes associated with specific instances of an element. The default set of style properties defined in XSL can be extended using define-style objects.

The component parts of an XML Style Sheet can be:

A typical XML/EDI XSL description will contain:

XSL actions are typically associated with the way in which objects should be presented to users. This process is typically controlled through the use of flow objects. XSL provides two default sets of flow objects, one based on the elements typically found in HTML files, and the other based on the flow objects defined in ISO/IEC 10179 (DSSSL). The set of DSSSL flow objects supported by XSL includes:

The <eval> element can be used to indicate points at which macros and scripts are to be evaluated as a result of applying a rule.

For an example of the use of XSL specifications refer to Appendix A.

Activating rules

The XML link process can be used to associate XML/EDI rules with a file. Normally the Simple Link format will be used to identify one or more files containing the relevant rules. Typically this will result in an element of the following form being added to the start of the document instance:

<rules-template XML-LINK="SIMPLE" ROLE="xml/edi-rules"
TITLE="Rules for processing orders" SHOW="EMBED" ACTUATE="AUTO"/>

Note: The XML-LINK, ROLE, SHOW and ACTUATE attributes would typically be defined as default values in the associated DTD. They are shown in this example to illustrate the type of information that gets associated with an XML/EDI rule link. The TITLE attribute is optional. It provides some text that users can click on to display the relevant rules file.
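A receiving application needs to locate the rules link before it can process the message. The Python sketch below shows one way this might be done, assuming the link target is carried in an HREF attribute as in Figure A.2. The function name is hypothetical, and attribute defaults supplied by an external DTD are not resolved because the standard library parser does not read it.

```python
import xml.parsers.expat


def rules_template_links(document):
    """Return the HREF of every rules-template element in document order."""
    links = []

    def start(name, attrs):
        if name == "rules-template" and "HREF" in attrs:
            links.append(attrs["HREF"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return links


sample = (
    '<Book-Order>'
    '<rules-template XML-LINK="SIMPLE" ROLE="xml/edi-rules" '
    'HREF="http://www.bic.org.uk/XML/EDI/Rules/orders.html"/>'
    '</Book-Order>'
)
links = rules_template_links(sample)
```

The returned list gives the rules files to fetch before any of the message fields are processed.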

Appendix A1: Using XML/EDI for Book Ordering

The following statement of the current role of EDI in Book Ordering was made to the European Board of EDI Standardization by the UK Book Industry Communication (BIC) manager, Brian Green, in May 1997:

"The nature of the book trade has encouraged its adoption of various forms of Electronic Commerce over the last 20 years. The introduction of a national UK standard book numbering system in the 1960's and an international standard (ISBN) in the early 70's together with central catalogues of books in print in nearly all countries was essential for an industry where even the smallest retail outlet offered customers the facility to order any one of around 600,000 books currently in print (in the UK) from 20,000 publishers with, currently, one hundred thousand new titles appearing every year. There was no hub in the traditional sense since, although WH Smith in the UK has always had a large market share, the number of book titles stocked is relatively low and they have not, until very recently, been much concerned with customers' special orders.

In the late 1970's, the UK book trade set up Teleordering as a centralized ordering service using a simple non-standard order format, providing dedicated terminals on which booksellers simply keyed quantity and ISBN (their location number was installed on the form as a default). The orders were polled overnight by Teleordering and automatically routed to the correct publisher either electronically or, in the case of small publishers, by mail or fax. The bookseller received a basic confirmation of receipt of the order by Teleordering with an indication from the Teleordering database whether the book was recorded as available or out of print. Today Teleordering has an annual throughput of some 27 million orders, runs on PC's and is owned by J Whitaker & Sons who also publish a 'books in print' CD-ROM and provide a sales data monitoring service. Teleordering has also established itself as an EDI VAN with a full range of Tradacoms and EDIFACT messages. The two services run side by side and will convert the non-standard Teleordering format orders coming from booksellers to EDIFACT or Tradacoms for transmission to publishers.

Similar services were set up in other European countries, the US, Canada etc., although the UK service has always been the largest in the world.

A second book trade EDI service, called First EDItion was set up in 1992 in the UK. This is a pure EDI service based on INS and is particularly strong in the library sector. Both First EDItion and Teleordering are being used for international trade, mainly between UK publishers and European wholesalers who, e.g. in Netherlands and Germany, operate their own dedicated electronic ordering services for booksellers in their countries. First Edition has announced that it will introduce a book trade service based on GE's "TradeWeb", which offers a forms-based Internet service linking to the GEIS VAN.

There has been an interesting 'light EDI' scheme running in the UK for the last four years. Following publication of the book trade Tradacoms messages by Book Industry Communication, the UK book trade EDI body, the major UK wholesalers, who had until then been offering dedicated electronic ordering services, decided to collaborate in a service called BUYLINE. They provided all their bookseller customers, at a nominal cost, with simple forms-based ordering software that links in with either the 'book bank' books in print CD-ROM or a wholesaler's own stockist, enabling the bookseller to select the books required and choose their supplier from a pull down list. BUYLINE includes communications software that dials up the selected supplier and transmits the order in Tradacoms format. The software will also accept Tradacoms acknowledgments and present these to the user in a simple user-friendly format. The rights in this product have now reverted to the systems house, Triptych, who developed it, and they are extending the service to the major distributors as well as wholesalers. Their software is also included in a number of the book shop computer systems. It is generally expected that the BUYLINE system will migrate to EDIFACT and use Internet rather than direct dial up communications in due course.

A further development is the regular monthly production of multimedia CD-ROM stock catalogues by major European wholesalers. Most of these allow users to build order files and output them in EDI formats, normally using direct dial-up. It is anticipated that data compression and increased bandwidth will soon allow these facilities to be available over Internet. An important point, however, is that BIC in the UK and EDItEUR in Europe have managed to produce a consensus on the book trade implementation of the messages that ensures that all recent services use standard message formats."

BIC feel that trials of standard forms freely available over the Internet, outputting EDIFACT messages to any trading partner able to receive them, would be very helpful.

Applying XML/EDI to Book Ordering

The HTML form shown in Figure A.1 has been designed for input of an order for up to two different books using the EDItEUR Book Ordering Message. The values entered into the fields on the form are the values used in the example EDI message provided for the form in the EDItEUR EDI Implementation Guidelines for Book Trade Distribution.

EDItEUR Lite EDI Book Order

Figure A.1: HTML form for capturing EDItEUR Lite-EDI Book Order Messages

Figure A.2 shows how XML could be used to code a form whose appearance would be equivalent to the HTML form shown in Figure A.1.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE Book-Order PUBLIC "-//EDItEUR//DTD Book Order Message//EN">
<Book-Order Supplier="4012345000094" Send-to="http://www.bic.org/order.in">
<rules-template HREF="http://www.bic.org.uk/XML/EDI/Rules/orders.html"/>
<title>EDItEUR Lite-EDI Book Ordering</title>
<Order-Line Reference-No="0528837">
<Author-Title>Labaln, Brian/Chrome</Author-Title>
<Order-Line Reference-No="0528838">
<Author-Title>Parry, Linda (ed)/William Morris</Author-Title>
<input type="checkbox" name="partial" value="allowed"/>
<text>Tick here if a delayed/partial supply of order is acceptable
<input type="checkbox" name="confirmation" value="requested"/>
<text>Tick here if Confirmation of Acceptance of Order is to be returned by e-mail
<input type="checkbox" name="DeliveryNote" value="required"/>
<text>Tick here if e-mail Delivery Note is required to confirm details of delivery
<E-Address>E-mail address: <input name="e-address" size="25"></input>
<Language>Please respond in:
<select name="response-language">
 <option value="EN" selected="selected">English</option>
 <option value="FR">Fran&ccedil;ais</option>
 <option value="DE">Deutsch</option>
 <option value="ES">Espagnol</option>
 <option value="IT">Italiano</option></select></Language>
<input type="submit" value="Press here to send completed form to supplier"/>

Figure A.2: XML encoding of Book Order Message

A typical reaction to seeing such a form is "Where has all the EDI information gone?". The answer is that all immutable information goes into the document type definition (DTD) referenced in the <!DOCTYPE statement that starts the coding. Figure A.3 shows the contents of this DTD. A single line reference to this DTD is sufficient to provide the browser with all the additional information it needs to process the message.

Note how the definition of each element defined in Figure A.3 contains attributes whose fixed values contain the prefixes and suffixes of each of the EDIFACT fields that need to be generated in response to the messages.

The message format generated from the completed form could be a pure EDIFACT message of the type shown on Page II-2-2 of the EDItEUR EDI Implementation Guidelines for Book Trade Distribution.

<!DOCTYPE Book-Order [
<!--XML-conformant DTD for EDItEUR Book Order Message.
 Version 1.0 - Created 1st July 1997 by M. Bryan from The SGML Centre
 This DTD should be referenced using the following public identifier:
 PUBLIC "-//EDItEUR//DTD Book Order Message//EN"
  <!--Entities referenced within DTD-->
<!--Support information elements are designed to supply information
 that can be used to control the processing of the message.-->
<!ENTITY % support-info "(E-Address|Language|text|input|select)*" >
<!--Entities used to datatype attribute values-->
<!--Uniform Resource Locator identifier. Contents of attribute must provide
 a valid HTTP or MAILTO address conforming to IETF RFC 822-->
<!--EAN location code. Number that uniquely identifies
 suppliers/purchasers. -->
<!--Formal EDIFACT definition of datatype. May be used by EDI-compliant
 browsers to validate the data entered by users prior to acceptance when
 a user attempts to move to another field. -->
<!ENTITY % EDItype "NAME" >
 <!--Message Content element declarations-->
<!--Book Order element:
 Purpose: Container for message fields and support information.
 Attributes: EDI-Prefix formally identifies type of message.
 EDI-Suffix contains strings to be output at end of message.
 Send-to identifies Uniform Resource Locator (URL) for site
 to which EDIFACT message is to be sent for processing.
 Supplier contains unique EAN that identifies supplier.
<!ELEMENT Book-Order (rules-template?, title?, Order-No, Message-Date,
                      Buyer-EAN, Order-Line+, %support-info;) >
<!ATTLIST Book-Order
 EDI-Suffix CDATA #FIXED "UNS+S'CNT+2:2'UNT+18+ME00579"
 Send-to %URL; #REQUIRED
 Supplier %EAN; #REQUIRED >
<!--Rules-template element:
 Purpose: To indicate which set of rules should be used to process
 the component parts of the message
<!ELEMENT rules-template EMPTY>
<!ATTLIST rules-template
 ROLE     CDATA "xml/edi-rules"
<!--Title element:
 Purpose: Used to provide supplier dependent title for form:
 Title can be displayed in window header or at top of form,
 or in both locations
<!ELEMENT title (#PCDATA) >
<!--Order Number element:
 Purpose: Allows users to assign unique number to their order.
 Attributes: EDI-Prefix formally identifies type of message.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede box.
<!ELEMENT Order-No (#PCDATA) >
<!ATTLIST Order-No
 EDI-Prefix CDATA #FIXED "BGM+220+"
 Datatype %EDItype; #FIXED "C8"
 Title CDATA "Book Order No:" >
<!--Message Date element:
 Purpose: To indicate date order was placed. Date must be entered
 in ISO 8601 format without separators, e.g. CCYYMMDD
 Attributes: EDI-Prefix formally identifies type of message.
 EDI-Suffix identifies data to immediately follow contents.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede input field.
 Comment contains explanatory text to follow the input field.
<!ELEMENT Message-Date (#PCDATA) >
<!ATTLIST Message-Date
 EDI-Prefix CDATA #FIXED "DTM+137+"
 EDI-Suffix CDATA #FIXED ":102"
 Datatype %EDItype; #FIXED "Date"
 Size NUMBER #FIXED "12"
 Title CDATA "Message Date:"
 Comment CDATA "Enter dates in CCYYMMDD format" >
<!--Buyer EAN identifier element:
 Purpose: To identify the unique EAN assigned to the purchaser.
 Attributes: EDI-Prefix formally identifies type of message.
 EDI-Suffix identifies data to immediately follow contents.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede box.
 EDI-Suffix CDATA #FIXED "::9"
 Datatype %EDItype; #FIXED "C13"
 Size NUMBER #FIXED "13"
 Title CDATA "Buyer EAN:" >
<!--Order line element:
 Purpose: Container for objects used to order book.
 Attributes: EDI-Prefix formally identifies type of message.
 Line-no is calculated by system to be 1 + number of
 preceding order lines within file.
 Ref-Prefix identifies EDI prefix for reference number
 Reference-no uniquely identifies each line.
 Number is supplied by supplier's system with input file.
<!ELEMENT Order-Line (ISBN, Author-Title, Quantity) >
<!ATTLIST Order-Line
 Ref-Prefix CDATA #FIXED "#RFF+LI:"
 Reference-No NUMBER #REQUIRED >
<!--ISBN element:
 Purpose: To enter unique ISBN of book to be ordered
 Attributes: EDI-Prefix formally identifies type of message.
 EDI-Suffix identifies data to immediately follow contents.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede box.
 Datatype %EDItype; #FIXED "N12"
 Size NUMBER #FIXED "12"
 Title CDATA "ISBN:" >
<!--Author and Title element:
 Purpose: Optional statement of author and title details to confirm
 correct ISBN has been entered.
 Attributes: EDI-Prefix formally identifies type of message.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede box.
<!ELEMENT Author-Title (#PCDATA) >
<!ATTLIST Author-Title
 Datatype %EDItype; #FIXED "C60"
 Size NUMBER #FIXED "40"
 Title CDATA "Author/Title:" >
<!--Quantity element:
 Purpose: To identify the number of copies required.
 Attributes: EDI-Prefix formally identifies type of message.
 Datatype identifies format that contents must conform to.
 Size indicates width of box to be used to capture input.
 Title indicates text to precede box.
<!ELEMENT Quantity (#PCDATA) >
<!ATTLIST Quantity
 Datatype %EDItype; #FIXED "N2"
 Title CDATA "Quantity:" >
 <!--Declarations for message control support elements-->
<!--Electronic Address element:
 Purpose: To capture electronic address to which messages from the
 supplier to the buyer can be sent.
<!ELEMENT E-Address (#PCDATA|input)* >
<!--Language element:
 Purpose: Container linking text to selection menu.
<!ELEMENT Language (#PCDATA|select)* >
<!--Text element:
 Purpose: Temporary element required because HTML input has no
 equivalent of the title element.
<!ELEMENT text (#PCDATA) >
<!--Input, select and option elements:
 Purpose: As per HTML (temporarily borrowed element).
 Attributes: As per HTML (temporarily borrowed attributes).
<!ENTITY % InputType
<!ELEMENT input (#PCDATA) >
<!ATTLIST input
 type %InputType; "TEXT"
 checked (checked) #IMPLIED
 maxlength NUMBER #IMPLIED
 align (top|middle|bottom|left|right) "top" >
<!ELEMENT select (option+)>
<!ATTLIST select
 multiple (multiple) #IMPLIED >
<!ELEMENT option (#PCDATA) >
<!ATTLIST option
 selected (selected) #IMPLIED
<!ENTITY % ISOlat1 SYSTEM "http://www.myco.org/public/entities/ISOlat1.ent" >
<!ENTITY % ISOnum SYSTEM "http://www.myco.org/public/entities/ISOnum.ent" >
%ISOlat1; %ISOnum;

Figure A.3: XML Document Type Definition for Lite-EDI Book Order

The rules file associated with this document could take the following form if expressed using DSSSL flow objects rather than the HTML set:

... ECMAScript description of required functions and variables to be added here ...
function BookOrderValidationCheck
function ISO8601DateCheck (contents)
var message-date Date.parse(contents),
    received-date new Date.prototype.toString();
function CheckEAN
function GetReferenceNo
function CheckIfTicked
function OutputIfYes{...}
function OutputifNo{...}
  <EMBEDDED-TEXT USE="EnteredData">
    <SELECT ANCESTOR="Order-Line">
</DEFINE-MACRO NAME="DisplayTickBox">
        <TABLE-CELL USE="DefaultStyle">Book Order No:</TABLE-CELL>
        <TABLE-CELL USE="EnteredData"><CONTENTS/></TABLE-CELL>
  <TARGET-ELEMENT TYPE="Message-Date"/> 
      <TABLE-CELL USE="DefaultStyle">Message Date:</TABLE-CELL>
      <TABLE-CELL USE="EnteredData">message-date</TABLE-CELL>
      <TABLE-CELL USE="DefaultStyle">Enter date in CCYYMMDD format</TABLE-CELL>
      <TABLE-CELL USE="DefaultStyle">Buyer EAN:</TABLE-CELL>
      <TABLE-CELL USE="DefaultStyle">Supplier EAN:</TABLE-CELL>
      <TABLE-CELL USE="EnteredData">4012345000091</TABLE-CELL>
  <ELEMENT TYPE="Order-Line">
  <ELEMENT TYPE="Order-Line">
    <TARGET-ELEMENT TYPE="Author-Title"/>
    <TABLE-CELL USE="DefaultStyle">Author/Title:</TABLE-CELL>
  <ELEMENT TYPE="Order-Line">
    <TARGET-ELEMENT TYPE="Quantity"/>
    <TABLE-CELL USE="DefaultStyle">Quantity:</TABLE-CELL>
    <TABLE-CELL USE="DefaultStyle">Order line reference number:
       <INVOKE MACRO="GetOrderLineNo"/>
  <EVAL>(...If no more order lines output </TABLE> after end tag ....)
    <ATTRIBUTE NAME="type" VALUE="checkbox"/>
  <INVOKE MACRO="DisplayTickBox"/>
    <ATTRIBUTE NAME="name" VALUE="e-address"/>
  <EMBEDDED-TEXT USE="EnteredData">
   <TARGET-ELEMENT TYPE="Language"/>
  <ELEMENT TYPE="Language">
    <TARGET-ELEMENT TYPE="select"/>
    <ATTRIBUTE NAME="name" VALUE="submit">
  <BOX USE="RaisedButton">

Figure A.4: XML/EDI Processing Rules for Lite EDI Book Order

Note: The examples in the above figure cannot be completed until a complete list of XSL functions is available.
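The body of the ISO8601DateCheck function named in Figure A.4 is not given. As an illustration of the kind of check it might perform, written in Python rather than ECMAScript, the sketch below accepts only valid calendar dates in the CCYYMMDD form required for the Message-Date element. The function name and its behaviour are assumptions, not part of the rules file.

```python
import re
from datetime import datetime


def iso8601_basic_date_check(contents):
    """Return True if contents is a valid calendar date in CCYYMMDD form."""
    # Exactly eight ASCII digits, no separators.
    if not re.fullmatch(r"[0-9]{8}", contents):
        return False
    try:
        # strptime rejects impossible dates such as 30 February.
        datetime.strptime(contents, "%Y%m%d")
    except ValueError:
        return False
    return True
```

For example, "19971223" passes, while "19970230" and "1997-12-23" are rejected.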


DataBots - XML/EDI data manipulation agents ("Bot" is a software term for a component that acts as an agent).

XML/EDI-R - the combination of XML message syntax and rule based EDI.


To be developed