XER - A Bridge between ASN.1 and XML

Second draft
Dr Hoylen Sue
DSTC Pty Ltd
This document describes XML Encoding Rules (XER) as implemented by DSTC Pty Ltd. These rules describe how to encode ASN.1 specified data as XML documents. The rules are based on ideas and concepts proposed by the XER discussion list. Example XER output from our prototype implementation is provided.
We have adapted the existing ideas, with modifications based on our experience of developing a practical implementation of the rules. It is highly desirable to have a set of consistent rules and a machine processable mechanism for deriving the XER encoding from an ASN.1 specification, without the need for human intervention to handle exceptional cases. The XER rules we propose here make this possible.

Introduction

There is strong demand for a powerful and interoperable search protocol for the Internet. This demand comes from the Web community, wishing to add search features to the Web. It also comes from the information retrieval community, wishing to make its services available to users on the Internet.

The Z39.50 protocol is a powerful search protocol with a proven track record. It is widely deployed in the library community and there are many existing services using it. However, its use of a heavyweight binary encoding makes it unpalatable to the Web and Internet community.

The Extensible Markup Language (XML) is a text oriented encoding developed by the W3C. It is rapidly gaining popularity in the Web community and is being adopted as the underlying mechanism for many Web proposals, including efforts to develop a lightweight Web search protocol. However, the Z39.50 community has considerable experience in searching, and it would be productive to leverage that existing body of knowledge in developing a Web search protocol.

XML Encoding Rules (XER) provide a mechanism for transporting Z39.50 (as well as general ASN.1 specified data) in a textual XML format. This combines the strengths of XML and Z39.50 to provide a powerful Internet search protocol. It also provides a means of interfacing legacy ASN.1 based systems to the Internet and XML systems.

This document describes the mechanism for XER we have used in our prototype implementation. These rules have been derived from discussion documents by Alan Kent, Ralph LeVan and the XER mailing list. The additions we have made to these rules are described. Examples of XER encoding, generated by our prototype implementation, are also provided.

This document assumes the reader has a technical understanding of ASN.1, BER encoding, and XML. Knowledge of Z39.50 is also desirable for understanding the examples.

Requirements

Participants in the XER discussion list have made a number of proposals about XER, and the proposal described here attempts to address the issues they raised. However, we wish to highlight certain requirements which are important and have influenced the design of this proposal.

XER is intended to be used as a mechanism for interoperability between existing Z39.50 systems and Web/Internet systems. For this reason, we have placed great importance on interoperability: there should be a one-to-one translation between BER and XER encodings. In particular, it must be possible to translate any BER encoding into XER, and to translate that XER back into an equivalent BER encoding without loss of information.

In order to develop XER into a mechanism for encoding any ASN.1 specified data, the solution must address general ASN.1 and not just the subset used by Z39.50. It must handle any existing ASN.1 specification, and not require the ASN.1 descriptions to be rewritten for XER.

The other requirement is that it must be a simple mechanism that is easily understood and easy to implement. This is very important if XER is to be adopted by the Internet community.

The goal of simplicity influences a number of areas in the design: for example, having a single encoding format rather than a number of alternatives, making the encoding case sensitive, and reducing the need for look-ahead. These factors make XER generators and parsers easier to develop and more efficient to run.

The most significant design feature of this XER mechanism is that ASN.1 can be processed automatically to generate the XER encoding, without human intervention. Hand-crafted mechanisms are error prone, as well as tedious and expensive. The set of rules described here can be applied automatically by a program: there are no exceptional cases or adjustments that require human intervention.

Proposal

The XER mechanism will be described in four parts: namespaces, standard ASN.1 data types, ASN.1 productions, and BER encodings.

Namespaces

In an XER encoded document, XML tags and attributes come from two sources: generic tags and attributes defined by XER itself, and tags derived from the particular ASN.1 specification being encoded. The standard XML namespace mechanism is used to keep them apart. In the examples below, the generic XER tags are prefixed with "xer:". For conciseness, the specification-specific tags are assumed to be in the default namespace (e.g. instead of "<Z3950v3:initRequest>" the examples show "<initRequest>").

Standard ASN.1 data types

The standard ASN.1 types are encoded in XML tags named after their type. These tags are defined in the XER namespace. The content of each tag is a textual representation of the value.

Note that the spelling and capitalization of the tags mimic those of the ASN.1 keywords exactly, so no confusion arises and no mapping rule is needed. The only translation is that a space in an ASN.1 keyword is converted into an underscore (for example, OCTET STRING becomes OCTET_STRING).
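As a small illustration, the keyword-to-tag rule can be sketched in Python. The function name xer_tag is hypothetical, and the "xer:" prefix assumes the namespace prefix used in this document's examples.

```python
def xer_tag(asn1_keyword: str) -> str:
    """Return the XER tag name for an ASN.1 built-in type keyword.

    Spelling and capitalization are preserved exactly; the only change is
    that each space becomes an underscore. The "xer:" prefix assumes the
    namespace prefix used in this document's examples.
    """
    return "xer:" + asn1_keyword.replace(" ", "_")


for keyword in ["BOOLEAN", "BIT STRING", "OBJECT IDENTIFIER", "OCTET STRING"]:
    print(keyword, "->", xer_tag(keyword))
```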

Boolean
BOOLEANs are encoded as the values of "true" or "false". These values are case sensitive, and must appear in all lowercase letters.
<xer:BOOLEAN>true</xer:BOOLEAN>
<xer:BOOLEAN>false</xer:BOOLEAN>
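A minimal sketch of the BOOLEAN rule, assuming an XML parser has already extracted the element text; the function names are hypothetical. The text does not say whether whitespace may surround a BOOLEAN value, so this sketch accepts none.

```python
def encode_boolean(value: bool) -> str:
    # Only the lowercase literals "true" and "false" are ever produced.
    return "<xer:BOOLEAN>%s</xer:BOOLEAN>" % ("true" if value else "false")


def decode_boolean(text: str) -> bool:
    # Matching is case sensitive, so "True" or "FALSE" are rejected.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError("invalid XER BOOLEAN value: %r" % text)
```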
	

Bit String
BIT STRINGs are encoded as a series of "1" and "0" digits. The first digit corresponds to the zeroth bit. To simplify parsing and error checking, no other characters are allowed except whitespace, which is ignored.
<xer:BIT_STRING>101010</xer:BIT_STRING>
<xer:BIT_STRING>111</xer:BIT_STRING>
<xer:BIT_STRING> 10 1010</xer:BIT_STRING>
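The BIT STRING rule can be sketched as a pair of functions, assuming the element text has already been extracted; the names are hypothetical. Bits are represented as a list of 0/1 integers, with index 0 holding the zeroth bit.

```python
from typing import List


def encode_bit_string(bits: List[int]) -> str:
    # Bit 0 is written first; each bit becomes the character "1" or "0".
    return "".join("1" if b else "0" for b in bits)


def decode_bit_string(text: str) -> List[int]:
    bits = []
    for ch in text:
        if ch in " \t\r\n":      # XML whitespace is ignored
            continue
        if ch not in "01":
            raise ValueError("invalid character in XER BIT STRING: %r" % ch)
        bits.append(1 if ch == "1" else 0)   # the first digit is bit 0
    return bits


print(decode_bit_string(" 10 1010"))   # [1, 0, 1, 0, 1, 0]
```

Rejecting every other character, rather than skipping it, is what keeps error checking simple: a malformed value fails immediately instead of silently losing bits.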
	

Integer
INTEGERs are encoded in text as their decimal representation. To simplify parsing and error checking, no other characters are allowed within the value, and only ignorable whitespace may appear around the value.
<xer:INTEGER>42</xer:INTEGER>
<xer:INTEGER>-24601</xer:INTEGER>
<xer:INTEGER>0</xer:INTEGER>
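A strict INTEGER parser might look like the following sketch (function names hypothetical). The text does not say whether a leading "+" or leading zeros are permitted; this sketch rejects the former and accepts the latter.

```python
import re

# An optional minus sign followed by ASCII decimal digits. No "+" sign,
# no hexadecimal, no digit separators.
_DECIMAL = re.compile(r"-?[0-9]+")


def decode_integer(text: str) -> int:
    value = text.strip(" \t\r\n")   # only surrounding whitespace is ignored
    if not _DECIMAL.fullmatch(value):
        raise ValueError("invalid XER INTEGER value: %r" % text)
    return int(value)


def encode_integer(value: int) -> str:
    return "<xer:INTEGER>%d</xer:INTEGER>" % value
```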
	

Object Identifier
OBJECT IDENTIFIERs are encoded as the decimal values of the object identifier's components, in order, separated by full stops.
<xer:OBJECT_IDENTIFIER>1.2.840.10003.3.1</xer:OBJECT_IDENTIFIER>
<xer:OBJECT_IDENTIFIER>1.2.3.4</xer:OBJECT_IDENTIFIER>
<xer:OBJECT_IDENTIFIER>0</xer:OBJECT_IDENTIFIER>
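A sketch of OBJECT IDENTIFIER encoding and decoding, with components held as a tuple of integers; the function names are hypothetical. Since the text says nothing about whitespace for this type, the sketch accepts none.

```python
import re
from typing import Tuple

_ARC = re.compile(r"[0-9]+")   # one component: ASCII decimal digits only


def decode_oid(text: str) -> Tuple[int, ...]:
    parts = text.split(".")
    # Empty components (e.g. "1..2" or "1.2.") fail the match and are rejected.
    if not all(_ARC.fullmatch(p) for p in parts):
        raise ValueError("invalid XER OBJECT IDENTIFIER value: %r" % text)
    return tuple(int(p) for p in parts)


def encode_oid(components: Tuple[int, ...]) -> str:
    return ".".join(str(c) for c in components)
```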
	
Octet String
OCTET STRINGs can be either unencoded or encoded in hexadecimal. Unencoded text may contain the normal XML escape sequences (such as "&lt;" and "&amp;"). The encoding used is identified by the "xer:enc" attribute: if the attribute is absent the value is unencoded, and the value "hex" indicates hexadecimal encoding. With hexadecimal encoding, any whitespace is ignored.
<xer:OCTET_STRING>This is a test</xer:OCTET_STRING>
<xer:OCTET_STRING>The &lt;tag&gt; &amp; its friends</xer:OCTET_STRING>
<xer:OCTET_STRING xer:enc="hex">5468697320697320612074657374</xer:OCTET_STRING>
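A decoding sketch using Python's xml.etree.ElementTree. The namespace URI "urn:example:xer" is a placeholder, because this document does not define one, and turning unencoded characters into bytes via UTF-8 is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; this document does not specify one.
XER_NS = "urn:example:xer"
ENC_ATTR = "{%s}enc" % XER_NS


def decode_octet_string(elem: ET.Element) -> bytes:
    text = elem.text or ""
    enc = elem.get(ENC_ATTR)
    if enc is None:
        # Unencoded: the XML parser has already expanded &lt;, &amp;, etc.
        # Storing the characters as UTF-8 is an assumption of this sketch.
        return text.encode("utf-8")
    if enc == "hex":
        # Hexadecimal: whitespace anywhere in the value is ignored.
        return bytes.fromhex("".join(text.split()))
    raise ValueError("unsupported xer:enc value: %r" % enc)


doc = (
    '<doc xmlns:xer="urn:example:xer">'
    "<xer:OCTET_STRING>The &lt;tag&gt; &amp; its friends</xer:OCTET_STRING>"
    '<xer:OCTET_STRING xer:enc="hex">5468697320697320612074657374</xer:OCTET_STRING>'
    "</doc>"
)
for elem in ET.fromstring(doc):
    print(decode_octet_string(elem))
```

Run on the two examples above, this prints b'The <tag> & its friends' and b'This is a test'.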
	

Other String types
All other standard ASN.1 string types based on the octet string type (i.e. NumericString, PrintableString, TeletexString, VideotexString, VisibleString, IA5String, GraphicString, and GeneralString) are encoded in the same way as an OCTET STRING, using the type's name as the tag.
<xer:NumericString>24601</xer:NumericString>
<xer:PrintableString>This is a test</xer:PrintableString>
<xer:GeneralString xer:enc="hex"> 64737463</xer:GeneralString>
	
Null
NULLs are encoded as an empty tag.
<xer:NULL/>
	
Sequence
SEQUENCEs and SEQUENCE OFs are encoded as a "SEQUENCE" tag containing the encodings of the values of the sequence, in order. OPTIONAL members that are absent are omitted.
<xer:SEQUENCE><xer:INTEGER>1</xer:INTEGER><xer:INTEGER>2</xer:INTEGER></xer:SEQUENCE>
<xer:SEQUENCE><xer:INTEGER>1</xer:INTEGER><xer:OCTET_STRING>foobar</xer:OCTET_STRING></xer:SEQUENCE>
<xer:SEQUENCE/>
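The SEQUENCE rule can be sketched as a small recursive encoder. The mapping from Python values to ASN.1 types (bool to BOOLEAN, int to INTEGER, str to OCTET STRING, None to NULL, list to SEQUENCE) and the ABSENT marker for missing OPTIONAL members are hypothetical conveniences for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical marker for an OPTIONAL member that is not present.
ABSENT = object()


def encode(value) -> str:
    """Encode a Python value with the XER built-in type tags.

    Illustrative mapping only: bool -> BOOLEAN, int -> INTEGER,
    str -> OCTET STRING, None -> NULL, list -> SEQUENCE.
    """
    if isinstance(value, bool):          # check bool first: bool subclasses int
        return "<xer:BOOLEAN>%s</xer:BOOLEAN>" % ("true" if value else "false")
    if isinstance(value, int):
        return "<xer:INTEGER>%d</xer:INTEGER>" % value
    if isinstance(value, str):
        return "<xer:OCTET_STRING>%s</xer:OCTET_STRING>" % escape(value)
    if value is None:
        return "<xer:NULL/>"
    if isinstance(value, list):
        # Members are encoded in order; absent OPTIONAL members are skipped.
        members = "".join(encode(v) for v in value if v is not ABSENT)
        if not members:
            return "<xer:SEQUENCE/>"
        return "<xer:SEQUENCE>%s</xer:SEQUENCE>" % members
    raise TypeError("no XER mapping for %r" % (value,))


print(encode([1, ABSENT, "foobar"]))
```

The call above reproduces the second SEQUENCE example: the absent member simply leaves no trace in the output.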
	
Choice
CHOICEs are encoded as the chosen value inside a "CHOICE" tag.
<xer:CHOICE><xer:INTEGER>1</xer:INTEGER></xer:CHOICE>
<xer:CHOICE><xer:BOOLEAN>true</xer:BOOLEAN></xer:CHOICE>
<xer:CHOICE><xer:OCTET_STRING>foobar</xer:OCTET_STRING></xer:CHOICE>
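On the decoding side, a CHOICE element is identified purely by the tag of its single child. A minimal sketch with xml.etree.ElementTree follows; the function name and the namespace URI "urn:example:xer" are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def decode_choice(elem: ET.Element) -> Tuple[str, Optional[str]]:
    """Return (local tag name, text) of the single alternative in a CHOICE."""
    children = list(elem)
    if len(children) != 1:
        raise ValueError("a CHOICE must contain exactly one alternative")
    chosen = children[0]
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
    local_name = chosen.tag.split("}", 1)[-1]
    return local_name, chosen.text


xml_text = ('<xer:CHOICE xmlns:xer="urn:example:xer">'
            "<xer:BOOLEAN>true</xer:BOOLEAN></xer:CHOICE>")
print(decode_choice(ET.fromstring(xml_text)))   # ('BOOLEAN', 'true')
```

Because the decoder sees nothing but the child's tag, two alternatives that share a tag cannot be told apart, which is why tag uniqueness matters for the production rules.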
	
External
EXTERNALs are defined in terms of a production: a SEQUENCE of basic types. The production rules (described in the next section) are applied directly to encode EXTERNALs.

Any
The ASN.1 ANY type encoding is described in the "BER encodings" section below.

This encoding mechanism allows generic ASN.1 to be encoded in XML. In practice, however, these standard types are not used on their own but as part of productions.

ASN.1 Productions

Productions are the ASN.1 mechanism for building arbitrary data structures, and they give ASN.1 its power. However, ASN.1 relies on a tagging mechanism to uniquely identify these structures and the components within them. Numbers are used as tags, and they can appear in IMPLICIT and EXPLICIT forms.

The tagging mechanism is ideally suited to the Basic Encoding Rules (BER), but it is not desirable for XER. Firstly, tag numbers convey no semantics to a human reader: using them in XML would produce an encoding that is not very human readable. Secondly, the concept of IMPLICIT and EXPLICIT tags is complex and very low-level: it is a hindrance in a high-level XML encoding.

For XER encoding, we need an alternative to ASN.1's numeric tagging scheme for identifying structures and their components. In an ASN.1 definition, the only identifiers with semantic meaning useful to human readers are the symbols used for type references and names. (ASN.1 specifications also contain human readable comments, but these are unstructured and hence useless for our purposes.) The simple rules described here allow the type references and names to replace the functionality of the ASN.1 numeric tags.

Rule 1: Productions are encoded as XML tagged data: the value of the production is encoded inside the production's XML tag. The value encoding follows the rules for the standard ASN.1 types described above.

Rule 2: The name of the XML tag used is the production's type identifier. However, if the production is used in a context where it is given a name, then that name is used instead.

These two rules allow the semantics chosen by the ASN.1 specification writer to be carried into the XER encoding. Writers (if they are sensible) will use meaningful type references, and where a type is used, a name is often given to further describe how that instance is being used.

In practice, these two simple rules produce an XER encoding which is quite human readable. Some examples of real Z39.50 PDUs generated using these rules are shown later in this document. However, first we will illustrate these rules with some simple examples.

The simple ASN.1 production:

Height ::= [5] INTEGER
will be encoded in XER as:
<Height><xer:INTEGER>180</xer:INTEGER></Height>
Note how the production identifier "Height" is used as the XML tag, and the ASN.1 numeric tag information is ignored.

The ASN.1 production:

PersonName ::= [16] IMPLICIT OCTET STRING
will be encoded as:
<PersonName><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></PersonName>

Again, the ASN.1 numeric tag is ignored.

The ASN.1 production:

Person ::= SEQUENCE {
  [1] OCTET STRING,
  [2] PersonName,
  [3] Height,
  [4] INTEGER OPTIONAL
}
    
will be encoded as:
<Person>
  <xer:OCTET_STRING>Ms</xer:OCTET_STRING>
  <PersonName><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></PersonName>
  <Height><xer:INTEGER>180</xer:INTEGER></Height>
  <xer:INTEGER>180</xer:INTEGER>
</Person>

The ASN.1 production:

Person ::= SEQUENCE {
  title  [1] OCTET STRING,
  name   [2] PersonName,
  height [3] Height OPTIONAL,
  weight [4] INTEGER OPTIONAL
}
    
will be encoded as:
<Person>
  <title>Ms</title>
  <name><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></name>
  <height><xer:INTEGER>180</xer:INTEGER></height>
  <weight>180</weight>
</Person>
The names override the use of the type references. Names are commonly used in ASN.1 specifications, and this produces semantically rich XML tags.

The ASN.1 production:

AddressBookEntry ::= CHOICE {
  [10] Person,
  [11] Company,
  [12] Group
}
will be encoded as:
<AddressBookEntry>
  <Person>
    <title>Ms</title>
    <name><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></name>
    <height><xer:INTEGER>180</xer:INTEGER></height>
    <weight>180</weight>
  </Person>
</AddressBookEntry>
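The examples above suggest how Rules 1 and 2 combine: a component name replaces whichever tag the component would otherwise get (the type reference of a production, or the xer: tag of a built-in type). The Python sketch below encodes that reading. The classes Builtin, Production and Member are a hypothetical in-memory model introduced for illustration, not the Java classes generated by Zebulun. The sketch does not apply Rule 3, and its output omits the indentation used in the examples.

```python
from dataclasses import dataclass
from typing import List, Optional, Union
from xml.sax.saxutils import escape


@dataclass
class Builtin:
    keyword: str   # ASN.1 built-in type keyword, e.g. "OCTET STRING"
    text: str      # the value, already in its XER textual form


@dataclass
class Production:
    type_ref: str                            # type reference, e.g. "Height"
    inner: Union[Builtin, List["Member"]]    # a built-in value or components


@dataclass
class Member:
    name: Optional[str]                      # component name, if one is given
    value: Union[Builtin, Production]


def encode(value: Union[Builtin, Production], name: Optional[str] = None) -> str:
    if isinstance(value, Builtin):
        # A named built-in uses the name as its tag; otherwise the xer: type tag.
        tag = name or "xer:" + value.keyword.replace(" ", "_")
        return "<%s>%s</%s>" % (tag, escape(value.text), tag)
    # Rule 2: the component name, if present, overrides the type reference.
    tag = name or value.type_ref
    if isinstance(value.inner, Builtin):
        body = encode(value.inner)   # Rule 1: the value goes inside the tag
    else:
        body = "".join(encode(m.value, m.name) for m in value.inner)
    return "<%s>%s</%s>" % (tag, body, tag)


alice = Production("Person", [
    Member("title", Builtin("OCTET STRING", "Ms")),
    Member("name", Production("PersonName", Builtin("OCTET STRING", "Alice Brown"))),
    Member("height", Production("Height", Builtin("INTEGER", "180"))),
    Member("weight", Builtin("INTEGER", "180")),
])
print(encode(Production("AddressBookEntry", [Member(None, alice)])))
```

Apart from whitespace, the printed line matches the AddressBookEntry example; passing None for every component name instead reproduces the unnamed Person encoding.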

The above two rules cover most cases encountered in ASN.1, but since ASN.1 does not restrict or mandate how the type references and names are used, there may be situations where these rules do not allow us to generate a unique identifier tag.

For example, the following ASN.1 production would not have unique member tags and a decoder would not be able to determine which member of the CHOICE was being used. In a BER encoding, the ASN.1 numeric tags identify the member, but this information is discarded in the XER rules.

Dimensions ::= CHOICE {
  [100] INTEGER,
  [101] INTEGER,
  [102] INTEGER
}
    
To handle these cases, an additional third rule is required:

Rule 3: Where the above rules do not generate a unique XML tag for every component inside a CHOICE or SEQUENCE, a further rule is applied. Tags that are already unique are used unchanged; tags that are not unique are made unique by appending a number to the name. Numbers are assigned in the order the components are listed in the ASN.1, starting from 1 and incrementing to the next number that makes the name unique within that context. Each clashing name is processed in turn, and the numbering is reset to 1 for each one.
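One reading of Rule 3 is sketched below in Python; the function name disambiguate and the tag names a, b, c and x are placeholders. Two details are interpretations rather than statements from the rule: clashing names are processed in order of first appearance, and "the next number which will make the name unique" is taken to mean skipping any number whose result collides with a tag already in use.

```python
from collections import Counter
from typing import List


def disambiguate(names: List[str]) -> List[str]:
    """Apply Rule 3 to the tags of the components of one CHOICE or SEQUENCE."""
    counts = Counter(names)
    taken = {n for n in names if counts[n] == 1}   # unique tags are kept as-is
    result = list(names)
    # Each clashing name is handled in turn, in order of first appearance.
    clashing = [n for n in dict.fromkeys(names) if counts[n] > 1]
    for name in clashing:
        k = 1                                      # numbering restarts at 1
        for i, original in enumerate(names):
            if original != name:
                continue
            while f"{name}{k}" in taken:           # skip numbers that would clash
                k += 1
            result[i] = f"{name}{k}"
            taken.add(result[i])
            k += 1
    return result


print(disambiguate(["a", "b", "a", "c", "b"]))    # ['a1', 'b1', 'a2', 'c', 'b2']
print(disambiguate(["x", "x", "x1"]))             # ['x2', 'x3', 'x1']
```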

As an example of this third rule in action, if the original names generated from the first rule are:

will be mapped into:

In practice, application of the first two rules rarely produces name clashes which require the third rule to be invoked. For example, the Z39.50 version 3 specification has no clashes, and the Z39.50 version 2 specification only has one (for the RPNStructure production).

If the name clash occurs within a SEQUENCE, then uniqueness may be implied by the position of the component in the sequence. However, this is complicated by whether the components are OPTIONAL or not. To take advantage of this implicit uniqueness a much more complex rule would be required. To keep things simple, this implicit uniqueness is not used.

BER encodings

The mechanism described above for XER encoding works well, provided the program generating or parsing the XER has access to the ASN.1 specification describing the data. This is true for most programs. However, one class of program may not have that access: generic gateways.

The problem of not having access to the ASN.1 information may also arise when ASN.1 ANYs are used.

To handle these situations, XER needs to be able to encode BER data in XML. The "xer:BER" tag is defined for this purpose. The tag contains a hexadecimal encoding of the BER encoding.

Embedding BER in XER has a cost: some programs will need to support both XER and BER encodings. This is unavoidable, however. Given an arbitrary BER encoding, a BER-to-XER gateway cannot in general identify the data type of a BER element, because implicit tagging in ASN.1 may hide the type information.

The ASN.1 instantiation:

{
  initRequest {
    protocolVersion '111'B,
    options '11'B,
    preferredMessageSize 256,
    maximumRecordSize 1048576,
    implementationName "Zebulun"
  }
}
    
The XER encoding will be:
<xer:BER xer:enc="hex">b41b8302 05e08402 06c08502 01008603 1000009f 6f075a65 62756c75 6e</xer:BER>
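To show why a gateway cannot recover types from BER alone, the following sketch walks the tag-length-value structure of the example above. read_tlv is a hypothetical helper; it handles only definite lengths and the high-tag-number form, which is all this example needs. It recovers tag classes and numbers (for instance [20] for initRequest and [111] for implementationName), but the content octets of [5] (01 00) could equally be an INTEGER or an OCTET STRING: only the ASN.1 specification says which.

```python
from typing import Tuple


def read_tlv(data: bytes, pos: int) -> Tuple[int, bool, int, bytes, int]:
    """Read one BER element starting at data[pos].

    Returns (tag class, constructed?, tag number, content octets, next position).
    Only definite lengths are supported.
    """
    first = data[pos]
    pos += 1
    tag_class = first >> 6                 # 0 universal, 1 application, 2 context, 3 private
    constructed = bool(first & 0x20)
    number = first & 0x1F
    if number == 0x1F:                     # high-tag-number form: base-128 digits
        number = 0
        while True:
            b = data[pos]
            pos += 1
            number = (number << 7) | (b & 0x7F)
            if not b & 0x80:               # high bit clear marks the last digit
                break
    length = data[pos]
    pos += 1
    if length & 0x80:                      # long form: next (length & 0x7F) octets
        n = length & 0x7F
        if n == 0:
            raise ValueError("indefinite length is not supported by this sketch")
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag_class, constructed, number, data[pos:pos + length], pos + length


hex_text = "b41b8302 05e08402 06c08502 01008603 1000009f 6f075a65 62756c75 6e"
ber = bytes.fromhex("".join(hex_text.split()))

cls, cons, num, body, _ = read_tlv(ber, 0)
print(cls, cons, num)              # 2 True 20: context-specific [20], constructed
pos = 0
while pos < len(body):
    cls, cons, num, content, pos = read_tlv(body, pos)
    print(num, content.hex())
```

The loop prints the context tags 3, 4, 5, 6 and 111 together with their raw content octets (05e0, 06c0, 0100, 100000 and 5a6562756c756e). Turning those octets back into '111'B, 256, 1048576 and "Zebulun" requires knowing the declared types.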
    

Examples

We have implemented the XER encoding rules in our Zebulun ASN.1 code generator. The code generator takes an ASN.1 description and generates a set of Java classes, which allow a developer to easily create programs that encode data into, and decode data from, BER. Once the XER rules had been defined, extending the generator to produce XER code was not very difficult, thanks to its modular structure.

The following examples have been produced by a proxy application. This application was built using the Zebulun generated code. The Z39.50 version 3 ASN.1 description was used, and the proxy was set up to intercept PDUs between a Z39.50 origin and target (Isite). The captured PDUs are encoded in XER and printed out. These XER encodings are shown below (they have been indented to make them more readable).

A Z39.50 initRequest:

<PDU>
  <initRequest>
    <protocolVersion><xer:BIT_STRING>111</xer:BIT_STRING></protocolVersion>
    <options><xer:BIT_STRING>110000011000000</xer:BIT_STRING></options>
    <preferredMessageSize>1048576</preferredMessageSize>
    <exceptionalRecordSize>1048576</exceptionalRecordSize>
  </initRequest>
</PDU>
    
This is slightly different from the hand-crafted examples proposed on the XER discussion mailing list: it includes an extra level of tagging for some members. However, this approach reflects the structure and semantic information in the ASN.1 specification, and the extra information may be helpful to the reader.

A Z39.50 initResponse:

<PDU>
  <initResponse>
    <protocolVersion><xer:BIT_STRING>111</xer:BIT_STRING></protocolVersion>
    <options><xer:BIT_STRING>110000011000000</xer:BIT_STRING></options>
    <preferredMessageSize>32768</preferredMessageSize>
    <exceptionalRecordSize>1048576</exceptionalRecordSize>
    <result>true</result>
    <implementationId><xer:GeneralString>34</xer:GeneralString></implementationId>
    <implementationName><xer:GeneralString>CNIDR zserver</xer:GeneralString></implementationName>
    <implementationVersion><xer:GeneralString>2.01c</xer:GeneralString></implementationVersion>
  </initResponse>
</PDU>
    

A Z39.50 searchRequest:

<PDU>
  <searchRequest>
    <smallSetUpperBound>0</smallSetUpperBound>
    <largeSetLowerBound>1000000</largeSetLowerBound>
    <mediumSetPresentNumber>20</mediumSetPresentNumber>
    <replaceIndicator>true</replaceIndicator>
    <resultSetName><xer:GeneralString>Default</xer:GeneralString></resultSetName>
    <databaseNames>
      <DatabaseName>
        <InternationalString><xer:GeneralString>xxdefault</xer:GeneralString></InternationalString>
      </DatabaseName>
    </databaseNames>
    <query>
      <type-1>
        <attributeSet><xer:OBJECT_IDENTIFIER>1.2.840.10003.3.1</xer:OBJECT_IDENTIFIER></attributeSet>
        <rpn>
          <op>
            <attrTerm>
              <attributes><xer:SEQUENCE/></attributes>
              <term><general>Kelvin</general></term>
            </attrTerm>
          </op>
        </rpn>
      </type-1>
    </query>
  </searchRequest>
</PDU>
    

A Z39.50 close:

<PDU>
  <close>
    <closeReason><xer:INTEGER>0</xer:INTEGER></closeReason>
  </close>
</PDU>
    

Open Issues and Further Work

This section outlines some open issues and areas requiring further work.

Named values

Our current prototype does not generate textual named tags for named values in the ASN.1 specification. The consensus of the XER mailing list is that empty tags should be generated for these values.

This should be feasible, but it needs to be tested to confirm that it works in practice and to identify any issues it raises.

Use of ASN.1 ANY and BER encodings

Further work needs to be done on the approach and issues raised by the handling of the ASN.1 ANY type. It is difficult to cater for data that is of an unknown type and do something useful with it. However, the ASN.1 ANY is used often in ASN.1 specifications, so the XER encoding must be able to handle it.

We have proposed a simple encoding of ASN.1 ANY data. However, further investigation is required.

DTD Generation

Since we have a set of well defined rules for generating XER documents, and these documents are XML documents, it would be useful to obtain a DTD (or an XML schema of some form) for them. An interesting line of research would be a program that takes ASN.1 as input and produces the DTD for the corresponding XER encoding as output.

Conclusions

The XML encoding rules (XER) described here are a simple but effective mechanism for encoding data from the ASN.1 world in XML. This allows the Internet and XML world to interoperate with the ASN.1 world (and in particular the Z39.50 community). It provides a bridge between new XML based systems and legacy systems.

The rules can be applied automatically, making the task of writing XER programs easier. This has been demonstrated by the success of our prototype, which generated the XER code from an ASN.1 specification without the need for manual intervention.

References

  1. XER Decisions and Proposals, http://asf.gils.net/xer/decisions.html
  2. XER discussion list, http://asf.gils.net/xer/
  3. Kent, Alan, XER Simple Set of Rules, http://asf.gils.net/xer/rules_v1.html
  4. LeVan, Ralph, XER Encoding Rules, http://www.oclc.org/~levan/docs/xerencodingrules.html
  5. Information Technology - Open Systems Interconnection - Specification of Abstract Syntax Notation One (ASN.1). ISO/IEC 8824/1990.
  6. Information Technology - Open Systems Interconnection - Specification of Basic Encoding Rules for Abstract Syntax Notation One (ASN.1). ISO/IEC 8825/1990.
  7. Extensible Markup Language (XML) 1.0, World Wide Web Consortium, REC-xml-19980210, http://www.w3.org/TR/REC-xml


DSTC Pty Ltd
Hoylen Sue
Last modified: Tue Jan 12 16:22:27 EST 1999