This document describes XML Encoding Rules (XER) as implemented by DSTC Pty Ltd. These rules describe how to encode ASN.1 specified data as XML documents. The rules are based on ideas and concepts proposed by the XER discussion list. Example XER output from our prototype implementation is provided.
We have adapted these existing ideas, modifying them based on our experience of developing a practical implementation of the rules. It is highly desirable to have a set of consistent rules and a machine-processable mechanism for deriving the XER encoding from an ASN.1 specification, without the need for human intervention to handle exceptional cases. The XER rules we propose here make this possible.
The Z39.50 protocol is a powerful search protocol with a proven track record. It is widely deployed in the library community and there are many existing services using it. However, its use of a heavyweight binary encoding makes it unpalatable to the Web and Internet community.
The Extensible Markup Language (XML) is a text-oriented encoding developed by the W3C. It is rapidly gaining popularity in the Web community and is being adopted as the underlying mechanism for many Web proposals, including efforts to develop a lightweight Web search protocol. However, the Z39.50 community has considerable experience in searching, and it would be productive to build on that existing body of knowledge in developing a Web search protocol.
XML Encoding Rules (XER) provide a mechanism for transporting Z39.50 (as well as general ASN.1 specified data) in a textual XML format. This combines the strengths of XML and Z39.50 to provide a powerful Internet search protocol. It also provides a means of interfacing legacy ASN.1 based systems to the Internet and to XML systems.
This document describes the mechanism for XER we have used in our prototype implementation. These rules have been derived from discussion documents by Alan Kent, Ralph LeVan and the XER mailing list. The additions we have made to these rules are described. Examples of XER encoding, generated by our prototype implementation, are also provided.
This document assumes the reader has a technical understanding of ASN.1, BER encoding, and XML. Knowledge of Z39.50 is also desirable for understanding the examples.
XER is intended to be used as a mechanism for interoperability between existing Z39.50 systems and Web/Internet systems. For this reason, we have placed great importance on interoperability: there should be a one-to-one translation between BER and XER encodings. In particular, it must be possible to:
The other requirement is that XER must be a simple mechanism that is easy to understand and easy to implement. This is essential if XER is to be adopted by the Internet community.
The goal of simplicity influences a number of areas in the design, for example: having a single encoding format rather than a number of alternatives, case sensitivity, and reducing the need for look-ahead. These factors make XER generators and parsers easier to develop and more efficient to run.
The most significant design feature of this XER mechanism is that the XER encoding can be generated automatically from the ASN.1 without human intervention. Hand-crafted mechanisms are error prone, as well as tedious and expensive. The set of rules described here can be applied automatically by a program. There are no exceptional cases or adjustments that require human intervention.
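Tags defined by XER itself, such as those for the built-in ASN.1 types, use the namespace prefix "xer:". For conciseness, the tags derived from a specific ASN.1 specification will be assumed to be in the default namespace (e.g. instead of saying "<Z3950v3:initRequest>" the examples will say "<initRequest>").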
Note that the spelling and capitalization of the tags mimic those in the ASN.1 exactly, so no mapping rule is needed and there is no room for confusion. The only translation is that a space in an ASN.1 keyword is converted into an underscore (for example, OCTET STRING becomes OCTET_STRING).
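Because the only translation is space to underscore, mapping between built-in keywords and tag names is purely mechanical. The Python sketch below shows both directions. The function names are hypothetical. The reverse direction relies on one fact: ASN.1 keywords and identifiers never contain underscores (they use letters, digits and hyphens), so the reverse mapping is unambiguous.

```python
XER_PREFIX = "xer:"


def keyword_to_tag(keyword: str) -> str:
    """Map an ASN.1 built-in type keyword to its XER tag name (hypothetical helper)."""
    # Spelling and capitalization are copied exactly; only spaces change.
    return XER_PREFIX + keyword.replace(" ", "_")


def tag_to_keyword(tag: str) -> str:
    """Invert keyword_to_tag for a tag in the xer: namespace."""
    if not tag.startswith(XER_PREFIX):
        raise ValueError(f"not an XER built-in tag: {tag!r}")
    # Safe to invert: ASN.1 keywords and names never contain underscores.
    return tag[len(XER_PREFIX):].replace("_", " ")


print(keyword_to_tag("OBJECT IDENTIFIER"))  # xer:OBJECT_IDENTIFIER
```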
<xer:BOOLEAN>true</xer:BOOLEAN> <xer:BOOLEAN>false</xer:BOOLEAN>
<xer:BIT_STRING>101010</xer:BIT_STRING> <xer:BIT_STRING>111</xer:BIT_STRING> <xer:BIT_STRING> 10 1010</xer:BIT_STRING>
<xer:INTEGER>42</xer:INTEGER> <xer:INTEGER>-24601</xer:INTEGER> <xer:INTEGER>0</xer:INTEGER>
<xer:OBJECT_IDENTIFIER>1.2.840.10003.3.1</xer:OBJECT_IDENTIFIER> <xer:OBJECT_IDENTIFIER>1.2.3.4</xer:OBJECT_IDENTIFIER> <xer:OBJECT_IDENTIFIER>0</xer:OBJECT_IDENTIFIER>
<xer:OCTET_STRING>This is a test</xer:OCTET_STRING> <xer:OCTET_STRING>The &lt;tag&gt; &amp; its friends</xer:OCTET_STRING> <xer:OCTET_STRING xer:enc="hex">5468697320697320612074657374</xer:OCTET_STRING>
<xer:NumericString>24601</xer:NumericString> <xer:PrintableString>This is a test</xer:PrintableString> <xer:GeneralString xer:enc="hex"> 64737463</xer:GeneralString>
<xer:NULL/>
<xer:SEQUENCE><xer:INTEGER>1</xer:INTEGER><xer:INTEGER>2</xer:INTEGER></xer:SEQUENCE> <xer:SEQUENCE><xer:INTEGER>1</xer:INTEGER><xer:OCTET_STRING>foobar</xer:OCTET_STRING></xer:SEQUENCE> <xer:SEQUENCE/>
<xer:CHOICE><xer:INTEGER>1</xer:INTEGER></xer:CHOICE> <xer:CHOICE><xer:BOOLEAN>true</xer:BOOLEAN></xer:CHOICE> <xer:CHOICE><xer:OCTET_STRING>foobar</xer:OCTET_STRING></xer:CHOICE>
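The built-in encodings above are simple enough to generate directly. The following Python sketch produces the same strings, except for the cosmetic spaces between examples. The helper names are hypothetical and are not taken from the prototype.

One rule is our assumption and is flagged in the code. The text shows that xer:enc="hex" exists but not when an encoder should choose it, so this sketch falls back to hex only when an OCTET STRING contains bytes outside printable ASCII.

Text content is passed through `xml.sax.saxutils.escape`, because characters such as < and & must be written as &lt; and &amp; to keep the document well-formed.

```python
from xml.sax.saxutils import escape


def xer_element(keyword, text=None, hex_encoded=False):
    """Wrap text in the xer: tag for an ASN.1 built-in keyword."""
    tag = "xer:" + keyword.replace(" ", "_")
    attrs = ' xer:enc="hex"' if hex_encoded else ""
    if text is None:
        return f"<{tag}{attrs}/>"  # empty element, e.g. <xer:NULL/>
    return f"<{tag}{attrs}>{text}</{tag}>"


def encode_builtin(keyword, value):
    """Encode a Python value as a built-in ASN.1 type (sketch, not exhaustive)."""
    if keyword == "BOOLEAN":
        return xer_element(keyword, "true" if value else "false")
    if keyword == "INTEGER":
        return xer_element(keyword, str(value))
    if keyword == "NULL":
        return xer_element(keyword)
    if keyword == "BIT STRING":
        return xer_element(keyword, "".join("1" if bit else "0" for bit in value))
    if keyword == "OBJECT IDENTIFIER":
        return xer_element(keyword, ".".join(str(arc) for arc in value))
    if keyword == "OCTET STRING":
        if all(0x20 <= b < 0x7F for b in value):
            # Printable ASCII: emit as text, escaping XML markup characters.
            return xer_element(keyword, escape(value.decode("ascii")))
        # Assumption: any other octets fall back to the hex form.
        return xer_element(keyword, value.hex(), hex_encoded=True)
    raise ValueError(f"unsupported keyword: {keyword!r}")


def encode_sequence(*components):
    """Wrap already-encoded components in <xer:SEQUENCE>; none gives <xer:SEQUENCE/>."""
    return xer_element("SEQUENCE", "".join(components) or None)


print(encode_builtin("OCTET STRING", b"The <tag> & its friends"))
# <xer:OCTET_STRING>The &lt;tag&gt; &amp; its friends</xer:OCTET_STRING>
```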
ASN.1's numeric tagging mechanism is ideally suited to the Basic Encoding Rules (BER); however, it is not desirable for XER. Firstly, tag numbers convey no semantics to a human reader, so using them in XML would produce an encoding that is not very human readable. Secondly, the concept of IMPLICIT and EXPLICIT tags is complex and very low-level: it is a hindrance in a high-level XML encoding.
For XER encoding, we need an alternative to ASN.1's numeric tagging scheme for identifying elements. In an ASN.1 definition, the only identifiers with semantic meaning for human readers are the symbols used for type references and names. (ASN.1 specifications also contain human-readable comments, but these are unstructured and hence unusable for our purposes.) The simple rules described here allow type references and names to replace the functionality of the ASN.1 numeric tags.
These two rules allow the semantics chosen by the ASN.1 specification writer to be carried into the XER encoding. Sensible writers use meaningful type references, and when a type is used, a name is often added to describe the role of that particular instance.
In practice, these two simple rules produce an XER encoding which is quite human readable. Some examples of real Z39.50 PDUs generated using these rules are shown later in this document. However, first we will illustrate these rules with some simple examples.
The simple ASN.1 production:
Height ::= [5] INTEGER
will be encoded in XER as:
<Height><xer:INTEGER>180</xer:INTEGER></Height>
Note how the production identifier "Height" is used as the XML tag, while the ASN.1 numeric tag information is ignored.
The ASN.1 production:
PersonName ::= [16] IMPLICIT OCTET STRING
will be encoded as:
<PersonName><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></PersonName>
Again, the ASN.1 numeric tag is ignored.
The ASN.1 production:
Person ::= SEQUENCE {
    [1] OCTET STRING,
    [2] PersonName,
    [3] Height,
    [4] INTEGER OPTIONAL
}
will be encoded as:
<Person>
  <xer:OCTET_STRING>Ms</xer:OCTET_STRING>
  <PersonName><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></PersonName>
  <Height><xer:INTEGER>180</xer:INTEGER></Height>
  <xer:INTEGER>180</xer:INTEGER>
</Person>
The ASN.1 production:
Person ::= SEQUENCE {
    title  [1] OCTET STRING,
    name   [2] PersonName,
    height [3] Height OPTIONAL,
    weight [4] INTEGER OPTIONAL
}
will be encoded as:
<Person>
  <title>Ms</title>
  <name><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></name>
  <height><xer:INTEGER>180</xer:INTEGER></height>
  <weight>180</weight>
</Person>
A name, where present, overrides the tag that would otherwise come from the type reference or the built-in type. Names are commonly used in ASN.1 specifications, and they produce semantically rich XML tags.
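The two rules can be expressed as a small recursive encoder. The sketch below uses a hypothetical tuple-based type model rather than a real ASN.1 compiler, and the function names are ours.

One detail is inferred from the examples rather than stated in the rules. A production whose body is a built-in type (or another type reference) keeps the inner tag, as with <Height>. A production whose body is a SEQUENCE places its components directly inside the production's tag, as with <Person>.

```python
from xml.sax.saxutils import escape

# Hypothetical type model, for illustration only:
#   ("builtin", "INTEGER")                 a built-in ASN.1 type
#   ("ref", "Height")                      a reference to another production
#   ("seq", [(name_or_None, type), ...])   a SEQUENCE and its components
PRODUCTIONS = {
    "Height": ("builtin", "INTEGER"),
    "PersonName": ("builtin", "OCTET STRING"),
    "Person": ("seq", [
        ("title", ("builtin", "OCTET STRING")),
        ("name", ("ref", "PersonName")),
        ("height", ("ref", "Height")),
        ("weight", ("builtin", "INTEGER")),
    ]),
}


def choose_tag(typ, name=None):
    """Pick the XML tag for a value of `typ`, optionally named."""
    kind, arg = typ
    if name is not None:
        return name                            # rule 2: a name overrides
    if kind == "ref":
        return arg                             # rule 1: the type reference
    if kind == "builtin":
        return "xer:" + arg.replace(" ", "_")  # built-in type tag
    return "xer:SEQUENCE"


def content(typ, value):
    """Everything between the start and end tags (booleans, hex not handled)."""
    kind, arg = typ
    if kind == "builtin":
        return escape(str(value))
    if kind == "seq":
        # Components in order; absent OPTIONAL components (None) are omitted.
        return "".join(encode(t, v, n) for (n, t), v in zip(arg, value)
                       if v is not None)
    target = PRODUCTIONS[arg]                  # kind == "ref"
    if target[0] != "seq":
        return encode(target, value)           # built-in/reference body keeps its tag
    return content(target, value)              # SEQUENCE body is not re-wrapped


def encode(typ, value, name=None):
    tag = choose_tag(typ, name)
    return f"<{tag}>{content(typ, value)}</{tag}>"


print(encode(("ref", "Person"), ["Ms", "Alice Brown", 180, 180]))
```

Passing `None` for the optional height omits that component. Encoding a SEQUENCE whose components have no names yields the type-reference and xer: tags of the unnamed Person example.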
The ASN.1 production:
AddressBookEntry ::= CHOICE {
    [10] Person,
    [11] Company,
    [12] Group
}
will be encoded as:
<AddressBookEntry>
  <Person>
    <title>Ms</title>
    <name><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></name>
    <height><xer:INTEGER>180</xer:INTEGER></height>
    <weight>180</weight>
  </Person>
</AddressBookEntry>
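On the decoding side, a decoder identifies the CHOICE alternative from the tag of the single child element alone. This is why the rules must yield a unique tag for every alternative. A minimal Python sketch using the standard `xml.etree.ElementTree` module follows; `choice_alternative` is a hypothetical name.

A namespace-aware parser rejects undeclared prefixes, so the sketch binds xer: to a placeholder URI (urn:example:xer). That URI is our assumption, not part of the XER rules.

```python
import xml.etree.ElementTree as ET

# Assumption: "urn:example:xer" is a placeholder URI, not a registered namespace.
DOC = """<AddressBookEntry xmlns:xer="urn:example:xer">
  <Person>
    <title>Ms</title>
    <name><xer:OCTET_STRING>Alice Brown</xer:OCTET_STRING></name>
  </Person>
</AddressBookEntry>"""


def choice_alternative(xml_text, alternatives):
    """Return the tag of the single alternative present in a CHOICE encoding."""
    root = ET.fromstring(xml_text)
    children = list(root)  # child elements only; whitespace text is ignored
    if len(children) != 1:
        raise ValueError("a CHOICE must contain exactly one alternative")
    tag = children[0].tag
    if tag not in alternatives:
        raise ValueError(f"unknown alternative: {tag!r}")
    return tag


print(choice_alternative(DOC, {"Person", "Company", "Group"}))  # Person
```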
The above two rules cover most cases encountered in ASN.1, but since ASN.1 does not restrict or mandate how the type references and names are used, there may be situations where these rules do not allow us to generate a unique identifier tag.
For example, the following ASN.1 production would not have unique member tags and a decoder would not be able to determine which member of the CHOICE was being used. In a BER encoding, the ASN.1 numeric tags identify the member, but this information is discarded in the XER rules.
Dimensions ::= CHOICE {
    [100] INTEGER,
    [101] INTEGER,
    [102] INTEGER
}
To handle these cases, an additional third rule is required:
As an example of this third rule in action, if the original names generated from the first rule are:
In practice, application of the first two rules rarely produces name clashes which require the third rule to be invoked. For example, the Z39.50 version 3 specification has no clashes, and the Z39.50 version 2 specification only has one (for the RPNStructure production).
If the name clash occurs within a SEQUENCE, then uniqueness may be implied by the position of the component in the sequence. However, this is complicated by whether the components are OPTIONAL or not. To take advantage of this implicit uniqueness a much more complex rule would be required. To keep things simple, this implicit uniqueness is not used.
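Whether the third rule is needed can be checked mechanically. Compute the tag each member would receive under the first two rules, then look for duplicates. The sketch below uses a hypothetical tuple-based type model and function names. It deliberately ignores position within a SEQUENCE, in line with the decision not to rely on implicit uniqueness.

```python
from collections import Counter


def member_tag(name, typ):
    """Tag given to one CHOICE or SEQUENCE member by the first two rules.

    typ is ("ref", TypeReference) or ("builtin", KEYWORD) (hypothetical model).
    """
    kind, arg = typ
    if name is not None:
        return name
    if kind == "ref":
        return arg
    return "xer:" + arg.replace(" ", "_")  # built-in type


def find_clashes(members):
    """Return tags used by more than one member; position is not considered."""
    counts = Counter(member_tag(name, typ) for name, typ in members)
    return sorted(tag for tag, n in counts.items() if n > 1)


dimensions = [(None, ("builtin", "INTEGER"))] * 3
address_book_entry = [(None, ("ref", "Person")), (None, ("ref", "Company")),
                      (None, ("ref", "Group"))]
print(find_clashes(dimensions))          # ['xer:INTEGER']
print(find_clashes(address_book_entry))  # []
```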
The problem of not having access to the ASN.1 information may also arise when ASN.1 ANY types are used.
To handle these situations, XER needs to be able to encode BER data in XML. The "xer:BER" tag is defined for this purpose. The tag contains a hexadecimal encoding of the BER encoding.
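Wrapping BER in xer:BER is a straightforward hexadecimal conversion. A minimal Python sketch follows; the function names are hypothetical.

The grouping into four-octet blocks only mimics the layout of the example in this document. We assume that whitespace is insignificant, so the reverse function discards it. A production decoder would use an XML parser rather than the regular expression used here.

```python
import re


def ber_to_xer(ber: bytes, group_bytes: int = 4) -> str:
    """Wrap raw BER octets in an <xer:BER> element as hexadecimal text."""
    digits = ber.hex()
    step = group_bytes * 2
    # Grouping is cosmetic (assumed insignificant whitespace).
    groups = [digits[i:i + step] for i in range(0, len(digits), step)]
    return '<xer:BER xer:enc="hex">' + " ".join(groups) + "</xer:BER>"


def xer_to_ber(element: str) -> bytes:
    """Recover the BER octets, ignoring any whitespace in the hex text."""
    match = re.fullmatch(r'<xer:BER xer:enc="hex">([0-9A-Fa-f\s]*)</xer:BER>',
                         element.strip())
    if match is None:
        raise ValueError("not an <xer:BER> element in the expected form")
    return bytes.fromhex("".join(match.group(1).split()))


print(ber_to_xer(bytes.fromhex("b41b830205e08402")))
```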
Embedding BER in XER causes some problems, because programs that process such documents need to support both the XER and BER encodings. However, this is unavoidable: given an arbitrary BER encoding, a BER-to-XER gateway cannot always identify the data type of a BER element, because implicit tagging in ASN.1 may hide the type information.
The ASN.1 instantiation:
{ initRequest {
    protocolVersion '111'B,
    options '11'B,
    preferredMessageSize 256,
    maximumRecordSize 1048576,
    implementationName "Zebulun" } }
The XER encoding will be:
<xer:BER xer:enc="hex">b41b8302 05e08402 06c08502 01008603 1000009f 6f075a65 62756c75 6e</xer:BER>
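The following Python sketch rebuilds the octets in this xer:BER element using a tiny subset of BER: context-specific tags, short-form lengths, and INTEGER and BIT STRING contents. The tag numbers are those of the Z39.50 version 3 initRequest ([20], with members [3], [4], [5], [6] and [111]), which can also be read off the hex dump. The helper names are hypothetical, and this is an illustration only, not a general BER encoder.

It also shows the implicit-tagging problem described above. The octets 85 02 01 00 carry the tag [5], but nothing in them says that the contents are an INTEGER.

```python
def context_tag(number: int, constructed: bool = False) -> bytes:
    """Identifier octets for a context-specific tag [number]."""
    first = 0x80 | (0x20 if constructed else 0x00)
    if number < 31:
        return bytes([first | number])
    # High-tag-number form: 0x1F marker, then base-128 digits with the
    # continuation bit set on every digit except the last.
    digits = [number & 0x7F]
    number >>= 7
    while number:
        digits.insert(0, (number & 0x7F) | 0x80)
        number >>= 7
    return bytes([first | 0x1F] + digits)


def tlv(tag: bytes, content: bytes) -> bytes:
    """Tag, length, value; only the short length form (< 128 octets) is handled."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are not implemented in this sketch")
    return tag + bytes([len(content)]) + content


def integer(value: int) -> bytes:
    """Minimal two's-complement INTEGER contents."""
    size = (value if value >= 0 else ~value).bit_length() // 8 + 1
    return value.to_bytes(size, "big", signed=True)


def bit_string(bits: str) -> bytes:
    """BIT STRING contents: unused-bit count, then the bits padded with zeros."""
    unused = (8 - len(bits) % 8) % 8
    padded = bits + "0" * unused
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return bytes([unused]) + body


# With IMPLICIT tagging, the context tags replace the universal type tags.
fields = (
    tlv(context_tag(3), bit_string("111"))               # protocolVersion
    + tlv(context_tag(4), bit_string("11"))              # options
    + tlv(context_tag(5), integer(256))                  # preferredMessageSize
    + tlv(context_tag(6), integer(1048576))              # maximumRecordSize
    + tlv(context_tag(111), "Zebulun".encode("ascii"))   # implementationName
)
init_request = tlv(context_tag(20, constructed=True), fields)  # initRequest

print(init_request.hex())
```

The printed hex matches the xer:BER element above once the grouping spaces are removed.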
The following examples have been produced by a proxy application. This application was built using the Zebulun-generated code. The Z39.50 version 3 ASN.1 description was used, and the proxy was set up to intercept PDUs between a Z39.50 origin and target (Isite). The captured PDUs are encoded in XER and printed out. These XER encodings are shown below (they have been indented to make them more readable).
A Z39.50 initRequest:
<PDU>
  <initRequest>
    <protocolVersion><xer:BIT_STRING>111</xer:BIT_STRING></protocolVersion>
    <options><xer:BIT_STRING>110000011000000</xer:BIT_STRING></options>
    <preferredMessageSize>1048576</preferredMessageSize>
    <exceptionalRecordSize>1048576</exceptionalRecordSize>
  </initRequest>
</PDU>
This is slightly different from the hand-crafted examples proposed by the XER discussion mailing list: it includes an extra level of tagging for some members. However, this approach reflects the structure and semantic information in the ASN.1 specification, and this extra information may be helpful to the reader.
A Z39.50 initResponse:
<PDU>
  <initResponse>
    <protocolVersion><xer:BIT_STRING>111</xer:BIT_STRING></protocolVersion>
    <options><xer:BIT_STRING>110000011000000</xer:BIT_STRING></options>
    <preferredMessageSize>32768</preferredMessageSize>
    <exceptionalRecordSize>1048576</exceptionalRecordSize>
    <result>true</result>
    <implementationId><xer:GeneralString>34</xer:GeneralString></implementationId>
    <implementationName><xer:GeneralString>CNIDR zserver</xer:GeneralString></implementationName>
    <implementationVersion><xer:GeneralString>2.01c</xer:GeneralString></implementationVersion>
  </initResponse>
</PDU>
A Z39.50 searchRequest:
<PDU>
  <searchRequest>
    <smallSetUpperBound>0</smallSetUpperBound>
    <largeSetLowerBound>1000000</largeSetLowerBound>
    <mediumSetPresentNumber>20</mediumSetPresentNumber>
    <replaceIndicator>true</replaceIndicator>
    <resultSetName><xer:GeneralString>Default</xer:GeneralString></resultSetName>
    <databaseNames>
      <DatabaseName>
        <InternationalString><xer:GeneralString>xxdefault</xer:GeneralString></InternationalString>
      </DatabaseName>
    </databaseNames>
    <query>
      <type-1>
        <attributeSet><xer:OBJECT_IDENTIFIER>1.2.840.10003.3.1</xer:OBJECT_IDENTIFIER></attributeSet>
        <rpn>
          <op>
            <attrTerm>
              <attributes><XER:seq/></attributes>
              <term><general>Kelvin</general></term>
            </attrTerm>
          </op>
        </rpn>
      </type-1>
    </query>
  </searchRequest>
</PDU>
A Z39.50 close:
<PDU>
  <close>
    <closeReason><xer:INTEGER>0</xer:INTEGER></closeReason>
  </close>
</PDU>
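A program consuming these PDUs needs only a standard XML parser to find the PDU type and read simple fields. The Python sketch below handles flat PDUs such as close and initResponse; `pdu_summary` is a hypothetical name.

An element like <preferredMessageSize>32768</preferredMessageSize> carries no built-in type tag, so turning its text into a number would require the ASN.1 definition. For that reason the sketch returns strings.

The xer: prefix is bound to a placeholder URI. This is an assumption, needed because the examples omit the namespace declaration for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: "urn:example:xer" is a placeholder URI bound to the xer: prefix.
CLOSE = """<PDU xmlns:xer="urn:example:xer">
  <close>
    <closeReason><xer:INTEGER>0</xer:INTEGER></closeReason>
  </close>
</PDU>"""


def pdu_summary(xml_text):
    """Return the PDU type (the CHOICE alternative) and its top-level fields.

    Only fields whose value is text are handled, either directly
    (<result>true</result>) or inside one built-in element
    (<closeReason><xer:INTEGER>0</xer:INTEGER></closeReason>).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "PDU" or len(root) != 1:
        raise ValueError("expected <PDU> with exactly one alternative")
    body = root[0]
    fields = {}
    for child in body:
        leaf = child[0] if len(child) else child
        fields[child.tag] = (leaf.text or "").strip()
    return body.tag, fields


print(pdu_summary(CLOSE))  # ('close', {'closeReason': '0'})
```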
This should be feasible, but it needs to be tested to confirm that it works and to identify any issues it raises.
We have proposed a simple encoding of ASN.1 ANY data. However, further investigation is required.
The rules can be applied automatically, making the task of writing XER programs easier. This has been demonstrated by the success of our prototype, which generated the XER code from an ASN.1 specification without the need for manual intervention.