[This local archive copy mirrored from the canonical site: http://www.docuverse.com/xlf/NOTE-XLF-19980721-all.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

NOTE-XLF-19980721


XLF: The Extensible Log Format

Version 1.0

XLF Working Group

21 July, 1998

Note: This draft is for review by XLF mailing list members.

This version

http://www.docuverse.com/xlf//xlf-NOTE-XLF-19980721.html

Latest version

http://www.docuverse.com/xlf//xlf-NOTE-XLF-19980721.html

Editors

Lisa Rein, finetuning.com
Gavin Nicol, Inso EPS

Principal Contributors

Don Park, Docuverse


Status

This document is a product of the members of the XLF mailing list. We will update this draft specification on a regular basis.

Please send detailed comments on this document to xlf-owner@cybercom.net. We cannot guarantee a personal response, but we will try when it is appropriate.


Abstract

XLF (Extensible Log Format) is a set of DTD fragments, recommendations, and APIs intended to provide a complete, open, interoperable, and extensible logging infrastructure. (ED: Need more language here)

Languages used

English, OMG IDL, Java, XML

The list of known errors in this document is found at http://www.docuverse.com/xlf//xlf-errata.html.



1. Introduction

Logging is part of almost every modern operating system in one form or another. Logs are used for tracking events that occur at runtime, and are often later used in analysis of system performance, security breaches, and access patterns, to name but a few.

This specification defines the Extensible Log Format, an XML-based format for log data that is intended to make log information smarter and easier to work with. A brief analysis of the need for XLF follows.

1.1. Dumb data, smart scripts

Currently, administrators use any number of methods to derive information from their server logs, usually custom-built scripts. In doing so, the scripts take "dumb" data and extract "intelligent" results from it (this is often described as "adding intelligence to data").

However, logs also have the potential to hold "intelligent" data, such that far more and better information can be logged than is currently possible. Intelligent log data combined with intelligent processing will lead to far more powerful analysis and reporting capabilities than are available today.

For example: a wealth of information can be (and often is!) obtained from HTTP server logs. Common usages today include counting the number of hits, deriving access patterns, and finding what percentage of downloads were broken off before they completed. Often, deep analysis of HTTP log data requires that sophisticated heuristics be applied to the data. For example, deriving access patterns necessitates analysis of the access times and hostid of every log entry. With more intelligent data, such heuristics would be unnecessary.
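The kind of heuristic described above can be sketched as follows. This is only an illustration of what custom scripts do today, not part of XLF: the 30-minute gap, the function name, and the sample entries are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes separates two visits from one host.
SESSION_GAP = timedelta(minutes=30)

def sessions_by_host(entries):
    """Group (hostid, access_time) pairs into per-host lists of sessions."""
    by_host = defaultdict(list)
    for host, when in entries:
        by_host[host].append(when)

    result = {}
    for host, times in by_host.items():
        times.sort()
        sessions = [[times[0]]]
        for when in times[1:]:
            if when - sessions[-1][-1] > SESSION_GAP:
                sessions.append([when])      # gap too long: start a new session
            else:
                sessions[-1].append(when)    # the same visit continues
        result[host] = sessions
    return result

# Hypothetical log entries, already reduced to (hostid, access time).
entries = [
    ("192.0.2.1", datetime(1998, 7, 21, 12, 0)),
    ("192.0.2.7", datetime(1998, 7, 21, 12, 1)),
    ("192.0.2.1", datetime(1998, 7, 21, 12, 5)),
    ("192.0.2.1", datetime(1998, 7, 21, 14, 0)),
]
found = sessions_by_host(entries)
```

A log that carried a session identifier with each entry would make this guesswork unnecessary.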

Another example might be electronic commerce: a transaction page that is written in XML (say the order page from amazon.com) might have its <total.price>, <customer.name>, <customer.address>, and other pieces of information logged, especially if that information can then be entered into a database automatically. XLF could play a key role in defining the model for distributed data-driven processes on the Web.
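As a rough sketch of the database case, the following reads one logged order and inserts it into an in-memory SQLite table. Only the three element names come from the text above; the enclosing <order> element, the table layout, and the sample values are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical logged order; only the three child element names are from the text.
ORDER = """
<order>
  <customer.name>A. Reader</customer.name>
  <customer.address>1 Example Street</customer.address>
  <total.price>42.50</total.price>
</order>
"""

def store_order(conn, xml_text):
    """Insert the logged fields of one order into the orders table."""
    order = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in order}
    row = (
        fields["customer.name"],
        fields["customer.address"],
        fields["total.price"],
    )
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
    return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT, address TEXT, total TEXT)")
store_order(conn, ORDER)
rows = conn.execute("SELECT name, total FROM orders").fetchall()
```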

1.2. Server log interchange

One major problem alluded to earlier is that most tools for processing log data are custom built. This is partly due to the differing requirements for analysis, but is also certainly partly due to the myriad log formats found today. A single extensible log format, with a single syntax (XML), will at least result in a common infrastructure upon which log analysis tools can be built.

In addition, the log format could help coordinate distributed systems: the types of messages sent to a log are similar to those used to coordinate processes. Administration is another potential area where the format could play a role: SNMP and other such protocols involve exchange of messages similar to those sent to logs.

1.3. Objectives

Work on XLF began based upon the conviction that the needs outlined in the introduction are real and need to be addressed. To that end, the objectives of XLF are:

  1. To identify commonly occurring pieces of information in log files, and to model them as elements and attribute types. These elements and attributes will form part of the XLF Core specification, which will be the basis for log file formats based on XLF.
  2. To define the models such that they can easily be included in other formats (open containment), using only XML. XLF will be an XML application; no extensions to XML will be required.
  3. To model a sufficient set of data that common internet server logs (HTTP, FTP, proxies, etc.) can be modelled in XLF. Other logs, such as Yamaguchi logs, should also be considered.
  4. To provide guidelines on how to extend the specification to support specific logging applications (e.g. HTTP). Some recommendations for specific protocols might also be made.
  5. To provide backward compatibility as far as possible. For example, it should be possible to write log producer modules (like plugins) that convert legacy log formats into XLF and feed the result into an XLF log service framework. Under a language like Java, it should even be possible to migrate XLF plugins (producers, filters, and consumers) from the server to the client on demand.
  6. To specify APIs for adding data to a log and accessing data within a log.
  7. To provide a "proof of concept" implementation: Docuverse will build an implementation in Java and make it freely available (and for free ;-) like Free-DOM, and several of the initiative members have already expressed interest in designing a Logging Framework based on XLF.
  8. To encourage server companies as well as log analyzer companies to support the specification so that we will have a truly universal log format.
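The legacy-conversion idea in objective 5 might look roughly like the sketch below, which turns one line of the widely used Common Log Format into an XML element. The <entry> element and its attribute names are placeholders chosen for this example; this draft does not yet define fragments for HTTP logs.

```python
import re
import xml.etree.ElementTree as ET

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def clf_to_element(line):
    """Convert one Common Log Format line into a hypothetical <entry> element."""
    match = CLF_LINE.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line: %r" % line)
    # Element and attribute names are placeholders, not names defined by XLF.
    return ET.Element("entry", match.groupdict())

legacy = '192.0.2.1 - - [21/Jul/1998:12:00:00 +0000] "GET /spec.txt HTTP/1.0" 200 2326'
entry = clf_to_element(legacy)
xml_text = ET.tostring(entry, encoding="unicode")
```

A real producer module would emit whatever fragments the XLF Core specification ends up defining instead of the placeholder names used here.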





2. DTD Fragments

This section defines a number of DTD fragments that can be used to build an application-specific log file format. It also defines recommended log file formats for more common log files, such as those for HTTP.

The fragments defined herein are all assumed to be defined within the XLF namespace.

2.1. Notes on fragment design

As with most things in software design, XLF is faced with a number of (at times) contradictory requirements. This section discusses some of the more important areas that affected the design of the XLF DTD fragments.

2.1.1. Verbosity

It is true that XML is more verbose than "normal" log formats at first glance. However, in a typical log format, each log entry must stand alone and thus specify all fields of an entry. Logs in XML can use inheritance so that repetitive information need not be duplicated. In addition, simple hyperlinks (ID/IDREF pairs) can be used to allow grouping of log information.

Log size is also not a major concern if we assume XLF to be a data interchange format rather than a data storage format. For example, XLF may be used by log producers to send log information to log servers which host log filters and consumers, some of which could be smart enough to strip out unnecessary information according to administrator preference and compress the logs once a day or once an hour.

In addition, even as a raw storage format, compressed XML data generally takes up less than 20% more space than data formatted in a less verbose form. This is due to the great amount of redundancy in an XML file, which can be compressed efficiently. This redundancy has the positive effect of making an XLF log more robust in the face of data corruption.
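The effect of compression on verbosity can be checked on one's own data with a few lines of script. The sketch below uses zlib and two invented encodings of the same 1000 entries; the line formats are hypothetical, and the ratios it yields say nothing about real logs or about the figure quoted above.

```python
import zlib

# Hypothetical entries: the same 1000 hits in a terse form and as XML.
plain_lines = ["192.0.2.1 /spec.txt 200 %d" % tick for tick in range(1000)]
xml_lines = [
    '<download host="192.0.2.1" file="/spec.txt" status="200" tick="%d"/>' % tick
    for tick in range(1000)
]

def sizes(lines):
    """Return (raw_bytes, compressed_bytes) for a list of log lines."""
    raw = "\n".join(lines).encode("ascii")
    return len(raw), len(zlib.compress(raw, 9))

plain_raw, plain_packed = sizes(plain_lines)
xml_raw, xml_packed = sizes(xml_lines)

# How much larger the XML form is, before and after compression.
overhead_raw = xml_raw / plain_raw
overhead_packed = xml_packed / plain_packed
```

Comparing overhead_raw with overhead_packed shows how much of the extra markup the compressor absorbs for a given log.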

One possible result of XLF will be the creation of a Log Server market. Server products inevitably generate logs, but most companies cannot afford to dedicate significant resources to log management and analysis, even though most administrators rely heavily on logs to keep tabs on systems. The Log Server market could be created with a standard Log Service Framework which allows plug-and-play log producers, filters, and consumers. Server companies will benefit because they will be able to license quality log servers rather than having to build them. Network administrators will benefit because they will no longer have to write custom scripts.

2.1.2. Log Context

One problem with using XML as a log format is that log events are often asynchronous, while XML documents are not. For example, it is often the case that the start and end of a system action are interspersed with start and end events from other parts of the system. If the start and end correspond to XML start and end tags respectively, the generated log will not be well-formed XML, or if it is, it will be semantically incorrect:

<xlf:event id="ID-1447410289">
     ...
<xlf:event id="ID-1980373498">
     ...
</xlf:event> <!-- id="ID-1447410289" -->
</xlf:event> <!-- id="ID-1980373498" -->

There are a number of possible solutions to this problem, one of which is to model Log Events. In this case, all events are modelled as discrete elements that carry a session identifier along with them. Using the session identifier allows one to later process the log file to create more structured logs. The example above would look like the following using this method.

<xlf:start id="ID-1447410289"/>
     ...
<xlf:start id="ID-1980373498"/>
     ...
<xlf:end id="ID-1447410289"/>
<xlf:end id="ID-1980373498"/>

In general, this is not much better than existing log formats, except for the unification of syntax, so the XLF specification defines structured fragments. There are two primary assumptions behind this decision:

  1. That something akin to a log server will exist.
  2. That XLF is primarily to be used for interchange, not for direct storage. In cases where asynchronous storage is required, data will most likely be stored in a binary format that can then be converted to XLF.
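For comparison, the later processing step that the discrete-event approach relies on can be sketched as follows: each xlf:end is matched to its xlf:start by identifier. The namespace URI, the enclosing <log> element, and the function name are assumptions made for this example; the draft does not define them.

```python
import xml.etree.ElementTree as ET

# Assumption: the draft gives no namespace URI for XLF; this is a placeholder.
XLF_NS = "urn:example:xlf"

FLAT_LOG = """
<log xmlns:xlf="urn:example:xlf">
  <xlf:start id="ID-1447410289"/>
  <xlf:start id="ID-1980373498"/>
  <xlf:end id="ID-1447410289"/>
  <xlf:end id="ID-1980373498"/>
</log>
"""

def pair_events(root):
    """Match each xlf:end to its xlf:start by id.

    Returns (spans, unfinished): spans is a list of (id, start_pos, end_pos)
    in order of completion; unfinished lists ids that never ended.
    """
    start_tag = "{%s}start" % XLF_NS
    end_tag = "{%s}end" % XLF_NS
    open_events = {}
    spans = []
    for position, event in enumerate(root):
        ident = event.get("id")
        if event.tag == start_tag:
            open_events[ident] = position
        elif event.tag == end_tag and ident in open_events:
            spans.append((ident, open_events.pop(ident), position))
    return spans, sorted(open_events)

spans, unfinished = pair_events(ET.fromstring(FLAT_LOG))
```

With structured fragments, a log server would do this kind of pairing once, before the log is interchanged, rather than leaving it to every consumer.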


2.1.3. Element Renaming

In order to provide a certain degree of flexibility in fragment reuse, a means of aliasing elements is provided (much in the spirit of architectural forms). XLF uses the xlf:fragment attribute on an element, not the element name, to decide what type of fragment it is. For example:

<xlf:resource filename="spec.txt"/>

is equivalent to

<file xlf:fragment="xlf:resource" filename="spec.txt"/>

This is accomplished by having each defined DTD fragment declare a #FIXED attribute named xlf:fragment, so that if the fragments are used verbatim, they have the attribute declared. The same technique can be used to obviate the need to specify the attribute value in an application-specific log format too:

<!DOCTYPE log [
<!ATTLIST file xlf:fragment CDATA #FIXED "xlf:resource">
]>
<log>
<file filename="spec.txt"/>
</log>

This example is exactly equivalent to the earlier example using <file>.
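A consumer could resolve the fragment type along the following lines. This is a minimal sketch that handles only an explicitly written xlf:fragment attribute and XLF-named elements; it does not rely on attribute defaults from a DTD, and the namespace URI is a placeholder because the draft does not give one.

```python
import xml.etree.ElementTree as ET

# Assumption: the draft gives no namespace URI for XLF; this is a placeholder.
XLF_NS = "urn:example:xlf"

def fragment_type(elem):
    """Return the XLF fragment name for an element, or None if it has none."""
    declared = elem.get("{%s}fragment" % XLF_NS)
    if declared is not None:
        # The value carries the xlf: prefix, e.g. "xlf:resource".
        return declared.split(":", 1)[-1]
    if elem.tag.startswith("{%s}" % XLF_NS):
        # No alias: the element name itself names the fragment.
        return elem.tag.split("}", 1)[1]
    return None

doc = ET.fromstring(
    '<log xmlns:xlf="urn:example:xlf">'
    '<xlf:resource filename="spec.txt"/>'
    '<file xlf:fragment="xlf:resource" filename="spec.txt"/>'
    '</log>'
)
types = [fragment_type(child) for child in doc]
```

Both children resolve to the same fragment name, which is the equivalence described above.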

2.2. Defined Fragments

(ED: Need some blurb here)

2.2.1. Declaration of time base

<!ELEMENT xlf:timebase EMPTY>
<!ATTLIST xlf:timebase
                xlf:fragment      CDATA      #FIXED "xlf:timebase"
                id                CDATA      #REQUIRED
                zone              CDATA      #REQUIRED
                year              NMTOKEN    #REQUIRED
                month             NMTOKEN    #REQUIRED
                day               NMTOKEN    #REQUIRED
                hour              NMTOKEN    #REQUIRED
                minute            NMTOKEN    #REQUIRED
                second            NMTOKEN    #REQUIRED
                tick              NMTOKEN    #REQUIRED
                tps               NMTOKEN    #REQUIRED>

The xlf:timebase element is used to declare the base time for the system, and should occur as one of the first parts of a log file. In a log format that includes a timebase, other elements can simply use ticks as the unit of measurement:

<download file="spec.txt" tick="21353221"/>

The attributes on xlf:timebase have the following meanings:

Name            Description
xlf:fragment    This is a #FIXED attribute that provides the basis for element renaming for this fragment.
id              TBD
zone            TBD
year            TBD
month           TBD
day             TBD
hour            TBD
minute          TBD
second          TBD
tick            TBD
tps             TBD
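Since the attribute meanings are still to be defined, the following sketch has to guess at them. It assumes that tick is the value of the tick counter at the declared base time and that tps is the number of ticks per second; it ignores zone entirely. The namespace URI, the enclosing <log> element, and the sample values are likewise assumptions.

```python
from datetime import datetime, timedelta
import xml.etree.ElementTree as ET

# Assumption: the draft gives no namespace URI for XLF; this is a placeholder.
XLF_NS = "urn:example:xlf"

LOG = """
<log xmlns:xlf="urn:example:xlf">
  <xlf:timebase id="tb1" zone="UTC" year="1998" month="7" day="21"
                hour="12" minute="0" second="0" tick="21353000" tps="1000"/>
  <download file="spec.txt" tick="21353221"/>
</log>
"""

def tick_to_datetime(timebase, tick):
    """Convert a tick count to a wall-clock time (zone is ignored here).

    Assumes timebase 'tick' is the counter value at the base time and
    'tps' is ticks per second; the draft leaves both as TBD.
    """
    base = datetime(
        int(timebase.get("year")), int(timebase.get("month")),
        int(timebase.get("day")), int(timebase.get("hour")),
        int(timebase.get("minute")), int(timebase.get("second")),
    )
    elapsed_ticks = tick - int(timebase.get("tick"))
    tps = int(timebase.get("tps"))
    # Integer arithmetic avoids floating-point rounding of the offset.
    return base + timedelta(microseconds=elapsed_ticks * 1000000 // tps)

root = ET.fromstring(LOG)
timebase = root.find("{%s}timebase" % XLF_NS)
download = root.find("download")
when = tick_to_datetime(timebase, int(download.get("tick")))
```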






3. Recommendation for HTTP Log Files

(ED: TBD)



Appendix A: Glossary

log
(ED: TBD)