[This local archive copy mirrored from the canonical site: http://www.docuverse.com/xlf/NOTE-XLF-19980721-all.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Note: This draft is for review by XLF mailing list members.
Lisa Rein, finetuning.com
Gavin Nicol, Inso EPS
Don Park, Docuverse
This document is a product of the members of the XLF mailing list. We will update this draft specification on a regular basis.
Please send detailed comments on this document to xlf-owner@cybercom.net . We cannot guarantee a personal response, but we will try to reply when appropriate.
XLF (Extensible Log Format) is a set of DTD fragments, recommendations, and APIs intended to provide a complete, open, interoperable, and extensible logging infrastructure. (ED: Need more language here)
English, OMG IDL, Java, XML
The list of known errors in this document is found at http://www.docuverse.com/xlf//xlf-errata.html.
Logging is part of almost every modern operating system in one form or another. Logs are used for tracking events that occur at runtime, and are often later used in analysis of system performance, security breaches, and access patterns, to name but a few.
This specification defines the Extensible Log Format, an XML-based format for log data that is intended to make log information smarter and easier to work with than ever before. A brief analysis of the need for XLF follows.
Currently, administrators derive information from their server logs by any number of methods, usually custom-built scripts. These scripts take "dumb" data and extract "intelligent" results from it (this is often described as "adding intelligence to data").
However, logs can also hold "intelligent" data, so that far more and better information can be logged than is currently possible. Intelligent log data combined with intelligent processing will lead to far more powerful analysis and reporting capabilities than ever before.
For example, a wealth of information can be (and often is!) obtained from HTTP server logs. Common uses today include counting the number of hits, deriving access patterns, and finding what percentage of downloads were broken off before they completed. Deep analysis of HTTP log data often requires that sophisticated heuristics be applied to the data. For example, deriving access patterns necessitates analysis of the access times and hostid of every log entry. With more intelligent data, such heuristics would be unnecessary.
Another example might be electronic commerce: a transaction page that is written in XML (say, the order page from amazon.com) might have its <total.price>, <customer.name>, <customer.address>, and other pieces of information logged, especially if that information can then be entered into a database automatically. XLF could play a key role in defining the model for distributed data-driven processes on the Web.
One major problem alluded to earlier is that most tools for processing log data are custom-built. This is partly due to differing analysis requirements, but it is certainly also due to the myriad log formats found today. A single extensible log format, with a single syntax (XML), will at least provide a common infrastructure upon which log analysis tools can be built.
In addition, the log format could help coordinate distributed systems: the types of messages sent to a log are similar to those used to coordinate processes. Administration is another potential area where the format could play a role: SNMP and other such protocols involve exchanging messages similar to those sent to logs.
Work on XLF began with the conviction that the needs outlined in the introduction are real and need to be addressed. To that end, the objectives of XLF are:
This section defines a number of DTD fragments that can be used to build an application-specific log file format. It also defines recommended formats for more common log files, such as those for HTTP.
The fragments defined herein are all assumed to be defined within the XLF namespace.
As with most things in software design, XLF faces a number of requirements that are at times contradictory. This section discusses some of the more important areas that affected the design of the XLF DTD fragments.
At first glance, XML is more verbose than "normal" log formats. However, in a typical log format, each log entry must stand alone and thus specify all of its fields. Logs in XML can use inheritance so that repetitive information need not be duplicated. In addition, simple hyperlinks (ID/IDREF pairs) can be used to group related log information.
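The draft does not yet say how inheritance works. As a minimal sketch, assuming one plausible rule (an attribute omitted on an entry is taken from the nearest enclosing element that declares it), a consumer could resolve inherited values with the standard JAXP DOM API. The element names log, session and hit and the host attribute are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class InheritedAttributes {
    // ASSUMPTION: illustrative lookup rule; XLF has not defined inheritance yet.
    // Returns the value of `name` on `e` or on the nearest ancestor element
    // that carries it, or null if no element on the path declares it.
    static String lookup(Element e, String name) {
        for (Node n = e; n instanceof Element; n = n.getParentNode()) {
            Element el = (Element) n;
            if (el.hasAttribute(name)) {
                return el.getAttribute(name);
            }
        }
        return null;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical log: `host` is stated once on <session>, not on every <hit>.
        String xml = "<log><session host=\"10.0.0.1\">"
                + "<hit file=\"spec.txt\"/>"
                + "<hit file=\"index.html\" host=\"10.0.0.2\"/>"
                + "</session></log>";
        NodeList hits = parse(xml).getElementsByTagName("hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            System.out.println(hit.getAttribute("file") + " from " + lookup(hit, "host"));
        }
    }
}
```

An entry that states its own value (the second hit) overrides the inherited one, so the common case costs nothing extra in the file.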
Log size is also not a major concern if we treat XLF as a data interchange format rather than a data storage format. For example, log producers may use XLF to send log information to log servers that host log filters and consumers, some of which could be smart enough to strip out unnecessary information according to administrator preference and compress the result once a day or once an hour.
In addition, even as a raw storage format, compressed XML data generally takes up less than 20% more space than data formatted in a less verbose form. This is because the large amount of redundancy in an XML file compresses efficiently. This redundancy has the positive side effect of making an XLF log more robust in the face of data corruption.
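The compression argument is easy to check against one's own data. The sketch below uses only java.util.zip to build a hypothetical log in XML and in a terse space-separated form and report raw and gzip sizes. The entry layout is an assumption, and the percentages it prints depend entirely on the sample, so it illustrates the measurement rather than confirming the 20% figure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class LogCompression {
    // Size in bytes of `text` (UTF-8) after gzip compression.
    static int gzipSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // gzip trailer is written when the stream closes
    }

    // Builds n hypothetical download entries, either as XML or as plain text.
    static String sampleLog(int n, boolean xml) {
        StringBuilder sb = new StringBuilder(xml ? "<log>\n" : "");
        for (int i = 0; i < n; i++) {
            String host = "10.0.0." + (i % 16);
            long tick = 21353221L + 37L * i;
            if (xml) {
                sb.append("  <download host=\"").append(host)
                  .append("\" file=\"spec.txt\" tick=\"").append(tick).append("\"/>\n");
            } else {
                sb.append(host).append(" spec.txt ").append(tick).append('\n');
            }
        }
        if (xml) {
            sb.append("</log>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = sampleLog(10_000, true);
        String plain = sampleLog(10_000, false);
        int gx = gzipSize(xml);
        int gp = gzipSize(plain);
        // All sample text is ASCII, so length() equals the byte count.
        System.out.printf("raw:  xml=%d plain=%d (xml %.0f%% larger)%n",
                xml.length(), plain.length(),
                100.0 * (xml.length() - plain.length()) / plain.length());
        System.out.printf("gzip: xml=%d plain=%d (xml %.0f%% larger)%n",
                gx, gp, 100.0 * (gx - gp) / gp);
    }
}
```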
One possible result of XLF will be the creation of a Log Server market. Server products inevitably generate logs, but most companies cannot afford to dedicate significant resources to log management and analysis, even though most administrators rely heavily on logs to keep tabs on systems. The Log Server market could be created with a standard Log Service Framework that allows plug-and-play log producers, filters, and consumers. Server companies benefit because they will be able to license quality log servers rather than having to build them. Network administrators benefit because they will no longer have to write custom scripts.
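A Log Service Framework has not been specified yet. Purely to illustrate the plug-and-play idea, the hypothetical Java types below (LogServer, LogFilter and LogConsumer are invented names, not part of XLF) let producers hand entries to a server that runs them through pluggable filters, such as one that strips fields an administrator does not want, before passing them to consumers. Entries are modelled as simple name/value maps for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// HYPOTHETICAL API: XLF does not define these types; they only illustrate the idea.
class LogServer {
    // A filter returns the (possibly modified) entry to pass on, or null to drop it.
    interface LogFilter {
        Map<String, String> apply(Map<String, String> entry);
    }

    // A consumer stores, forwards, or analyses entries that survive filtering.
    interface LogConsumer {
        void accept(Map<String, String> entry);
    }

    private final List<LogFilter> filters = new ArrayList<>();
    private final List<LogConsumer> consumers = new ArrayList<>();

    void addFilter(LogFilter f) { filters.add(f); }
    void addConsumer(LogConsumer c) { consumers.add(c); }

    // Entry point for log producers: filters run in registration order.
    void log(Map<String, String> entry) {
        Map<String, String> current = new LinkedHashMap<>(entry);
        for (LogFilter f : filters) {
            current = f.apply(current);
            if (current == null) {
                return; // dropped by a filter
            }
        }
        for (LogConsumer c : consumers) {
            c.accept(current);
        }
    }

    // Example filter: keep only the fields an administrator has asked for.
    static LogFilter keepOnly(Set<String> fields) {
        return entry -> {
            Map<String, String> out = new LinkedHashMap<>(entry);
            out.keySet().retainAll(fields);
            return out;
        };
    }

    public static void main(String[] args) {
        LogServer server = new LogServer();
        List<Map<String, String>> stored = new ArrayList<>();
        server.addFilter(keepOnly(Set.of("host", "file")));
        server.addConsumer(stored::add);
        server.log(Map.of("host", "10.0.0.1", "file", "spec.txt", "agent", "Mozilla/4.0"));
        System.out.println(stored);
    }
}
```

Because producers, filters and consumers only meet through small interfaces, each could be supplied by a different vendor, which is the point of a standard framework.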
One problem with using XML as a log format is that log events are often asynchronous, while XML elements must nest strictly. For example, the start and end of one system action are often interspersed with start and end events from other parts of the system. If the start and end of each action correspond to XML start and end tags, the generated log will either not be well-formed XML or, if it is, will be semantically incorrect, as in the following, where each end tag closes the wrong event:
<xlf:event id="ID-1447410289">
  ...
  <xlf:event id="ID-1980373498">
    ...
  </xlf:event> <!-- id="ID-1447410289" -->
</xlf:event> <!-- id="ID-1980373498" -->
There are a number of possible solutions to this problem, one of which is to model log events. In this case, all events are modelled as discrete elements that carry a session identifier. The session identifier allows the log file to be processed later into more structured logs. Using this method, the example above would look like the following:
<xlf:start id="ID-1447410289"/>
...
<xlf:start id="ID-1980373498"/>
...
<xlf:end id="ID-1447410289"/>
<xlf:end id="ID-1980373498"/>
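Reconstructing structured events from such a flat log amounts to pairing each xlf:start with the xlf:end that carries the same id. A minimal sketch using JAXP DOM, assuming the events are wrapped in a hypothetical <log> root, that the parser is not namespace-aware (so the xlf: prefix need not be bound), and that an id is not reused while its event is still open:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class EventPairing {
    // One reconstructed event: positions of its start and end in the flat log.
    record Span(String id, int start, int end) {}

    static List<Span> pair(String xml) {
        NodeList all;
        try {
            all = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getElementsByTagName("*"); // all descendants, in document order
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
        Map<String, Integer> open = new HashMap<>();
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String id = e.getAttribute("id");
            switch (e.getTagName()) {
                case "xlf:start" -> {
                    if (open.putIfAbsent(id, i) != null) {
                        throw new IllegalStateException("duplicate start: " + id);
                    }
                }
                case "xlf:end" -> {
                    Integer s = open.remove(id);
                    if (s == null) {
                        throw new IllegalStateException("end without start: " + id);
                    }
                    spans.add(new Span(id, s, i));
                }
                default -> { } // other log entries are left untouched
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("unterminated events: " + open.keySet());
        }
        return spans;
    }

    public static void main(String[] args) {
        String log = "<log>"
                + "<xlf:start id=\"ID-1447410289\"/>"
                + "<xlf:start id=\"ID-1980373498\"/>"
                + "<xlf:end id=\"ID-1447410289\"/>"
                + "<xlf:end id=\"ID-1980373498\"/>"
                + "</log>";
        for (Span s : pair(log)) {
            System.out.println(s);
        }
    }
}
```

Because pairing is by identifier rather than by nesting, interleaved events such as the two above are matched correctly, which the nested form could not express.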
In general, this is not much better than existing log formats apart from the unified syntax, so the XLF specification defines structured fragments instead. There are two primary assumptions behind this decision:
To provide a degree of flexibility in fragment reuse, a means of aliasing elements is provided (much in the spirit of architectural forms). XLF uses the xlf:fragment attribute on an element, not the element name, to decide what type of fragment it is. For example:
<xlf:resource filename="spec.txt"/>
is equivalent to
<file xlf:fragment="xlf:resource" filename="spec.txt"/>
This is accomplished by having each DTD fragment declare a #FIXED attribute named xlf:fragment, so that if the fragments are used verbatim, they have the attribute declared. The same technique can be used to avoid specifying the attribute value in an application-specific log format:
<!DOCTYPE log [
  <!ATTLIST file xlf:fragment CDATA #FIXED "xlf:resource">
]>
<log>
  <file filename="spec.txt"/>
</log>
This example is exactly equivalent to the earlier example using <file>.
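A consumer therefore decides an element's fragment type from xlf:fragment when it is present and falls back to the element name otherwise. The sketch below assumes a JAXP parser that is not namespace-aware (so the xlf: prefix need not be bound) and that applies attribute defaults from the internal DTD subset, as XML 1.0 requires even of non-validating processors; under those assumptions all three forms above should report xlf:resource.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class FragmentType {
    // Effective XLF fragment of an element: its xlf:fragment attribute, whether
    // written explicitly or defaulted from a #FIXED DTD declaration, else its name.
    static String of(Element e) {
        return e.hasAttribute("xlf:fragment") ? e.getAttribute("xlf:fragment") : e.getTagName();
    }

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        // Form 1: the fragment used verbatim.
        System.out.println(of(root("<xlf:resource filename=\"spec.txt\"/>")));
        // Form 2: renamed element with an explicit xlf:fragment attribute.
        System.out.println(of(root("<file xlf:fragment=\"xlf:resource\" filename=\"spec.txt\"/>")));
        // Form 3: the value comes from the #FIXED default in the internal subset.
        Element log = root("<!DOCTYPE log [ <!ATTLIST file xlf:fragment CDATA #FIXED \"xlf:resource\"> ]>"
                + "<log><file filename=\"spec.txt\"/></log>");
        System.out.println(of((Element) log.getElementsByTagName("file").item(0)));
    }
}
```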
<!ELEMENT xlf:timebase EMPTY>
<!ATTLIST xlf:timebase
  xlf:fragment CDATA   #FIXED "xlf:timebase"
  id           CDATA   #REQUIRED
  zone         CDATA   #REQUIRED
  year         NMTOKEN #REQUIRED
  month        NMTOKEN #REQUIRED
  day          NMTOKEN #REQUIRED
  hour         NMTOKEN #REQUIRED
  minute       NMTOKEN #REQUIRED
  second       NMTOKEN #REQUIRED
  tick         NMTOKEN #REQUIRED
  tps          NMTOKEN #REQUIRED>
The xlf:timebase element declares the base time for the system and should occur near the beginning of a log file. In a log format that includes a timebase, other elements can simply use ticks as the unit of measurement:
<download file="spec.txt" tick="21353221"/>
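The meanings of tick and tps are still TBD. Assuming that tick on xlf:timebase is the counter value at the declared base time, that tps is the number of ticks per second, and that zone is a java.time zone ID such as "UTC", an entry's wall-clock time is the base time plus (tick - baseTick) / tps seconds. A minimal java.time sketch under those assumptions:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class TickClock {
    // ASSUMPTIONS (attribute meanings are TBD in this draft): the date/time fields
    // and zone give the wall-clock time at which the counter read baseTick, and
    // tps is the number of ticks per second.
    private final ZonedDateTime base;
    private final long baseTick;
    private final long tps;

    TickClock(String zone, int year, int month, int day, int hour, int minute, int second,
              long baseTick, long tps) {
        if (tps <= 0) {
            throw new IllegalArgumentException("tps must be positive");
        }
        this.base = ZonedDateTime.of(year, month, day, hour, minute, second, 0, ZoneId.of(zone));
        this.baseTick = baseTick;
        this.tps = tps;
    }

    // Wall-clock instant of an entry stamped with `tick`.
    Instant at(long tick) {
        long delta = tick - baseTick;
        long seconds = Math.floorDiv(delta, tps);         // whole seconds (floors negatives)
        long rem = Math.floorMod(delta, tps);              // leftover ticks, 0 <= rem < tps
        long nanos = rem * 1_000_000_000L / tps;
        return base.toInstant().plusSeconds(seconds).plusNanos(nanos);
    }

    public static void main(String[] args) {
        // Hypothetical timebase: midnight UTC, counter at 0, millisecond ticks.
        TickClock clock = new TickClock("UTC", 1998, 7, 21, 0, 0, 0, 0L, 1000L);
        // <download file="spec.txt" tick="21353221"/>
        System.out.println(clock.at(21353221L)); // 1998-07-21T05:55:53.221Z
    }
}
```

Storing only a tick per entry keeps entries short, while the single timebase carries the calendar and zone information once.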
The attributes on xlf:timebase have the following meanings:

Name          Description
xlf:fragment  This is a #FIXED attribute that provides the basis for element renaming for this fragment.
id            TBD
zone          TBD
year          TBD
month         TBD
day           TBD
hour          TBD
minute        TBD
second        TBD
tick          TBD
tps           TBD
(ED: TBD)