Technology: RDL Specifications

[Cache from http://www.e-Numerate.com/technologyRDLspecs.htm; please use this canonical URL/source if possible.]

RDL™ SPECIFICATIONS

This page presents the annotated RDL™ Document Type Definition ("DTD"). The first column is text from the actual DTD file. The second column contains general explanation, with more details provided through hyperlinks.

<?xml encoding="UTF-8"?>

All XML documents must start with this line. It tells the client application (in this case, the RDL™ Data Viewer), what type of document it is, and what version of XML.

"encoding=UTF-8" means that this document uses UTF-8, the 8-bit Unicode Transformation Format, a variable-width encoding of Unicode that is backward compatible with ASCII. Different encodings have been developed to permit documents to be created in different character sets (such as Chinese). Currently, RDL™ supports only UTF-8.

Comments in XML begin with "<!--" and end with "-->".

<!ELEMENT rdldoc (rdldoc_header, line_item_set)>

An "rdldoc" consists of two objects: a header and a collection of data. The header contains overall document metadata (who prepared it, when it was made, etc.). The "line item set" roughly corresponds to the concept of a table in a database. It is a collection of "line items" (rows).

"rdldoc" is the root element. In the RDL™ document, all elements descend from this in a hierarchical fashion. The structure of this tree is what is being defined in this DTD. At the first level, an rdldoc tree has two branches: rdldoc_header and line_item_set.

<!-- Information about the rdldoc. An rdldoc consists of an rdldoc_header and a line_item_set. All of the line items in the line_item_set share a common data structure.

-->

<!ELEMENT rdldoc_header (data_source?, formatting_source?, rdldoc_source?, license_terms?, linkset?)>

The header has five optional branches, each describing some aspect of the source. Note that even if you leave off all of these sub-elements, you still have some header information contained in the "attributes" of rdldoc_header. (See the ATTLIST below).

<!ATTLIST rdldoc_header

rdldoc_ID	CDATA #REQUIRED
doc_title	CDATA #REQUIRED
timestamp	CDATA #REQUIRED
version	CDATA #IMPLIED
expiration	CDATA #IMPLIED
freq_of_update	CDATA #IMPLIED
num_lineitems	CDATA #IMPLIED
num_datapoints	CDATA #IMPLIED
x_indexes	CDATA #IMPLIED
first_li_withdata	CDATA #IMPLIED>

Every element in a structured document can have two types of information attached to it: sub-elements and attributes.

There are no hard-and-fast rules regarding whether to put information in attributes or sub-elements; in general, RDL™ puts commonly used, global metadata about the element in attributes, and leaves distinct concepts to sub-elements.
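
As a rough illustration of the attribute/sub-element split, the following Python sketch reads the three #REQUIRED attributes of rdldoc_header from a trimmed sample document. The sample values and the helper name missing_header_attributes are hypothetical, and the sample leaves out the line_item_set, so it would not validate against the full DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; a real rdldoc also needs a line_item_set.
SAMPLE = """
<rdldoc>
  <rdldoc_header rdldoc_ID="www.example.com/sales.rdl"
                 doc_title="Quarterly Sales"
                 timestamp="1999.0704123000">
    <data_source>
      <contact_info role="data_source" name="A. Analyst"/>
    </data_source>
  </rdldoc_header>
</rdldoc>
""".strip()

# The attributes marked #REQUIRED in the rdldoc_header ATTLIST.
REQUIRED_HEADER_ATTRIBUTES = ("rdldoc_ID", "doc_title", "timestamp")


def missing_header_attributes(xml_text):
    """Return the #REQUIRED rdldoc_header attributes absent from the document."""
    root = ET.fromstring(xml_text)
    header = root.find("rdldoc_header")
    if header is None:
        return list(REQUIRED_HEADER_ATTRIBUTES)
    return [name for name in REQUIRED_HEADER_ATTRIBUTES if name not in header.attrib]


print(missing_header_attributes(SAMPLE))  # prints []
```

Global metadata (ID, title, timestamp) sits in attributes, while the distinct concept of a data source is a sub-element with its own contact_info.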

<!ELEMENT data_source (contact_info+)>

<!ELEMENT formatting_source (contact_info+)>

<!ELEMENT rdldoc_source (contact_info+)>

<!ELEMENT license_terms (contact_info?, linkset?)>

<!ATTLIST license_terms

copyright_cite CDATA #REQUIRED

holder CDATA #REQUIRED

license_type CDATA #IMPLIED

warranty CDATA #IMPLIED

disclaimer CDATA #IMPLIED

terms CDATA #IMPLIED

date CDATA #IMPLIED

email CDATA #IMPLIED

state CDATA #IMPLIED

country CDATA #IMPLIED >

<!ELEMENT contact_info (#PCDATA)>

This element can be used by the target application to create an email letter, or update a contact list, or populate a database of information sources.

The same structure is used for all contact info sub-elements of the data_source, rdldoc_source, and formatting_source elements, so the application that created the document only has to create one structure for contact info.

<!ATTLIST contact_info

role CDATA #REQUIRED

name CDATA #IMPLIED

company CDATA #IMPLIED

address CDATA #IMPLIED

city CDATA #IMPLIED

state CDATA #IMPLIED

zip CDATA #IMPLIED

country CDATA #IMPLIED

email CDATA #IMPLIED

xlink:form CDATA #IMPLIED

href CDATA #IMPLIED

comments CDATA #IMPLIED >

<!ELEMENT linkset (link*)>

A linkset is a collection of hyperlinks. These hyperlinks may be either HTML files or RDL™ files. The individual link elements hold the actual links and their attributes (below).

<!ATTLIST linkset

xlink:form	CDATA #FIXED 'extended'
href	CDATA #IMPLIED >

These attributes designate the HTML or RDL™ page where a page of hyperlinks may be found. This is useful where you don't want to list all of the hyperlinks in the data document itself.

<!ELEMENT link (#PCDATA) >

The text portion of this element (that is, whatever text appears between the beginning and ending tag elements) is optional. If the title and href attributes are filled out they contain the basic information. If the title attribute is not present, the text component of the link can be used.
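
A reading application might resolve the display label of a link roughly as follows. This is a minimal Python sketch, not the RDL™ Data Viewer's actual logic: the function name link_label is hypothetical, and falling back to the href when neither a title nor text is present is an assumption.

```python
import xml.etree.ElementTree as ET


def link_label(link):
    """Prefer the title attribute, then the element text, then (assumed) the href."""
    title = link.get("title")
    if title:
        return title
    text = (link.text or "").strip()
    return text or link.get("href", "")


with_title = ET.fromstring('<link href="http://www.e-numerate.com" title="e-Numerate"/>')
text_only = ET.fromstring('<link href="http://www.e-numerate.com">Company home page</link>')

print(link_label(with_title))  # e-Numerate
print(link_label(text_only))   # Company home page
```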

<!ATTLIST link

xlink:form	CDATA #FIXED 'simple'
href	CDATA #REQUIRED
behavior	CDATA #IMPLIED
content-role	CDATA #IMPLIED
content-title	CDATA #IMPLIED
role	CDATA #IMPLIED
title	CDATA #IMPLIED
show	CDATA #FIXED 'new'
actuate	CDATA #FIXED 'user' >

Hyperlinks in RDL™ follow the XLink standards.

<!ELEMENT line_item_set (data_x, li_class_set?, linkset?,
line_item+) >

A line_item_set is a collection of line items. It corresponds basically to a "table" in a traditional database, where the line items are rows. There is one data_x element in a line item set; it corresponds basically to a list of the field names in a traditional database.

<!ATTLIST line_item_set

line_item_set_type	CDATA #REQUIRED
time_period	CDATA #REQUIRED
character_set	CDATA #IMPLIED
missing_values	CDATA #IMPLIED
null_values	CDATA #IMPLIED
zero_values	CDATA #IMPLIED
dates_values	CDATA #IMPLIED
percentages	CDATA #IMPLIED >

<!ELEMENT data_x (#PCDATA) >

<!ATTLIST data_x

x_title	CDATA #REQUIRED
format	CDATA #REQUIRED
x_notes	CDATA #IMPLIED
x_desc	CDATA #IMPLIED
x_prec	CDATA #REQUIRED
x_unit	CDATA #REQUIRED
x_mag	CDATA #REQUIRED
x_mod	CDATA #REQUIRED
x_measure	CDATA #REQUIRED
x_scale	CDATA #REQUIRED
x_adjustment	CDATA #REQUIRED
x_links	CDATA #REQUIRED >

<!ELEMENT li_class_set (li_class+)>

<!ELEMENT li_class (#PCDATA)>

<!ATTLIST li_class

class_name	CDATA #REQUIRED
parent_class	CDATA #REQUIRED
xlink:form	CDATA #FIXED 'simple'
href	CDATA #IMPLIED
description	CDATA #IMPLIED >

<!ELEMENT line_item (data_x?, data_y, linkset?, note_set?) >

<!ATTLIST line_item

li_ID	CDATA #REQUIRED
li_legend	CDATA #REQUIRED
li_title	CDATA #REQUIRED
li_cat	CDATA #IMPLIED
y_axis_title	CDATA #REQUIRED
level	CDATA #REQUIRED
format	CDATA #REQUIRED
relation	CDATA #REQUIRED
li_notes	CDATA #REQUIRED
li_desc	CDATA #REQUIRED
li_prec	CDATA #REQUIRED
li_unit	CDATA #REQUIRED
li_mag	CDATA #REQUIRED
li_mod	CDATA #REQUIRED
li_measure	CDATA #REQUIRED
li_scale	CDATA #REQUIRED
li_adjustment	CDATA #REQUIRED
li_aggregation	CDATA #IMPLIED >

<!ELEMENT data_y (#PCDATA)>

<!ELEMENT analysis (linkset?)>

<!ELEMENT note_set (note+)>

<!ELEMENT note (#PCDATA)>

<!ATTLIST note
note_type CDATA #IMPLIED >

XML eXtensible Markup Language ("XML").

RDL™ RDL™ is a fully compliant implementation of a markup language that conforms to the XML version 1.0 specification.

DTD Document Type Definition. A DTD is a text file which provides a "template" for the structure of XML documents (of which RDL™ is a type). The DTD document specifies the structure of the target XML document by defining elements and their relationship to each other.
An element is delimited by tags written between "<" and ">" angle bracket characters. The first word inside the angle brackets is the element name. Each element begins with a start tag, which carries the name and any attributes (e.g., "Color=blue"), and ends with an end tag, which repeats the name after a "/" character (e.g., "</bold>"). In between the tags there is usually some form of text. This is the text that shows up on your screen in an HTML browser.

encoding="UTF-8" Designates the text encoding.

rdldoc_ID A unique identifier for this document. In almost all cases, this should be the fully qualified filename or URL for this file. (You can leave off the protocol. That is, the rdldoc_ID can be "www.e-numerate.com" rather than "http://www.e-numerate.com")

doc_title The title that will appear at the top of reports, view windows for the document, etc. Should be a short (less than 100 characters) description of the document's data.

timestamp Generated by the application that created the RDL™ document. The timestamp is in the form YYYY.MMDDHHMMSS. Note that it can apply to either the time that the document was created or the time the data was accessed for creation of the document.
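
The YYYY.MMDDHHMMSS layout maps onto a single strptime pattern. The Python sketch below shows one way a reading application could parse and regenerate such a stamp; the function names and the sample value are hypothetical.

```python
from datetime import datetime

# YYYY.MMDDHHMMSS expressed as a strptime/strftime pattern.
RDL_TIMESTAMP_FORMAT = "%Y.%m%d%H%M%S"


def parse_rdl_timestamp(stamp):
    """Convert an RDL timestamp string into a datetime."""
    return datetime.strptime(stamp, RDL_TIMESTAMP_FORMAT)


def format_rdl_timestamp(moment):
    """Convert a datetime back into the RDL timestamp layout."""
    return moment.strftime(RDL_TIMESTAMP_FORMAT)


parsed = parse_rdl_timestamp("1999.0704123000")
print(parsed.isoformat())            # 1999-07-04T12:30:00
print(format_rdl_timestamp(parsed))  # 1999.0704123000
```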

version A string (less than 255 characters) defined by the publisher of the document. Version naming policies are up to the creator of the document. Typical (and suggested) values are of the form "N.N.N.N".

expiration The date and time after which the data should no longer be relied on. Generally, this is the time that the next update is expected to be released. The expiration stamp is in the form YYYY.MMDDHHMMSS.

freq_of_update Designates the frequency with which the data is updated. Choices are: Year, Quarter, Month, Week, Day, Hour, Minute, Second. This is used by applications which would like to schedule updates to data.

num_lineitems An integer describing the number of line items in the attached line_item_set. Note that this is optional (the receiving application can, after all, count). It is useful as a checksum, however.
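
The checksum use can be sketched in Python as below. The sample document is a hypothetical skeleton with all other attributes stripped, and the function name is an assumption; the check simply compares the declared count with the number of line_item elements found.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: required attributes omitted to keep the sketch short.
SAMPLE = (
    '<rdldoc><rdldoc_header num_lineitems="2"/>'
    "<line_item_set><line_item/><line_item/></line_item_set></rdldoc>"
)


def line_item_count_matches(xml_text):
    """Return True or False when num_lineitems is declared, None when it is absent."""
    root = ET.fromstring(xml_text)
    header = root.find("rdldoc_header")
    declared = header.get("num_lineitems") if header is not None else None
    if declared is None:
        return None  # the attribute is optional, so there is nothing to check
    actual = len(root.findall("line_item_set/line_item"))
    return int(declared) == actual


print(line_item_count_matches(SAMPLE))  # True
```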

num_datapoints As with num_lineitems, this is optional, but useful for checking to make sure the line_item_set has not been accidentally changed or corrupted.

x_indexes Numerator Lite™ uses this attribute to select the three data fields to use as representative data fields in the TreePanel reports. "x_indexes" is a comma-delimited string of three integers, each of which is an index to a selected field. Note that the indexes key off the END of the list of fields. So, for example, to show the last three fields in the tree, use x_indexes = "-3,-2,-1". Indexes based on the end were chosen because most people reading a time series will want to see the most recent data.
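
End-relative indexes of this kind behave like negative list indexes in Python, which makes for a compact sketch. The function name and the sample field labels are hypothetical; the only assumption about semantics is that "-1" denotes the last field, as the example above implies.

```python
def representative_fields(fields, x_indexes):
    """Pick the fields named by a comma-delimited string of end-relative indexes."""
    indexes = [int(part) for part in x_indexes.split(",")]
    return [fields[i] for i in indexes]


quarters = ["1998Q1", "1998Q2", "1998Q3", "1998Q4", "1999Q1"]
print(representative_fields(quarters, "-3,-2,-1"))  # ['1998Q3', '1998Q4', '1999Q1']
```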

first_li_withdata An integer index that identifies the line item that is to be displayed on the chart when the document is loaded in the RDL™ Data Viewer.

copyright_cite This is the string that will appear on reports, etc. regarding ownership of the particular data set in the RDL™ data document. A typical example would be "Copyright 1999, e-Numerate Solutions Inc. All Rights Reserved."

holder Full legal name of the owner of the copyright. e.g., "RDL, Inc."

license_type Typical license types would be "None - Proprietary and Confidential", "Public Domain", "Free Use, Rights Reserved by Owner", "Pay per use", and so forth.

warranty Most data preparers will not provide any warranty for their data sets or the data documents that contain data. Rather, the warranty item will be a limitation of liability on the part of the owner. Generally, this attribute will therefore be "No warranty is provided for this data document."

disclaimer Most software and data providers will disclaim any liability for improper use of the data, or any responsibility for any use whatsoever. Typical disclaimer: "The provider of this information makes no representation or warranty of any type; the user accepts full responsibility for any use of this document."

terms If there are any payment terms, length of use, or other terms, this is the place to put the notice.

date Dates are strings in the form of "YYYY.MMDD".

email Full email address of the copyright owner.

state State in the USA. Two letter postal abbreviation.

country Two or three letter abbreviation for the country of copyright ownership. This is important where countries have different copyright laws.

role What role the party played in the creation of this document. Current possibilities are: data_source, rdldoc_source, and formatting_source.

name Name of person to contact at the contact organization.

company Name of the company or organization to contact.

address Address of person to contact at the contact organization.

city City of person to contact at the contact organization.

comments Any particular information about the contact that might be useful to the user.

behavior Reserved for XPointer use.

content-role Reserved for XPointer use.

content-title Reserved for XPointer use.

role Reserved for XPointer use.

title Reserved for XPointer use. This is the string that appears in the application as a hyperlink title. For example, in an HTML browser it will appear as highlighted, underlined text.

show Reserved for XPointer use.

actuate Reserved for XPointer use.

line_item_set_type Currently, the RDL™ Data Viewer recognizes three different types of line items: "TimeSeries", "Category", and "XY". The "type" in this context is the characterization of the x axis values: do the values in the line items represent a time series, or a categorization (sometimes called a crosstabulation), or are they merely an XY scatterplot?

time_period If the line items represent a time series, the valid period lengths are: "Year", "Quarter", "Month", "Week", "Day", "Hour", "Minute", and "Second".

character_set Reserved for future use.

missing_values Reserved for future use.

null_values Reserved for future use.

zero_values Reserved for future use.

dates_values Reserved for future use.

percentages Reserved for future use.

x_title The title displayed on the x axis when the data is shown in a chart.

format A string providing a template for the default representation of the x axis values. The strings are those familiar from spreadsheet programs:
# - digit(s), zeros suppressed
0 - digit(s), zeros displayed
. - decimal point
, - separator
A - z, other characters - displayed literally

x_notes Any footnotes regarding the x axis values.

x_desc Any description regarding the x axis values.

x_prec Display precision for axis labels, expressed as a number of decimal places. Negative values round to the left of the decimal point. For example, a precision of "2" will display a number as "8,254.43". That same number with a precision of "-2" will be displayed as "8,300".
The underlying representation of the number will be the full value; only the formatting and representation on the screen will change. In the current RDL™ Data Viewer this is used primarily for formatting the axis labels.
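
The two examples above can be reproduced with a short Python sketch. It assumes the precision value is a count of decimal places passed to Python's built-in round; the function name is hypothetical, and the stored value is left untouched because only a formatted string is returned.

```python
def format_with_precision(value, prec):
    """Round to `prec` decimal places and format with thousands separators."""
    rounded = round(value, prec)
    decimals = max(prec, 0)  # a negative precision prints no decimal places
    return f"{rounded:,.{decimals}f}"


print(format_with_precision(8254.43, 2))   # 8,254.43
print(format_with_precision(8254.43, -2))  # 8,300
```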

x_unit All numerical quantities have measurement units. "3" is just a designator for "three of something" unless you specify what that something is. That is, you could have "3 cars", or "3 boxes", "3 dollars", and so forth. The fundamental measurement dimension is the "unit".
As noted below, the unit by itself may not be sufficient to define the quantity. You may have "$ in thousands", or "feet per second", or "Per capita income ($), adjusted for inflation, 1995 = 100". Obviously, the whole concept of measurement can get complicated.

For RDL™, measurements break down into the following atomic pieces:

units * magnitude (modifier) measure * scale [adjustment]

Example: "$ in thousands per million people (inflation adjusted)" breaks down as follows:

x_unit = "$" (which may be further qualified as, say, "US$")
x_mag = "3" (i.e., thousands are 10 to the 3rd power)
x_mod = "/" ("per" is the same as dividing)
x_measure = "person" (the singular of "people"; the RDL™ Data Viewer will convert)
x_scale = "6" (millions are 10 to the 6th power)
x_adjustment = "inflation adjusted, 1995 = 100" (Any special notes go here)

Obviously, following a standard vocabulary and spelling for units, measures, etc. is critical.
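
To make the arithmetic behind these pieces concrete, here is a small Python sketch. It follows the glossary definitions of the measure (the unit in the denominator) and the scale (the power of 10 in the denominator); the class name, method names, and sample values are hypothetical and are not part of RDL™.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    unit: str             # numerator unit, e.g. "$"
    mag: int              # power of 10 applied to the numerator
    mod: str              # modifier, e.g. "/" for "per"
    measure: str          # denominator unit, e.g. "person"
    scale: int            # power of 10 applied to the denominator
    adjustment: str = ""  # free-text qualifier

    def to_base_units(self, value):
        """Convert a displayed value to plain `unit` per single `measure`."""
        return value * 10 ** self.mag / 10 ** self.scale

    def rescale(self, value, new_mag):
        """Re-express a displayed value using a different numerator magnitude."""
        return value * 10 ** self.mag / 10 ** new_mag


# "$ in thousands per million people (inflation adjusted)"
income = Descriptor("$", 3, "/", "person", 6, "inflation adjusted, 1995 = 100")
print(income.to_base_units(2.5))  # 2.5 thousand $ per million people, in $ per person
print(income.rescale(2500, 6))    # 2500 thousand $ expressed in millions of $
```

Because magnitude and scale are stored as exponents, both conversions reduce to one multiplication and one division.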

x_mag The "magnitude" of the quantity; this is the multiplier found in the NUMERATOR of a quantity descriptor. For example, in the descriptor "Yen in Billions", the magnitude is "9" because a billion is 10 to the 9th power.
Magnitudes are expressed as numeric powers of 10 so that the application that reads it can make rapid transformations, and also so that the potential confusion of variant spellings and usages (million, mille, MM, etc.) is avoided.

x_mod The modifier is expressed as a string that is associated with the division operator. For example, "per" in "$ per capita" means "$ amount / population".
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

x_measure The measure can be thought of as the "units" in the denominator. Example: "Miles per Hour" is the same as "miles / hour", where "miles" is the x_unit, and "hour" is the measure. Note that measures can be associated with multipliers just as units can be. Whereas the multiplier in the numerator is called the "magnitude", in the denominator it is called the "scale".
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

x_scale The scale is the multiplier in the denominator of the descriptor. It works the same as the magnitude: it is a string that expresses the power of 10 that should be multiplied by the x_measure.
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

x_adjustment Any string that provides a special qualifier to the descriptor.

x_links This can be a comma-delimited string of URLs.

class_name This is a string of a "class" of data to which this x axis can belong. This attribute is used in advanced features of RDL™ such as macro transformations.

parent_class A string designating the parent class. This attribute is used in advanced features of RDL™ such as macro transformations.

li_ID A unique ID number for the line_item element. All line_item elements in the line_item_set are numbered from 0 to n - 1 (where n is the number of line_item elements). The IDs must be unique and in order.

li_legend A string describing the line item. In the RDL™ Data Viewer, the legend appears in the leftmost column of the tree views, in the chart legend, and in other places where the line item must be identified in plain language. It does not need to be unique.

li_title A string defining the general subject of the line item. In the RDL™ Data Viewer, this is used as the title of the chart, and as titles in reports. Typically, titles are all the same for line items that are grouped together, but there are no requirements. The title should merely be selected on the basis of how clear it will make a chart to the user.

li_cat Category. Not currently used in the chart or tree views of the RDL™ Data Viewer, this is an internal designator (plain language) of the subject matter of the line item or group of line items.

y_axis_title A string (less than 50 characters) which will appear on the y axis as the title of that axis. If the user applies a transformation to any variable in the descriptor, this hard-coded y axis title will be replaced by one that is generated by the RDL™ Data Viewer.

level When the set of line items is presented in a tree view, it will be possible to group them in a hierarchical tree, much as file information is presented in a file directory list. The user can expand and contract "nodes" of the tree to see greater or lesser amounts of detail.
This "level" attribute designates the number of indentations this particular line item should have relative to the root. So, for example, if a line item has a level "1", it will appear as a child of the root node. If it has a "2" level, it will appear as a child of the most recent "1" level, and so forth.
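
The indentation rule can be sketched with a small stack-based routine in Python. This is only an illustration of the rule as described here, not the RDL™ Data Viewer's code; the function name is hypothetical, and a parent of None stands for the root node.

```python
def parents_from_levels(levels):
    """Return, for each line item, the index of its parent (None means the root)."""
    parents = []
    open_ancestors = []  # stack of (level, index) pairs for the current branch
    for index, level in enumerate(levels):
        # Close every node that is at the same depth or deeper than this item.
        while open_ancestors and open_ancestors[-1][0] >= level:
            open_ancestors.pop()
        parents.append(open_ancestors[-1][1] if open_ancestors else None)
        open_ancestors.append((level, index))
    return parents


# Two top-level items; the first has two children and the second has one.
print(parents_from_levels([1, 2, 2, 1, 2]))  # [None, 0, 0, None, 3]
```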

relation The relation of this line item to its "parent" node in a hierarchical tree listing. (The parent node is the most recent node that is one level above the current line item.) By default, all line items have a relationship of "ChildStyle" to their parents, but there are other relation attribute values: CompPlus, CompMinus, CompTimes, CompDivide and so forth. For a complete listing and description of these, see the documentation for your RDL™ formatting application.
The different relation attributes are designed to allow the data publisher to designate different icons that appear to the left of the line item in the TreeView. These icons give visual clues to the user as to how each line item relates to its parent.

li_notes Any string may be placed here to show footnotes. Generally, if the source of this line item is different from the overall data_source of the document, you will want to note that here.

li_desc Any string that provides additional description regarding the line item. These descriptions tend to be less formal than the footnotes, as they appear in fewer reports.

li_prec Display precision for axis labels, expressed as a number of decimal places. Negative values round to the left of the decimal point. For example, a precision of "2" will display a number as "8,254.43". That same number with a precision of "-2" will be displayed as "8,300".
The underlying representation of the number will be the full value; only the formatting and representation on the screen will change. In the current RDL™ Data Viewer this is used primarily for formatting the axis labels.

li_unit All numerical quantities have measurement units. "3" is just a designator for "three of something" unless you specify what that something is. That is, you could have "3 cars", or "3 boxes", "3 dollars", and so forth. The fundamental measurement dimension is the "unit".
As noted below, the unit by itself may not be sufficient to define the quantity. You may have "$ in thousands", or "feet per second", or "Per capita income ($), adjusted for inflation, 1995 = 100". Obviously, the whole concept of measurement can get complicated.

For RDL™, measurements break down into the following atomic pieces:

units * magnitude (modifier) measure * scale [adjustment]

Example: "$ in thousands per million people (inflation adjusted)" breaks down as follows:

li_unit = "$" (which may be further qualified as, say, "US$")
li_mag = "3" (i.e., thousands are 10 to the 3rd power)
li_mod = "/" ("per" is the same as dividing)
li_measure = "person" (the singular of "people"; the RDL™ Data Viewer will convert)
li_scale = "6" (millions are 10 to the 6th power)
li_adjustment = "inflation adjusted, 1995 = 100" (Any special notes go here)

Obviously, following a standard vocabulary and spelling for units, measures, etc. is critical.

li_mag The "magnitude" of the quantity; this is the multiplier found in the NUMERATOR of a quantity descriptor. For example, in the descriptor "Yen in Billions", the magnitude is "9" because a billion is 10 to the 9th power.
Magnitudes are expressed as numeric powers of 10 so that the application that reads it can make rapid transformations, and also so that the potential confusion of variant spellings and usages (million, mille, MM, etc.) is avoided.

li_mod The modifier is expressed as a string that is associated with the division operator. For example, "per" in "$ per capita" means "$ amount / population".
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

li_measure The measure can be thought of as the "units" in the denominator. Example: "Miles per Hour" is the same as "miles / hour", where "miles" is the li_unit, and "hour" is the measure. Note that measures can be associated with multipliers just as units can be. Whereas the multiplier in the numerator is called the "magnitude", in the denominator it is called the "scale".
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

li_scale The scale is the multiplier in the denominator of the descriptor. It works the same as the magnitude: it is a string that expresses the power of 10 that should be multiplied by the li_measure.
Numerator Lite™ uses this attribute to construct y axis labels and descriptors in reports when the user has made a transformation to the descriptor and the y_axis_title attribute is no longer appropriate.

li_adjustment Any string that provides a special qualifier to the descriptor.

li_aggregation Occasionally, the user will want to "aggregate" or "deaggregate" data based on differing x axis transformations. This attribute explains to the RDL™ Data Viewer how to handle this particular line item when such transformations are being attempted.
Example: A line_item_set presents bank account information; each line item is a time series and presents quarterly data, and the user may wish to see the data on an annual basis. For some line items, that is a matter of simply summing up four quarters' worth of data (e.g., deposits), but for other line items (e.g., closing balance), you only want to show the last quarter's value.

Currently accepted values are: "sum", "average", "minimum", "maximum", "first", "last", "none".
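
The bank account example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the sample figures, and the grouping of four consecutive quarters into a year are hypothetical, and treating "none" as "this line item cannot be aggregated" (returning None) is a guess at the intended meaning.

```python
from statistics import mean

# One combining function per accepted li_aggregation value (except "none").
AGGREGATORS = {
    "sum": sum,
    "average": mean,
    "minimum": min,
    "maximum": max,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
}


def aggregate(values, li_aggregation, group_size=4):
    """Collapse consecutive groups of values, e.g. four quarters into one year."""
    if li_aggregation == "none":
        return None  # assumed meaning: no aggregation is defined for this item
    combine = AGGREGATORS[li_aggregation]
    return [combine(values[i:i + group_size]) for i in range(0, len(values), group_size)]


deposits = [100, 120, 90, 110, 80, 95, 105, 130]           # two years of quarters
closing_balance = [500, 520, 510, 540, 560, 555, 570, 600]

print(aggregate(deposits, "sum"))          # [420, 410]
print(aggregate(closing_balance, "last"))  # [540, 600]
```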

xlink:form Under the XLink specification, hyperlinks may be "simple" links or "extended" links. Simple hyperlinks are the familiar "jump" links of HTML browsers: clicking on that link will close the current page and open the target page.

href The standard string for a URL (e.g., http://www.e-numerate.com).

"simple" links Traditional "jump" hyperlinks. Clicking on this link in the browser window will close the current page and open the target page.

"extended" links Reserved for future use.