DML Documents
Development Markup Language
Version 0.01.00
Commentary
1998-10-22
DML design goals
1. DML will support the markup
of information describing the development activities of organizations
working in the area of international development.
Information describing development
activities, which has been referred to as "process-related"
information, includes descriptions of projects, programs, loans and
credits.
2. DML will support the valid
markup of records that contain only the mandatory data elements described
by the current CEFDA standard.
CEFDA provides a basic level
of development activity description that has already been agreed upon
by a wide range of development organizations. Allowing valid markup
using only CEFDA mandatory elements will encourage development
organizations currently using CEFDA to adopt DML.
3. DML will support markup
that extends CEFDA in areas that have already been formally or informally
identified as requiring more complete, more detailed or more extensive information.
A number of improvements or extensions
to CEFDA have been discussed within the INDIX community or implemented
within pilot projects such as the GK-AIMS project of the Global Knowledge
Partners program. These areas include the use of authorities for institutional
names, authoritative description of sectors, increased detail in terms
of financial information, and better description of programs.
4. DML will allow for multilingual
markup.
Development organizations work
in a number of different languages. DML should support description in
a variety of languages.
5. DML will be developed quickly.
In order to avoid fragmentation
of the development community in supporting competing Document Type Definitions,
DML must be developed quickly. Version 0.01 of the DTD is a draft intended
to stimulate comment, suggestions and discussion.
6. DML should be easily usable
by a wide range of browser software, style sheets and other software.
Features of DML should be among
those most likely to be quickly implemented in XML parsers, renderers,
and stylesheets such as CSS and XSL. Reasonable use of the data should
be possible even without sophisticated application software.
7. DML will be easily implemented
by development agencies.
Development agencies should be
able to implement simple DML markup easily. For most agencies, this
will mean producing DML as output from another information system, rather
than marking up original text.
8. To the extent possible,
DML will be consistent with other developing metadata schemes.
Other metadata schemes such as
the Dublin Core standard for describing document-like information objects
and the Government Information Locator Service (GILS) used for describing
government information resources continue to evolve. As the means of
integrating multiple metadata schemes (RDF, semantic maps) develops,
DML should develop to be consistent with these schemes.
9. DML will provide a base
on which to develop richer and more powerful means of exchanging, sharing
and using development activity information.
While a simple DML document should
not require more than the mandatory CEFDA data elements, provision must
be made for the exchange of more complex and richer sets of data to
ensure that the markup language will continue to meet the needs of the
development community in the future.
Comments on the DTD
Naming Conventions
In element names consisting of more
than one word, the words are joined with an underscore (_), as in
executing_org. Where some of the words represent a hierarchy, and that
hierarchy is reflected in the element name for clarity's sake, the
parts of the name are joined by a full stop (.), as in org.name.
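As an illustration of the convention, the short Python sketch below splits an element name into its hierarchy levels (separated by full stops) and the words within each level (separated by underscores). The function name describe_element_name is hypothetical and is not part of DML; the element names passed to it are taken from the examples in this commentary.

```python
def describe_element_name(name):
    """Split a DML-style element name into hierarchy levels and words.

    Full stops (.) separate hierarchy levels; underscores (_) separate
    the words within a single level.
    """
    return [level.split("_") for level in name.split(".")]


# Element names used elsewhere in this commentary.
for example in ("executing_org", "org.name", "contact.hours", "prov_state"):
    print(example, "->", describe_element_name(example))
```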
%optlink - Optional Link Elements
To support both the simple
exchange of information elements such as organization names and the
use of networked sources of authoritative information such
as organizational authority files, some elements in the DML markup have
been defined to allow optional
linking. An element that allows optional linking can carry linking information
if this is available to the organization creating the DML document.
Optional linking elements include organization information (to allow
for links to an organization authority file) and sector codes (to link
to an authoritative source for sector codes). Elements with few possible
values, such as Terms of Assistance, are not good candidates for optional
linking; instead, coded values should be converted into human-readable
forms with stylesheets or specific application processors. Bibliographic
references to documents can also use optional linking to access more
complete descriptions.
Examples:
1. An organization has no means
of referencing an institutional authority file. The DML document contains
only the name of the agency.
<executing_org>
<org.name>
Canadian International Development
Agency
</org.name>
</executing_org>
2. An organization is able to, and chooses to,
provide a link to a networked authority file of development organizations,
located at DevOrgs International, a yet-to-be-established consortium
constituted to maintain information about development organizations.
Users can get additional information on that particular organization
by traversing the link (e.g. by clicking on the organization name).
<executing_org xml:link="simple"
href="http://www.devorgs.org/auth?CIDA"
show="embed">
<org.name>
Canadian International Development
Agency
</org.name>
</executing_org>
Traversing this link will embed
information retrieved from the DevOrgs site into the record, so that
the additional information becomes part of the current
document. The information about the executing agency might then appear
as in the following example.
<executing_org xml:link="simple"
href="http://www.devorgs.org/auth?CIDA">
<org.name lang="en">
Canadian International Development
Agency
</org.name>
<org.acronym>
CIDA
</org.acronym>
<city>
Hull
</city>
<prov_state>
Quebec
</prov_state>
<country>
Canada
</country>
<contact.uri href="http://www.acdi-cida.gc.ca/index.htm">
http://www.acdi-cida.gc.ca/index.htm
</contact.uri>
</executing_org>
Developments in XML processors
will need to be monitored for support for optional linking.
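Until such support is widespread, an application can at least detect whether an element carries optional link information. The Python sketch below, which uses the standard library's xml.etree.ElementTree, returns an organization's name together with its link target when the element is marked as a simple link, and None otherwise; it does not traverse or embed the link. The function name org_reference is hypothetical, and the xml:link and href attribute names follow the draft linking syntax used in the examples above (later versions of XLink use different attribute names).

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# The "xml" prefix is predefined in XML, so ElementTree reports
# xml:link under the XML namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def org_reference(element: ET.Element) -> Tuple[str, Optional[str]]:
    """Return (organization name, link target or None) for an org element."""
    name_el = element.find("org.name")
    # Collapse the line breaks and indentation found inside <org.name>.
    name = " ".join((name_el.text or "").split()) if name_el is not None else ""
    # Only report a target when the element is declared as a simple link.
    href = element.get("href") if element.get(XML_NS + "link") == "simple" else None
    return name, href


plain = ET.fromstring(
    "<executing_org><org.name>Canadian International Development Agency"
    "</org.name></executing_org>"
)
linked = ET.fromstring(
    '<executing_org xml:link="simple" href="http://www.devorgs.org/auth?CIDA" '
    'show="embed"><org.name>Canadian International Development Agency'
    "</org.name></executing_org>"
)
print(org_reference(plain))
print(org_reference(linked))
```

A processor that supports embedding would go one step further and fetch the href target, merging the returned elements into the record.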
%addr Address Information
Address information may be given
for an organization in any of the roles it may play (i.e.
funding organization, cofunding organization, executing entity or reporting
organization). Address information may also be associated with a particular
contact. This entity allows address information to be standardized across
these different elements.
%contact Contact Information
Contact information may be associated
with organizations in any one of the different roles they play in relation
to a development activity. GK-AIMS and CEFDA take different approaches
to this kind of information. In GK-AIMS, contact information is subsidiary
to an organization unit, so that there is a fixed contact point for
that organization. This approach should render the notion of a CEFDA
Contact (210) redundant, since the Contact is associated with the Funding
Organization. However, the contact for further documentation on a project
or program is not necessarily the contact designated for the whole
organization; in that case, the CEFDA Contact would fill this role.
In this version of DML, flexibility built into the DTD allows for both
approaches, but at the cost of some increased complexity and potential
redundancy. This approach needs to be validated with development organizations.
The GILS standard provides for a Contact sub-element "Hours of
Service", which is not present in CEFDA or GK-AIMS but which has
been added here (contact.hours).
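One way an application might reconcile the two approaches is to prefer an activity-level Contact (the CEFDA approach) when one is present, and otherwise fall back on the contact attached to the funding organization (the GK-AIMS approach). The Python sketch below assumes, purely for illustration, that the activity-level contact is a <contact> child of <activity>, that the funding organization appears as <funding_org>, and that the contact's name is held in <contact.name>; these element names are hypothetical and are not taken from the DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def documentation_contact(activity: ET.Element) -> Optional[str]:
    """Return the name of the contact for further documentation, if any.

    Prefers a contact given directly on the activity (CEFDA-style) and
    falls back on the funding organization's contact (GK-AIMS-style).
    """
    # find() with a bare tag matches direct children only.
    contact = activity.find("contact")
    if contact is None:
        contact = activity.find("funding_org/contact")
    if contact is None:
        return None
    name = contact.find("contact.name")
    return " ".join((name.text or "").split()) if name is not None else None


# Hypothetical records using the assumed element names.
both = ET.fromstring(
    "<activity>"
    "<funding_org><contact><contact.name>Agency desk</contact.name></contact></funding_org>"
    "<contact><contact.name>Project officer</contact.name></contact>"
    "</activity>"
)
org_only = ET.fromstring(
    "<activity>"
    "<funding_org><contact><contact.name>Agency desk</contact.name></contact></funding_org>"
    "</activity>"
)
print(documentation_contact(both))
print(documentation_contact(org_only))
```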
%org Organization
Organizations may play a number
of different roles in relation to a development activity: that of funding
organization, cofunding organization, executing entity, funding source
or organization reporting the development activity. This entity allows
the format of organizational information to be standardized across these
different roles. Following the GK-AIMS format, a contact is allowed
for each organization/role combination.
%mixed Mixed Content
Some organizations may provide
extensive abstracts which can be separated into paragraphs, headings
or unnumbered lists. A mixed content model allows text in these elements
to be marked up with some basic display-oriented markup to improve the
readability and clarity of long texts.
activity Activity
<activity> is the main
element of DML. Its attributes include “language”, which corresponds
to the machine-usable version of the Language of Record data element,
and “id”, which provides a unique identifier for this element
so that other resources can link to it. The form of the “id” attribute
in particular is restricted: its value must be a valid XML name, which
must begin with a letter (or an underscore) and may otherwise contain only
letters, digits and a few punctuation characters such as ‘.’ (full stop)
and ‘-’ (hyphen). Given that documents may be assembled
with activities from different organizations, it would be wise to encourage
some standardization in the format of the activity id attribute; one
suggestion would be to use the agency-assigned activity identifier,
substituting ‘-’ for any character not permitted in an XML name, and
prefixing this identifier with the acronym for the agency itself followed by a ‘.’.
One can argue that there is little apparent need in the human-readable
elements for either what CEFDA terms “Record Identifier” or (perhaps
less convincingly) “Record Language”; neither of these has been
included as an element in this version of DML.
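The suggested id convention can be sketched in Python as follows. This is an illustration under stated assumptions rather than part of DML: the function name make_activity_id and the sample identifier are hypothetical. The sketch keeps ASCII letters and digits (digits are permitted in XML names after the first character) and replaces every other character, including any ‘.’, with ‘-’, so that the single ‘.’ after the acronym unambiguously separates the two parts.

```python
import re

# Assumption: agency acronyms are plain ASCII letters and digits,
# starting with a letter, so the resulting id begins with a letter.
ACRONYM = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def make_activity_id(agency_acronym: str, agency_activity_id: str) -> str:
    """Build an activity id of the form ACRONYM.IDENTIFIER."""
    if not ACRONYM.fullmatch(agency_acronym):
        raise ValueError(f"unsuitable agency acronym: {agency_acronym!r}")
    # Replace anything other than an ASCII letter or digit with '-'.
    cleaned = re.sub(r"[^A-Za-z0-9]", "-", agency_activity_id)
    return f"{agency_acronym}.{cleaned}"


print(make_activity_id("CIDA", "A 012/345"))  # prints CIDA.A-012-345
```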
The Activity element contains
four elements which correspond to the four categories of fields specified
in CEFDA. These categories serve to group the different CEFDA fields,
but could be dispensed with in DML if the tags are not useful in formatting
record displays. Of the four categories, CEFDA places administrative
information first. However this administrative information is less important
to the user than the descriptive part of the record, which includes
critical information elements such as the title. Since XML documents
will frequently be processed in a one-pass, sequential fashion, the
most important information should appear first in the document. For
this reason, the administrative information in DML has been moved to
appear as the last element contained within <activity> instead
of the first as CEFDA would suggest.
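The benefit of this ordering can be illustrated with a streaming parse. In the Python sketch below, which uses xml.etree.ElementTree.iterparse, the application stops as soon as the title has been read, without ever handling the administrative part of the record. The category element names descriptive and administrative, and the sample content, are hypothetical placeholders; only activity and title are names used in this commentary. The parser may buffer more input than the application consumes; the gain is that the application can act on the title without waiting for the rest of the record.

```python
import io
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical record: the category element names are placeholders.
SAMPLE = b"""<activity id="CIDA.A-012-345">
  <descriptive>
    <title>Example activity title</title>
  </descriptive>
  <administrative>
    <note>Administrative details would appear here.</note>
  </administrative>
</activity>"""


def read_until_title(source) -> Tuple[Optional[str], List[str]]:
    """Return the first title and the tags opened before it was complete."""
    opened = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            opened.append(elem.tag)
        elif elem.tag == "title":
            # Stop here: later elements are never handed to the application.
            return " ".join((elem.text or "").split()), opened
    return None, opened


title, opened = read_until_title(io.BytesIO(SAMPLE))
print(title)
print(opened)
```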
title and trans_title
Title and Translated Title
Title and translated title could
be merged into a single field with the attribute "xml:lang"
used to distinguish between different language versions of the title.
However, it is clear from the definition in CEFDA that a distinction
is being made between the title in one of the
official languages of the funding organization (hence "the
official title" or at least one of the official titles) and a title
which has been translated in order to make the information more widely
available or more easily understood. Both elements have been retained
in the Development Markup Language to continue this distinction.
Coded Values
Many of the fields in the CEFDA
format, such as Terms of Assistance, Type of Activity and Country/Region
were designed to carry coded values. Additional elements discussed but
never implemented include Sector. Coded values were used with these
data elements to provide a controlled vocabulary that would unambiguously
identify a particular value, and to provide a way to share information
as independently as possible of the original language of description.
The intention was always that
the coded value would be translated into one or more easily comprehended
formats in one or more languages when the information was being delivered
to the end user. For example, the country/region codes of the CEFDA format
are expanded into country/region names in the INDIX Development Activity
Information (DAI) database. DML is intended not only for computer-to-computer
transfer but also for end user display, where this information must be readable
by the end user. A variety of approaches to this kind of data are possible.
1. The code is an attribute
of an empty element.
The XML application or XML stylesheet
must convert the attribute value into a human-readable and comprehensible
format, such as "grant".
Example:
<terms_assistance code="1"/>
2. The element includes a
readily understandable description of the value in a given language,
while still carrying the code as an attribute value.
Example:
<terms_assistance code="1">Grant</terms_assistance>
In this case the language would
be assumed to be the language of the parent element (“language of record”
in CEFDA parlance), but even this could be specified:
<terms_assistance xml:lang="en"
code="1">Grant</terms_assistance>
3. The element could be used
as a link to embed a description of the coded value.
<terms_assistance href="http://www.bellanet.org/terms/en?1"
actuate="auto" show="embed" lang="en"
code="1">Grant</terms_assistance>
In the latter case, the actual
presence of a descriptive string (e.g. "Grant") is superfluous,
since the element will be replaced with the result of looking up
the hypertext reference, provided that the lookup returns
a valid value.
For elements with large numbers
of possible values and frequently changing information (such as organizational
information), the overhead of traversing a link to get further information
is worth the processing involved. For coded elements with small numbers
of values, such as Terms of Assistance and Status, the overhead of providing
a link to an authoritative source is too great; these elements have
been defined as empty elements, and style sheets or XML software will
have to render the attribute code into a human-readable form. Elements
with several hundred values, such as Country/Region or Sector (once
standard sectors are eventually defined for activity information), would
fall somewhere in between. In this third case, arguments could be made
both for and against providing a link to an authoritative source. In
this version of the DTD, both possibilities have been provided for,
with a code to allow the XML processor to provide the relevant information,
and attributes to support linking.
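For the empty-element and descriptive-string approaches, rendering might look like the Python sketch below: a descriptive string carried in the element is used as-is, and otherwise the code attribute is looked up in a code table for the element's language. The code table and the function name render_terms are hypothetical; the table contains only the one value used in the examples above (code 1 = Grant). A real processor would load the full CEFDA code list, and one following the linking approach would resolve the href instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical code table keyed by language, then by code. Only code "1"
# ("Grant") is taken from the examples; other entries would come from CEFDA.
TERMS_OF_ASSISTANCE = {"en": {"1": "Grant"}}


def render_terms(elem: ET.Element, record_lang: str = "en") -> Optional[str]:
    """Return a human-readable Terms of Assistance value, if one is known."""
    text = " ".join((elem.text or "").split())
    if text:
        return text  # description already carried in the element
    # Empty element: look up the code in the element's own language
    # if xml:lang is given, otherwise in the language of record.
    lang = elem.get(XML_LANG, record_lang)
    return TERMS_OF_ASSISTANCE.get(lang, {}).get(elem.get("code", ""))


for markup in (
    '<terms_assistance code="1"/>',
    '<terms_assistance code="1">Grant</terms_assistance>',
    '<terms_assistance xml:lang="en" code="1">Grant</terms_assistance>',
):
    print(render_terms(ET.fromstring(markup)))
```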
cofunding_org Co-funding
organization
This element has been drawn from
GK-AIMS; its absence from the CEFDA format has been noted.
budget Budget Information
CEFDA was intended primarily
for use in text-based information retrieval systems, where certain fields
would be searchable, but display capabilities were simple. The ability
within XML to define more specific data elements, as well as the importance
of budget information, calls for greater detail or granularity in the
description of budget information. Two possible elements are found here:
a budget total, for organizations that wish to report only a single
total amount for the activity budget; and a budget line, for organizations
that wish to provide more detailed information, broken down by budget
year, sub-activity, or currency.
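Where an organization reports budget lines in more than one currency, a display application cannot simply add them all together. The Python sketch below totals budget lines per currency using exact decimal arithmetic. The element and attribute names (budget, budget.line, year, currency), the function name totals_by_currency and the amounts are assumptions made for illustration; the DTD may name and structure these elements differently.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, attributes and amounts are illustrative.
SAMPLE = """<budget>
  <budget.line year="1998" currency="CAD">250000</budget.line>
  <budget.line year="1999" currency="CAD">300000</budget.line>
  <budget.line year="1999" currency="USD">50000</budget.line>
</budget>"""


def totals_by_currency(budget: ET.Element) -> Dict[str, Decimal]:
    """Sum budget lines separately for each currency."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for line in budget.iter("budget.line"):
        amount = Decimal((line.text or "0").strip())
        totals[line.get("currency", "unknown")] += amount
    return dict(totals)


print(totals_by_currency(ET.fromstring(SAMPLE)))
```

Keeping amounts as Decimal avoids the rounding surprises of binary floating point when large monetary figures are added.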
Source documentation
DML has been designed for the
markup of metadata about development activities, including references
to documents that describe those activities. Other related
metadata schemes or profiles currently in development and use include
Dublin Core (a fifteen-element set designed for describing a wide range
of document-like objects) and the Government Information Locator Service
(GILS), an ISO 23950 profile for the description of government information
resources. Dublin Core in particular is of interest because of its possible
application in searching a wide range of information over the Internet.
GILS is of interest in that government agencies, such as bilateral development
agencies, may be encouraged to make information available in this format,
though actual implementations of GILS are still relatively rare. Although
the semantics of both these formats are weak, the Dublin Core elements have
been used for the description of source documentation because of the
widespread support for Dublin Core. However, the way in which metadata
sets will be related is still being worked out, with a number of different
proposals and unsettled issues. (For cross-domain searching, for
example, DML might wish to present an Activity Title as a Dublin Core
Title; however, in describing a documentary reference, the Dublin Core
Title is more appropriately and precisely used for the title
of the document describing the activity.) In view of these uncertainties,
this version of DML simply adopts Dublin Core semantics for bibliographic
descriptions. These DML elements can be converted to more explicit Dublin
Core elements in the future when the mechanisms for doing so are better
established.