DML - Development Markup Language

[This local archive copy is from the official and canonical URL, http://www.bellanet.org/xml/min1.cfm; please refer to the canonical source document if possible.]

Updated
February 11, 1999

DML Meeting: Minutes

Wednesday, January 13, 1999

XML Breakfast

8:30 - 9:30: An Introduction to XML

Ron Davies

The scheduled presenter was prevented from attending the meeting this morning. In his place, Ron Davies gave an impromptu introduction to XML.

Mr. Davies described the history of markup, the development of the Standard Generalised Markup Language (SGML), and the HyperText Markup Language (HTML), which was developed for the World Wide Web using SGML. A number of problems with HTML as a markup language for the World Wide Web were mentioned, including the display orientation of HTML markup; the way in which authors used HTML tags without regard to their meaning; and the lack of specificity in tagging, especially for the very specific kinds of data stored in databases available over the Web. XML was developed by the World Wide Web Consortium (W3C) as a solution to these problems, while retaining the benefits of SGML.

Like SGML, XML is a kind of meta-language that allows different industries, sectors or communities to develop their own way of marking up the structure of their documents. This structure is represented in a Document Type Definition (DTD).

Documents that are created with markup tags respecting the general rules of XML (most importantly in relation to correct use of start and end tags and correct nesting of elements inside larger elements) are said to be well-formed. Many applications only need to know that the document is well-formed to be able to use the information. Valid XML documents, on the other hand, must be well-formed, but they must also correspond exactly to the rules for document structure set out in a DTD. Applications, such as validating parsers, can check the DTD that a document uses and validate the document by ensuring that each tag in the document is used as it has been defined in the DTD.
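As an illustration of the well-formed/valid distinction (not something presented at the meeting), the following minimal Python sketch uses the standard-library `xml.etree.ElementTree` parser to test well-formedness only. The `project`, `title` and `country` element names are hypothetical. ElementTree does not validate a document against a DTD; checking validity would need a validating parser, such as the third-party lxml package, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (i.e. is well-formed).

    This checks well-formedness only; ElementTree does not validate
    the document against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical record using invented, descriptive tags: correctly nested.
good = "<project><title>Rural water supply</title><country>Kenya</country></project>"
# Same tags, but <country> is opened inside <title> and closed outside it.
bad = "<project><title>Rural water supply<country>Kenya</title></country></project>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Note that no DTD is involved: an application that only needs well-formed input can use the tags directly.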

How can XML be applied? A number of different types of applications were described:

Display
As with HTML, XML documents can be displayed or printed so that people can read them. XML itself prescribes no particular format for displaying marked-up information. Rather, style sheets indicate how each element within the document should be displayed. Both Cascading Style Sheets (CSS) and the Extensible Stylesheet Language (XSL) provide this capability, though the XSL standard is still under development, and not all Web browsers support style sheets.
Searching
Full-text search often results in too much information being retrieved. If a text is marked up in XML, then search engines can restrict a search to a particular element, or particular type of document, and provide better and more precise search results.
Data transfer
XML can be used in data transfer from one database or application to another, independently of the computers, operating systems or applications involved. Information in one database or one form can be converted to XML, transferred, and then converted from XML into the format that can be managed by another database.
Publishing
XML can be used for a variety of publishing outputs. Today, organizations often produce one kind of file for printed publications and another for electronic publication over the World Wide Web. Because XML is a subset of SGML, publishers and printers with SGML-based printing processes can use it easily, and in the future, when Web browsers support XML, the very same file can be used to publish the information over the Internet.
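To make the searching point concrete, here is a minimal Python sketch, assuming a hypothetical collection of `record` elements with `title` and `abstract` children (these names are illustrative only). A naive full-text match for "water" returns both records, while restricting the match to the `title` element returns only the record that is actually about water.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up collection; element names are illustrative.
DOC = """
<records>
  <record>
    <title>Water policy in the Sahel</title>
    <abstract>Reviews irrigation projects.</abstract>
  </record>
  <record>
    <title>Primary health care financing</title>
    <abstract>Discusses water-borne disease costs.</abstract>
  </record>
</records>
"""


def full_text_search(root, term):
    """Naive full-text search: match the term anywhere inside a record."""
    term = term.lower()
    return [r for r in root.iter("record")
            if term in "".join(r.itertext()).lower()]


def element_search(root, element, term):
    """Restrict the match to one named child element of each record."""
    term = term.lower()
    return [r for r in root.iter("record")
            if any(term in (e.text or "").lower() for e in r.findall(element))]


root = ET.fromstring(DOC)
print(len(full_text_search(root, "water")))         # 2
print(len(element_search(root, "title", "water")))  # 1
```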

Questions and discussion

Mr. Sher asked what other industries or communities had developed their own markup language. Mr. Davies mentioned markup languages for mathematics, chemical formulae and music as some examples.

Mr. Kenney asked what would happen to HTML when XML documents appeared on the Web. Mr. Bray answered that new releases of the principal browsers (IE and Netscape 5) would support XML. However, many large-scale information applications will continue to use HTML: XML will be used for entering or creating data, and HTML for delivering information to end users.

Ms. Dueck asked whether XML was better at handling databases. Mr. Davies hesitated to be specific about databases, since the distinction between databases and documents is blurred with XML. XML offers more specific description in terms of both kinds of data. Ms. Dueck asked whether XML was specifically better for relational databases. Mr. Bray replied that the difference between relational and other database systems didn't really matter.

Mr. Kanfi commented that the focus of XML was on the transfer of data, not on its display, and Mr. White commented that XML was not replacing databases, just providing better use of and access to them.

Mr. Balson asked whether a user on the Web could use XML to integrate data from different sources, creating his or her own data set. Mr. Davies replied yes, but that not all the tools needed to do so were currently available.

Session 1 Chair: Terry Gavin

10:00 - 10:15 General Introduction

David Balson

David Balson, Executive Director of Bellanet, welcomed participants to the meeting, noting in particular their strong interest in DML, as evidenced by their attendance in spite of the cold temperatures (-20 to -30 degrees C) and the snow outside.

After a brief description of Bellanet, Mr. Balson outlined Bellanet's involvement with the DML initiative over the past 14 to 15 months. Just over a year ago, David Steinberg of CIDA first asked why Bellanet wasn't looking at an XML-based development activity standard, in order to foster better information exchange in the development community, specifically of "process information", that is, information about programs, projects and evaluations.

Mr. Balson then described some of his personal experiences with information sharing initiatives. At IDRC, he was first responsible for the Development Database Service (DDBS), where he had to convert four IDRC and three external bibliographic databases into a format similar enough to allow users to search those databases effectively. Later he was involved in an effort to amalgamate information from twenty to thirty NGOs in six weeks following a major conference. As a member of the INDIX Steering Committee, he was involved in integrating data from a wide range of agencies on a CD-ROM. Through Bellanet's involvement with the Global Knowledge Initiative, he was concerned with gathering information on pipeline and current activities relating to Global Knowledge. In all these contexts, he wondered whether XML might have helped in information-sharing.

Early in 1998, Bellanet asked Ron Davies to contact fifteen to twenty development organizations to find out what, if anything, was happening with XML. This informal survey found that there was interest in the development community, but nobody was actively pursuing developments toward an XML-based Development Markup Language. Bellanet formed a mailing list to encourage discussion, drafted and distributed a proposal document outlining what might be done with XML, undertook the development of a draft DTD, and organized this meeting. The announcement of this meeting resulted in a doubling of the size of the DML mailing list, clear evidence of the interest in the development community. Mr. Balson stated that he had no idea what the outcome of this meeting would be, but that he hoped that whatever the outcome, it would lead to better ways to exchange development information.

10:15 - 11:30 Keynote Speech

Tim Bray

Mr. Bray began by introducing himself. A graduate of the University of Guelph, he was manager of the Oxford English Dictionary project at the University of Waterloo from 1986 to 1989 before co-founding OpenText. He left OpenText in 1996 to become an independent consultant and has worked extensively with XML since then. He is co-editor of the XML 1.0 Specification and the XML Namespaces specification, as well as Technical Editor of XML.com.

Mr. Bray outlined some of the problems with information in a networked environment, particularly in terms of performance (too slow); interchange between systems using different operating systems, databases, applications and human languages; retrieval (difficult to find information because of the reliance on full text searching); and openness (useful information outlives the technology, but carrying forward that information is difficult and expensive).

Mr. Bray then discussed the theory of markup, distinguishing between presentational, procedural and descriptive markup. XML is designed to provide non-proprietary, descriptive markup, in which the content is described to facilitate re-use, search and retrieval. He described the history of XML, which started in 1996 with the realization that many applications needed to extend HTML (for electronic commerce, voice processing and so on), and that HTML was very difficult to extend, for both technical and political reasons.

Mr. Bray showed a number of differences between HTML and XML, including specification of the document type, exact matching of tags, the definition of entities, use of descriptive rather than presentational markup, required end tags for non-empty elements, and proper use of quotation marks. He emphasized that well-formed XML is easy to parse, and programs that do so are only 20 to 40 kilobytes in size as opposed to the 10 to 16 megabytes required for HTML parsers.
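Several of the differences Mr. Bray listed can be seen by feeding HTML-style markup to an XML parser. In this sketch, which assumes Python's standard-library `xml.etree.ElementTree`, three common HTML habits (omitted end tags, an unquoted attribute value, and start and end tags whose case does not match) are rejected, while the XML equivalents, including an empty element written as `<br/>`, parse cleanly. The sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

# HTML habits that an XML parser rejects.
html_style = [
    "<doc><p>First paragraph<p>Second paragraph</doc>",  # end tags omitted
    "<doc><p align=center>Text</p></doc>",                # unquoted attribute value
    "<doc><P>Text</p></doc>",                             # tag case does not match
]

# XML equivalents: non-empty elements closed, attributes quoted,
# tags matched exactly, and the empty element written with "/>".
xml_style = [
    "<doc><p>First paragraph</p><p>Second paragraph</p></doc>",
    '<doc><p align="center">Text</p></doc>',
    "<doc><p>Text</p><br/></doc>",
]


def parses(text: str) -> bool:
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Prints False for the three HTML-style fragments, then True for the XML ones.
for fragment in html_style + xml_style:
    print(parses(fragment), fragment)
```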

XML does not require a "Central Bureau of Tags"; users can invent whatever tags they need. The only requirements are a style sheet indicating how the information should be displayed, and software written to do specific kinds of processing. Users can create Document Type Definitions (DTDs) to check the validity of their document markup, so that (a) authoring software can help authors create correct XML documents, (b) other application software will know what kinds of tags to expect, and (c) users are forced to think about what they want to do. Nevertheless, many kinds of processing do not need a DTD.

Mr. Bray presented key dates in the history of XML, followed by a brief overview of XML features. The dream, the reason XML was being developed, included a faster Web (with more processing done on the client), a better way to share knowledge between applications (by shipping data from one to another in a simple, standard format), a large number of metadata vocabularies, and a more open system (in which information could be used, stored and found independently of proprietary binary formats). Standards do not make life easier for software vendors, but if users demand standards-compliant software, they will get it.

In summary, XML is a meta-language for describing markup, allowing you to invent your own tags. XML is simple, with a specification of fewer than 40 pages. It uses Unicode, facilitating the creation and use of documents in multiple languages and in non-Roman scripts. It has built-in error handling, is efficient over a network, and has support from all the major vendors.

Questions and discussion

Ms. Dueck asked how usable XML was with Microsoft applications, since in developing countries those were all that many users could afford. Mr. Bray replied that there were many free tools for manipulating XML documents, including Perl, and that a simple text editor is all that is required for authoring. There is a need for XML authoring tools selling for about $100 to $200.

Mr. Rose asked whether XML contained obligatory features. Mr. Bray replied that it did, for the sake of greater interoperability.

Mr. Steinberg asked if there were any major pieces missing from XML. Mr. Bray replied that the popular browsers were not very good about being standards-conformant.

Mr. Kenney commented that XML seemed to require more effort on the part of information creators and asked if XML was going to compromise the democratizing effect of the World Wide Web. Mr. Bray replied that HTML was very forgiving of errors and that this appealed to creators of informal or personal sites. Because of its low overhead, it was suitable for human-to-human communication, and would still be used ten years from now. However for documents that require effort, that persist over time, and that may be used for purposes other than simple display, XML provided substantial benefits. If, at the current time, a little extra time were required in creating them, that extra time would probably decrease in the future.

Mr. White asked how XML would support the regulatory requirements of government departments and agencies. Mr. Bray replied that some regulatory requirements posed problems, since the appearance of an XML document was not fixed. Some agencies had addressed this by maintaining two versions of a document: the source in XML and a PDF version derived from it, in which page boundaries were fixed.

Mr. Steinberg asked when the next series of standards could be expected (e.g. XSL, the XML stylesheet language, and the next version of XML). Mr. Bray replied that he expected the link and stylesheet standards to appear in the summer of 1999, and the data schema standard later. The committee looking into the next generation of XML is very conservative, and his advice would be to sit tight and wait.

Mr. Steinberg asked whether XML would eliminate the use of PDF. Mr. Bray replied that there are a large number of pages in PDF format and that PDF is very suitable for older legacy documents, so that PDF would have a role for a long time.

Ms. Street asked whether XML would replace HTML. Mr. Bray replied that they would co-exist forever, because there were already approximately 1 billion HTML documents, many Version 3 Web browsers still in use, and that HTML was still perfectly suitable for the simple display of information in human-to-human communication.

Mr. Bessemer asked about document schemas. Mr. Bray replied that Document Type Definitions (DTDs) were old-fashioned and were missing features such as data typing, inheritance and modularity. What was needed was a new generation of DTD, and a W3C working group was currently considering a large number of different proposals for document schemas. However DTDs were still the only thing to use today.

Mr. White asked about the scripts that appeared in Mr. Bray's diagram describing the exchange of information between a development agency and a government agency. Mr. Bray described some of the scripting tools, such as Perl, Rexx, Python and Omnimark, that provide straightforward ways to convert textual data from one format to another. All of these tools have, or will have, built-in support for XML processing.
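As a rough illustration of such a conversion script, here is a minimal Python sketch (Python being one of the tools named) that turns a hypothetical tab-delimited database export into XML and reads it back into records that another system could load. The `activities`/`activity` element names, the columns and the sample rows are assumptions, not part of any DML or CEFDA definition.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tab-delimited export from one agency's database.
EXPORT = "id\ttitle\tcountry\n101\tRural water supply\tKenya\n102\tSchool meals\tPeru\n"


def delimited_to_xml(text):
    """Convert tab-delimited rows to an XML tree, one <activity> per row.

    Column headers become element names, so they must be valid XML names.
    """
    root = ET.Element("activities")
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        activity = ET.SubElement(root, "activity", id=row["id"])
        for field, value in row.items():
            if field != "id":
                ET.SubElement(activity, field).text = value
    return root


def xml_to_rows(root):
    """Read the XML back into dictionaries, e.g. for loading into another system."""
    rows = []
    for activity in root.findall("activity"):
        row = {"id": activity.get("id")}
        row.update({child.tag: child.text for child in activity})
        rows.append(row)
    return rows


tree = delimited_to_xml(EXPORT)
print(ET.tostring(tree, encoding="unicode"))
print(xml_to_rows(tree))
```

The XML in the middle is independent of either database: the receiving side only needs to agree on the element names.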

Mr. Song asked how one could convince large bureaucratic organizations to get behind an idea like XML that was still in an embryonic phase. Mr. Bray replied that one way would be to keep cost figures on retrospective conversion of word processing files the last time your organization converted from one word processor to another. Efficiencies could be realized if organizations currently have to produce both a stream of data for print publishing and another stream for electronic publishing. Document authoring on a large "industrial" scale with a standard markup language is demonstrably cheaper.

Mr. Sher asked if there was a word processor in which you could create XML documents. Mr. Bray replied that there were, but they were large and expensive or (in the case of freeware) required some effort to install and use. We need lightweight word processing tools. Fortunately, those products are arriving rapidly.

Mr. Faye asked what value XML brought to databases. Would XML replace the ISO 2709 format? Mr. Bray answered that there were benefits for database-to-database interchange, especially for non-tabular, non-relational data; it was easy to transcribe richer and more human-friendly data structures into XML. XML would be cheaper to implement than more complex exchange structures.

Ms. Dueck asked about the role of the Resource Description Framework (RDF). Mr. Bray replied that RDF is a scheme (syntax and data model) to interchange metadata. RDF uses XML syntax. Mr. Davies suggested that RDF could be viewed as a framework in which to incorporate different metadata systems.
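Since RDF uses XML syntax, RDF metadata can be read with an ordinary XML parser. The sketch below, which assumes Python's standard-library `xml.etree.ElementTree`, parses a small RDF/XML description that uses Dublin Core elements. The namespace URIs are the standard RDF and Dublin Core 1.1 ones; the resource URL and values are hypothetical. A real RDF application would normally use a dedicated RDF library that understands the data model, not just the XML serialization.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set 1.1.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A small RDF/XML description; the resource URL and values are hypothetical.
RDF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/reports/42">
    <dc:title>Evaluation of a rural water project</dc:title>
    <dc:creator>Example Agency</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def describe(rdf_text):
    """Return (resource URI, Dublin Core title) pairs from RDF/XML text."""
    root = ET.fromstring(rdf_text)
    results = []
    for desc in root.findall("rdf:Description", NS):
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        about = desc.get("{%s}about" % NS["rdf"])
        title = desc.findtext("dc:title", namespaces=NS)
        results.append((about, title))
    return results


print(describe(RDF_DOC))
```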

Mr. Kanfi asked whether you would use RDF to combine development and health science data in a health-related markup language. Could RDF allow DML to speak to Dublin Core? Mr. Bray answered that you could publish a subset of development documents in the area of health science, and have a taxonomy of documents.

Mr. McKenzie asked what Microsoft was doing. Mr. Bray replied that Microsoft had been one of the leaders, but that was before Internet Explorer had the largest share of the browser market. Now their strategy is difficult to understand. For obvious reasons, they are not pushing for an authoring tool that would produce XML.

Mr. Rose asked whether RDF could be used for the interchange of bibliographic information. Mr. Bray replied that one of the leaders in RDF development was OCLC, the world's largest bibliographic utility. For more information, you could go to the W3C home page (www.w3.org) and look under Metadata. There was a need for easy-to-understand information on RDF; one such text, by Mr. Bray, was available on the XML.com site.

12:00 - 12:30 Mutual introductions

In this part of the session, the participants in the meeting briefly introduced themselves and described their interest in and/or experience with XML.

12:30 - 2:00 Lunch

Session 2 Chair: David Balson

2:00 - 3:00 "Standard-setting experiences: learning from the past"

A round table with Mary Campbell, Judith Dueck, Carole Joling, Terence Hill and Joe Gollner

IDRIS and INDIX - Mary Campbell

Ms. Campbell first talked about IDRIS, originally the acronym for the Inter-agency Development Research Information System. This system was started in 1983 by the "like-minded donor agencies" interested in development research: IDRC, IFS, SAREC, BOSTID and GATE. The specific purpose of IDRIS was to share information and produce publications: agencies wanted to know what other agencies were doing. At that time only IDRC had a "corporate memory" database, so IDRC acted as a focal point, collecting data on diskettes or printed worksheets using a methodology developed by a consultant. At first, each agency had to search the database at IDRC, but later the information was made available on systems at other IDRIS member agencies, as well as at outside organizations such as ILO and IDS. The system is no longer needed, in part because some of the organizations no longer exist and in part because agencies now use their own Web sites or contribute information directly to the INDIX DAI database.

The lessons learned from this activity were:

ensure that there is a community of interest
specific needs should be identified by the community
commitment from the top down is a particularly important element
foster good collaboration (e.g. each agency pulled its own weight)
use existing standards to reduce costs (e.g. dates, names of organizations, countries and currencies)
the support of IDRC as a focal point for its knowledge and experience
know when to move on, i.e. when the system is no longer required.

IDRIS now is a program of Research Information Management Services (RIMS) at IDRC and contains only IDRC data.

Ms. Campbell went on to discuss INDIX, the International Network for Development Information Exchange. This initiative started with representatives of UNDP, IDRC, USAID, CIDA, JICA, OECD, UN/ACCIS and WHO. All of these organizations had automated systems, and all used the Macrothesaurus, but they nevertheless had difficulties in sharing data. The group decided to find ways of overcoming these barriers. They surveyed other agencies and came up with a list of twenty-three different barriers to sharing information. They also developed what was to become CEFDA, the Common Exchange Format for Development Activity Information. In 1991, a meeting in Paris was attended by about 80 people representing more than 50 agencies to discuss these issues and consider CEFDA as a tool to facilitate bilateral agency-to-agency information exchange. At the meeting, participants raised the difficulty of negotiating and implementing bilateral exchanges, and instead proposed to contribute data to a centralized database that could be published on CD-ROM. This Development Activity Information (DAI) database has contributions from 35 agencies and is now available on the Web as well.

Lessons learned from this activity were:

donors and funding recipients want information but want neither to pay for it nor to contribute it (can we quantify the value of information?)
where to look for answers is important (who knows what?)
information sharing requires corporate and individual commitment
poor data needs good software (the quality of contributed data varied considerably)
too many objectives may hinder achieving any of them (the DAI database was a funding tool, a means of promoting CEFDA, and an information resource for the development community; at least one objective too many)

INDIX has concluded that selling access to the DAI database is not feasible, and has chosen to concentrate on using DAI as a tool for promoting CEFDA and as an information resource. INDIX will look for funding elsewhere.

HURIDOCS - Judith Dueck

HURIDOCS is an international informal network of organizations working on human rights documentation. Since 1982, HURIDOCS has:

developed bibliographic formats for human rights documentation (based on AACR2 and other standards)
developed a standard format for human rights events
developed micro-thesauri (i.e. controlled vocabularies to use in the bibliographic and events formats)
undertaken translations
conducted training; and
built networks

Currently HURIDOCS is working on a revision of EVSYS (its events system), examining how to put its databases on the Web, revising the micro-thesauri, making all tools available on the Web site, and ensuring consistency both among HURIDOCS tools and with emerging standards such as Dublin Core. Important future activities will include deciding on a proposal for which parts of Dublin Core HURIDOCS should accept, investigating XML as a tool for linking databases, and forming an extranet.

Issues that arise in standardization include:

be willing, initially, to develop the tools and train others
ensure organizational and personal commitment
communicate regularly
ensure a spirit of compromise
develop documentation and update it
use existing standards (very important!), such as ISO list of countries and ILO occupational category codes
use existing organizations or networks (also very important!), e.g. piggybacking meetings on those of another group or network

Development Database Service (DDBS) - Carole Joling

The Development Database Service (DDBS) at IDRC was conceived, launched and managed by David Balson. DDBS was a pool of bibliographic databases, created in the 1980s, that continued for 15 years. It included IDRC databases such as BIBLIOL (the Library catalogue), DEVSIS (a development information database) and SALUS (a rural health database), as well as databases from external organizations such as FAO, ILO, UNESCO, and UNIDO. Databases from AID and WHO were added later, as were IDRC's Acronym and IDRIS databases. In the 1990s, major changes took place in terms of both funding cutbacks and technology, and in 1996 the DDBS service was stopped. Only IDRC's BIBLIOL database is still accessible through this service.

In terms of lessons learned, it is important as a manager to look at costs, to know what staff you need as well as the staff you have, to assess user satisfaction and to build sustainability of any initiative. You must think about:

relevance (to ensure what is offered is both needed and wanted; keep your business plan current!)
currency (since expectations have increased; now more and more people want everything immediately, updated to the very minute)
data quality (to ensure that the data is of the highest quality possible)
strong vision and great motivation (needed to overcome organizational barriers)
partnerships (which you should use so you don't work alone)
confidence (what is the half life of the service you want to establish?)
promotion (to continue to "sell" your service)

What has changed? The expectations of users and organizations change over time. New technologies mean changes in products and services. Database vendors must follow market trends and identify their niche. The service must fit into the parent organization's goals and objectives, and that institutional environment may shift as corporate priorities change.

Finally, Ms. Joling posed some questions in relation to XML: How much do we want it? How much do we need it? How much effort is it worth, in terms of people or money? Is it worth it?

The Evaluation of the DDBS - Alison Ball

Ms. Ball offered a more specific view of DDBS, in relation to the evaluation of the service.

Each database in the DDBS had quite different structures and content (e.g. thesauri, currencies, languages and field names). Audiences for the databases were different and some databases were restricted so security was required. Users could not run one search across all eight databases. Training changed over time: originally it was intended for information intermediaries such as librarians, but later an expanded audience included the general public and small NGOs. In addition, the system was changed to become easier to use, but less powerful. Communication was an issue at a time when list servers were not available, and printed newsletters were not produced often enough to keep users well-informed. Access was offered in three languages, but this meant all documentation, training and search screens had to be maintained in three languages. The idea of charging for the service was considered but never acted upon because administrative costs would have been prohibitive, though detailed statistics were collected.

Now that information is available over the Web, the need for the DDBS service no longer exists. Internet resources such as IDRC's "To the World Page" link to related externally produced information resources, and other organizations have developed similar lists of online resources, such as CIDA's "Virtual Library" and the ELDIS page hosted by IDS. We need to discuss how to collaborate on the development of these and similar systems.

Documents at the IMF - Terrence Hill

The IMF is studying ways to automate document distribution and the production of compound electronic documents. The issue of using SGML first arose in 1991, when the print shop was being automated. Currently, compound documents must be printed on paper, scanned into TIFF format, and then printed from the TIFF file. Technologies being studied include MS/Word, PCDOCS software, PDF, HTML and RTF, as well as SGML. Long-term considerations include document creation, interchange, storage, publishing, manipulation and archiving. While the MS/Word/PCDOCS solution is currently favoured for near-term deployment in the Fund, moving to an all-SGML/XML authoring environment is still very much under consideration. From Mr. Hill's point of view as a budget officer, this would be possible with SGML. The Web and its standards will have a tremendous impact on this area, but institutional barriers still exist.

CALS - Joe Gollner

Mr. Gollner manages the CALS initiative for the Canadian Department of National Defence. He is the Canadian representative on, and chair of, NATO committees for SGML, electronic technical documentation and electronic data interchange (EDI).

CALS originally stood for Computer Assisted Logistics Support, but is now taken to represent Continuous Acquisition and Lifecycle Support. CALS has had a bad reputation because it has been associated with huge expenditures and more than a few failures. The original idea in 1985-1986 was to have an integrated system to exchange data with a huge supplier base and integrate it with business processes (e.g. printing). Six billion dollars were spent in a six-year period. Canada built the world's first SGML database with workflow and scheduling capabilities, a twenty-million-dollar system that was taken out of operation the year after it was finished. Mr. Gollner was drafted into the CALS initiative in 1994, and after expenditures of 50 million dollars over five years, there are now standards and applications that people actually use, currently resulting in savings of about 100 million dollars a year.

The main issues that stand out from this are:

Ambition
Tailor your ambition to something feasible. Rather than run the risk of a huge failure, it is better to roll out something simple that a large number of people can use.
Relevance
The real benefits of structured information go beyond the publishing process; most benefits and most of the savings realized come from things we are capable of doing now that we have restructured technical information using SGML.
Aggression
Appeal to people's enlightened self-interest. Show up with money and technology that already works, and then encourage users to take advantage of what you have to offer.
Resourcing
Our biggest impediment was finding enough people with the right expertise, people who are used to implementing complex systems.
Design concepts
Adaptation to change was fundamental to SGML design. We now have a library of tools, abstracted so that choice of a particular tool over another is irrelevant. Participants can use an application, and then throw it away to select another, as long as those applications respect established rules when they interchange data. Take "baby steps", scaling applications and scheduling investments so that you don't spend too much before there is some tangible, usable functionality that users will be able to benefit from.

Questions and Discussion

The discussion was opened to the floor, so that participants could share their own experiences, answer the questions set out in the meeting agenda, add new items for consideration or ask for clarification of points made.

Mr. Rose stated that INDIX was the only large-scale, product-oriented service of its kind. He asked how XML would affect INDIX developments, particularly in light of the INDIX Steering Committee meeting held the day before the DML Initiative meeting.

Mr. White replied that he couldn't give a definitive answer, because DML had not yet been defined. The INDIX Steering Committee did talk about DML as another way of sharing CEFDA-format information on the Internet. It is clear that a standard like CEFDA has to change to reflect the changing needs of the development community. For example, USAID no longer works in terms of projects but in a performance- and results-based development environment, which is not well captured in CEFDA, though for many organizations the CEFDA format is still a perfectly viable way of sharing information. Other areas, such as evaluation information, mean that CEFDA needs to change or develop to meet other standards, even though that may still require publication of a CD-ROM.

Mr. Balson indicated that INDIX members were involved in the organization of the DML Initiative. If there is a continuing DML Initiative, INDIX must be a leading player. The Steering Committee discussed both a "doable" work plan and a "could do" work plan, and there were DML-related items on both.

Ms. Campbell noted that the DML DTD which Ron Davies was going to discuss on Thursday was based on the CEFDA format.

Mr. Litz asked who had started or completed XML systems. What were the biggest surprises? Did it take more time than expected? Was it technically more complex than expected?

Mr. Bray replied that there were certain things to watch out for based on his experience with SGML:

People biting off more than they could chew
In cooperative efforts, each organization might insist on certain specific features, resulting in a complex format when only later one realized that what people really needed was a simple subset. There is a need to prioritize: real needs will reveal themselves over time.
Implement quickly
Cultural issues
Professional editorial staff, for example, may be used to WYSIWYG, and are reluctant to give up very fine control over document appearance.
Vendor lies
No vendor will say, "We can do these few functions well." They will promise everything, but there are no "seamless business solutions." Plan for system customization, integration and "glue."
Time
In a publishing environment, there is never time to spare for new development. People will say they have no time to look at productivity improvements because they are "too busy." Negotiate free time on their behalf with senior management.

Mr. Gollner made some additional points.

Beware of "simplicity"
Implementing SGML has never been simple and the same applies to XML. The power is still there, and it's a double-edged sword. XML promises technical simplification, but it can involve connecting systems that have never been connected before or were never designed to be connected.
Projected cost savings may be elusive
In fact you may end up spending more on publications, because you are upgrading graphics, vector wire diagrams, etc. The advantage is that the extra money is being spent on visible documentation services and higher value information.
Free tools
There is a wide range of tools available for free for projects that are fundamentally data interchange projects.

Mr. Hill asked where the development community gets its information. Ms. Dueck replied that HURIDOCS had decided not to collect information itself; rather, the organizations that create information must manage their own information. HURIDOCS only creates the tools to help exchange information if organizations choose to do so.

Mr. Song asked about the process of converting information from a proprietary database within one organization to XML and then back into a proprietary database in another organization. What happens if half a dozen organizations do the same? Mr. Bray replied that there were two ways to create XML: the hard way was to author a document, and the easy way was to take structured database information and wrap it in DML. Converting legacy documents is a third, less satisfactory option: in that case you have to classify documents as informal (e.g. e-mail) or formal, and select only the formal documents for tagging.
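Mr. Bray's "easy way" amounts to taking each database row and wrapping its fields in tags; the receiving organization reverses the step to load the data into its own system. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the element names (`activity`, `title`, `country`, `start_date`) and the sample record are hypothetical and are not taken from the DML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list; real element names would come from the agreed DTD.
FIELDS = ["title", "country", "start_date"]

def record_to_xml(record: dict) -> str:
    """Wrap one database row (a dict) in XML elements."""
    activity = ET.Element("activity", id=record["id"])
    for field in FIELDS:
        if record.get(field):  # skip empty or missing fields
            ET.SubElement(activity, field).text = record[field]
    return ET.tostring(activity, encoding="unicode")

def xml_to_record(text: str) -> dict:
    """Unwrap the XML again, e.g. for loading into another organization's database."""
    activity = ET.fromstring(text)
    record = {"id": activity.get("id")}
    for child in activity:
        record[child.tag] = child.text
    return record

row = {"id": "ACT-001", "title": "Rural water supply",
       "country": "Ghana", "start_date": "1998-07-01"}
xml_text = record_to_xml(row)
print(xml_text)
print(xml_to_record(xml_text) == row)  # True: the round trip preserves the record
```

Each organization only needs to agree on the element names; how the rows are stored on either side stays private to that organization.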

Ms. Lamoureux asked about the relationship of CEFDA and XML. Mr. Davies replied that CEFDA defines the meaning or semantics of information you want to share, and that XML provides mechanisms to share it more effectively. The two standards are complementary.

Mr. White referred to the current INDIX DAI database, where data is collected centrally and consequently may not be current. If information is available through the Web, does DML make it possible to search multiple databases quickly without first assembling the data in one place? Mr. Bray replied that it did not. Such a facility is possible in principle, but it requires search engines and a distributed search protocol. Quite apart from metadata searching, XML allows much more in-depth searching by enriching the search possibilities within a document. Mr. Davies commented that a distinction is often made between "search" and "retrieval", and that XML was primarily an aid to retrieval; it did not remove the need to resolve problems of search syntax.

Mr. Balson asked about distributed search systems, presenting a scenario of inter-agency sharing. In reply, it was noted that you can merge information from different sources using XML, but you still have to search the databases individually, so some kind of distributed search facility must still be built. Mr. Davies mentioned the United Nations system's UNIONS software, which took a search and re-formatted it into the search formats required by a number of different UN information sites. However, this system used only full-text search, not fielded search. Mr. Bray noted that the library community has created distributed search aggregators, mostly custom-built, based on the Z39.50 search and retrieval protocol. He suggested that because disk space is cheap, a more effective solution than a distributed search might be to download the entire contents of various databases, merge them on a local disk, and then search them locally.
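Mr. Bray's alternative (download, merge locally, then search) can be sketched in a few lines. This is a minimal illustration, assuming each agency's export is an XML document containing `<activity>` records with an `id` attribute; the source names, element names and data are hypothetical. Unlike the full-text approach of UNIONS, the search here is fielded: it matches the value of one named element.

```python
import xml.etree.ElementTree as ET

def merge_exports(exports: dict) -> ET.Element:
    """Merge downloaded XML exports (source name -> XML text) under one local root."""
    merged = ET.Element("activities")
    for source, text in exports.items():
        for activity in ET.fromstring(text).iter("activity"):
            activity.set("source", source)  # remember where each record came from
            merged.append(activity)
    return merged

def fielded_search(root: ET.Element, field: str, value: str) -> list:
    """Return the ids of activities whose <field> element equals value (case-insensitive)."""
    return [a.get("id") for a in root.iter("activity")
            if (a.findtext(field) or "").strip().lower() == value.lower()]

# Hypothetical exports from two agencies.
exports = {
    "agency_a": "<activities><activity id='A1'><country>Kenya</country></activity></activities>",
    "agency_b": "<activities><activity id='B7'><country>kenya</country></activity>"
                "<activity id='B8'><country>Peru</country></activity></activities>",
}
local = merge_exports(exports)
print(fielded_search(local, "country", "Kenya"))  # ['A1', 'B7']
```

The ownership question raised in the discussion is visible even here: once merged, the local copy no longer changes when the source agency updates its records.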

Mr. Song asked whether XQL, the XML-based query language, would solve the distributed search problem. Mr. Bray replied that it was not yet clear, and that this was probably not a priority for the development community, since development data was metadata-heavy. He noted that an aggregate search depends on all sites being online at the time the search is run. If people download an entire database to their local site, however, that raises questions of ownership. Mr. Davies added that searching multiple sites with the same search strategy has not yet proved successful in the Z39.50 community, partly for the reason Mr. Bray mentioned, but also because of minor but critical differences in the way different sites had implemented their local databases.

Ms. Dueck raised the issue of controlled vocabularies within metadata tags, emphasizing the importance of metadata for precise searching. Mr. Bray agreed that controlled vocabulary remains extremely important. He mentioned that the Microsoft Index Server with proper use of metadata in HTML pages could provide a quick and quite effective search engine.
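To illustrate why controlled vocabulary matters for precise searching, the sketch below flags subject terms that do not come from an agreed term list, so they can be corrected before records are shared. It is a minimal sketch only: the `<subject>` element name and the three-term vocabulary are hypothetical and are not drawn from any real thesaurus.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a controlled vocabulary, stored lower-cased for comparison.
VOCABULARY = {"water supply", "primary education", "irrigation"}

def uncontrolled_terms(activity_xml: str) -> list:
    """List the <subject> terms in a record that are not in the controlled vocabulary."""
    activity = ET.fromstring(activity_xml)
    terms = [(s.text or "").strip() for s in activity.iter("subject")]
    return [t for t in terms if t.lower() not in VOCABULARY]

record = "<activity><subject>Irrigation</subject><subject>wells</subject></activity>"
print(uncontrolled_terms(record))  # ['wells']
```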

Mr. Gollner described a project where information holdings were distributed for security reasons, i.e., so that the originator could control access. In this project, only the metadata was aggregated, by having a crawler collect it from different sites; the crawler instructions can include information about the availability of the information in hard copy or whether it is top secret.
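The metadata-only aggregation in Mr. Gollner's project can be sketched as follows. This is a hypothetical illustration, not the project's actual software: documents are passed in as strings instead of being fetched over the network, and the `<metadata>` element, its `access` attribute and the access values are assumptions. Only the catalogue entry travels; the full text stays with the originator.

```python
import xml.etree.ElementTree as ET

def harvest_metadata(site_documents: dict) -> list:
    """Keep only metadata from each site's documents; the content stays at the source site."""
    catalogue = []
    for site, documents in site_documents.items():
        for text in documents:
            doc = ET.fromstring(text)
            meta = doc.find("metadata")  # hypothetical element name
            if meta is None:
                continue  # nothing to aggregate from this document
            catalogue.append({
                "site": site,
                "title": meta.findtext("title"),
                # e.g. "public", "hard copy only", "restricted" (hypothetical values)
                "access": meta.get("access", "unspecified"),
            })
    return catalogue

sites = {
    "site_a": [
        "<doc><metadata access='restricted'><title>Field report</title></metadata>"
        "<body>Full text stays at the originating site.</body></doc>",
        "<doc><body>No metadata here.</body></doc>",
    ],
}
print(harvest_metadata(sites))
```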

Mr. Kanfi asked if the implementation of XML and CEFDA would enhance what INDIX was doing, and if this would throw open the doors to new types of services. Mr. Bray replied that XML would help to implement new services that we have not even thought about yet, and that won't become requirements until some time in the future.

Mr. Rousseau indicated that we need to know what we want to share; we need to talk about the CEFDA requirements that will become the foundation for the next version of the DML DTD. Mr. Balson indicated that this would be dealt with on Thursday. Ms. Lamoureux indicated that the question of "scope" on the agenda for Thursday should allow for discussion of this item. Mr. White commented that the CEFDA standard was not dead, and that today we were still talking about a central database and a coordinating body. USAID, for example, had only recently moved to a program-oriented approach, and the organization still had a large historical collection of program-related information. He would consider converting that into DML format as an experiment.

Thursday, January 14, 1999

Session 3 Chair: Steve Song

9:00 - 9:45 The Draft DTD for Activity Information

Ron Davies

Mr. Davies presented the design philosophy behind the development of an XML Document Type Definition (DTD) for development activity information. The scope of the DTD was assumed to be limited to development activities, including projects, loans, and credits. The DTD was designed to draw on existing standards wherever possible, including the Common Exchange Format for Development Activity Information (CEFDA) for the definition and description of project-related elements, and the Dublin Core for the description of bibliographic information. For the Dublin Core, there were several possible approaches to integrating another metadata set into the DTD: copying its elements, declaring them in a separate namespace under the new XML Namespaces specification, using the Resource Description Framework (RDF), or using architectural forms. To keep the cost of producing information in DML low, few elements were made mandatory (as in CEFDA), and element names were kept simple and easy to remember.
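The namespace approach can be illustrated with a short example. `http://purl.org/dc/elements/1.1/` is the Dublin Core element set namespace; the DML namespace URI, the element layout and the sample values are hypothetical assumptions. Because Dublin Core elements keep their own namespace, they cannot clash with DML element names.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
DML = "http://example.org/dml"           # hypothetical namespace for DML elements
NS = {"dml": DML, "dc": DC}

DOC = f"""<activity xmlns="{DML}" xmlns:dc="{DC}">
  <title>Rural water supply</title>
  <dc:creator>Example Agency</dc:creator>
  <dc:date>1998-11-30</dc:date>
</activity>"""

def read_activity(text: str) -> dict:
    """Read DML and Dublin Core elements from one activity, distinguished by namespace."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("dml:title", namespaces=NS),
        "creator": root.findtext("dc:creator", namespaces=NS),
        "date": root.findtext("dc:date", namespaces=NS),
    }

print(read_activity(DOC))
```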

However, an effort was also made to improve on areas where CEFDA had been found to be weak. Descriptive elements that INDIX had already identified as lacking (such as a sector code) were added to the DTD. Budget information was marked up in much more detail, paving the way for using it in calculations such as totals and subtotals. The DTD was designed to improve the consistency of organizational and country/regional information by anticipating the creation of network-accessible authority files for this kind of information, and by allowing the reporting organization to link to these resources if it chose to do so. XML hypertext linking features were used to embed authority file information in the current document, but only when the user requested it, to accommodate different network situations. To improve the display of longer textual elements such as the abstract, these elements were allowed to contain simple markup for headings, paragraphs, emphasized passages, and unnumbered lists.
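Detailed budget markup is what makes totals and subtotals computable. A minimal sketch follows, assuming budget lines carry a `funder` attribute and a numeric amount in a single currency declared on `<budget>`; these element and attribute names are hypothetical, not the draft DTD's. `Decimal` is used so that money amounts are added exactly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Hypothetical budget markup; element and attribute names are illustrative only.
BUDGET = """<budget currency="USD">
  <line funder="Agency A" category="equipment">120000</line>
  <line funder="Agency A" category="training">30000</line>
  <line funder="Agency B" category="equipment">50000</line>
</budget>"""

def budget_totals(text: str):
    """Return (subtotals by funder, grand total); assumes one currency for all lines."""
    budget = ET.fromstring(text)
    subtotals = defaultdict(Decimal)
    for line in budget.iter("line"):
        subtotals[line.get("funder")] += Decimal(line.text.strip())
    return dict(subtotals), sum(subtotals.values(), Decimal(0))

subtotals, total = budget_totals(BUDGET)
print(subtotals)
print(total)  # 200000
```

Grouping by `category` instead of `funder` gives a different subtotal from the same markup, which is the advantage of recording budgets line by line rather than as a single figure.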

Development of the Document Type Definition raised a number of important issues that need to be addressed by users.

Program-related information
Is the DTD (and by extension, CEFDA) suitable for the description of program as opposed to project-related activities?
Links between development activities
Should the DTD allow for hypertext linking between an activity funded and described by one organization and the sub-activities funded through that project and described by executing agencies (e.g. activities funded by the European Union but divided into separate work packages for different countries)? Should it allow for linking between previous or subsequent phases of the same project?
Multiple activities in one document
The current DTD describes only one activity, which is clearly not enough, but how should multiple activity documents be presented? Experience with different applications will provide some guidance.
Activity identifiers
Unique activity identifiers would be required for linking purposes, but what form should those activity identifiers take?

There had been comments during the meeting on keeping the functionality simple in the early versions of the DTD. Perhaps some features of the draft DTD, such as authority file linking, were too complicated and should be removed for that reason.

In short, the DTD is only a starting point, not an end point; it incorporates the result of many choices that need to be confirmed by the development community; and it will need revision and extension as experience is gained and new tools developed.

Questions and discussion

Ms. Schieber and Mr. Rose asked about the authority lists. Mr. Davies replied that a number of scenarios were possible. An agency publishing information could supply only the organization's name, if that was all it could or wanted to provide. Alternatively, it could supply a link (such as a URL) to a central authority, which would then be the source of authoritative information on the organization referred to in the activity description.

Mr. White asked if it was possible to have a decentralized model for the authority files. Mr. Davies replied that it was, and that there could be a number of mirrored sites for the central authority or one could even imagine that each organization provided the authoritative information on itself. One of the problems with a very decentralized model, however, was the risk that a domain name would change or a site would not be available, and that organizations would be pointing to an authority file that didn't exist. Maintaining a centralized or a small number of sites meant that you could reduce that risk.
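The scenarios above (a name only, or a link to a central authority with mirrors as a fallback) can be sketched as a small lookup. This is a hypothetical illustration: the `href` attribute, the mirror URLs and the injected `fetch` function, which stands in for a real network request, are all assumptions. The lookup runs only when the function is called, loosely paralleling the on-request embedding described for the draft DTD.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

def resolve_organization(org: ET.Element, mirrors: list,
                         fetch: Callable[[str], Optional[str]]) -> str:
    """Return an organization name, preferring the authority file when a link is given.

    `fetch` takes a URL and returns the authority record's name, or None when the
    site is unreachable; it is a hypothetical stand-in for a network lookup.
    """
    ref = org.get("href")        # hypothetical link attribute
    if ref is None:
        return org.text or ""    # the agency supplied only a name
    for base in mirrors:         # try the central site first, then each mirror
        name = fetch(base + ref)
        if name is not None:
            return name
    return org.text or ref       # every site failed: fall back to the inline text

# Simulated authority data: only the mirror answers in this example.
authority = {"http://mirror.example.org/org/42": "Example Development Agency"}
mirrors = ["http://central.example.org", "http://mirror.example.org"]
org = ET.fromstring('<organization href="/org/42">EDA</organization>')
print(resolve_organization(org, mirrors, authority.get))  # Example Development Agency
```

The final fallback addresses the risk Mr. Davies raised: if no authority site answers, the record still shows whatever name the agency supplied inline.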

Mr. Litz asked how this link would work if the source data came from a database rather than a document. Mr. Davies replied that it made no difference whatsoever.

Mr. Hill asked about the separation of structural and presentational markup in terms of the mixed data content for long text fields. Mr. Davies replied that there was one element that needed re-naming but otherwise it was all structural or descriptive markup.

Mr. Rose asked how development activities were linked within one organization, or among many. How are these links differentiated? How can we link projects by thread? Mr. Davies replied that the links anticipated in CEFDA were between funding sources and executing agencies. Links between projects on the same theme were really established by assigning them the same subject descriptor. Links between the successive phases of a project would need to be examined and built into CEFDA, where the rules for this kind of description could be established.

Ms. Schieber commented on the cooperation required for linking-type information. The INDIX CEFDA description of development activity should become broader and more encompassing.

Mr. Ackles asked if we were looking at one complex DTD or should we be looking at several DTDs to suit user needs. Mr. Davies drew the parallel with the question of whether you store all your information in one database, or whether you have separate databases for different types of information (project information, evaluation information etc.). It depends on what you want to do with the information.

9:45 - 10:30 Pilot Projects

Michael Roberts, Hugo Besemer, Kevin McCann

Mr. Roberts explained that the demos used the Internet Explorer Version 5 Beta 2 browser, which had built-in support for XML and XSL. He showed a raw marked-up page, and then the same information rendered in the browser using the formatting information in an XSL style sheet. He showed a number of style sheets that displayed different subsets of the data in the XML file, in different ways. He explained that even calculations (such as totaling the budgets of a number of projects) could be performed in the XSL stylesheet, with the processing done on the client side, as Mr. Bray had described.
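The idea behind the demo (one XML file, several presentations) can be sketched without a browser. The code below is not XSL: it is a Python analogue in which two hypothetical "view" functions select different subsets of the same data and render them as HTML, one of them computing a budget total. Element names and data are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data; element names are illustrative only.
DATA = """<activities>
  <activity><title>Rural water supply</title><country>Ghana</country><budget>150000</budget></activity>
  <activity><title>Teacher training</title><country>Peru</country><budget>80000</budget></activity>
</activities>"""

def title_list(root: ET.Element) -> str:
    """View 1: only the titles, as an HTML list."""
    items = "".join(f"<li>{escape(a.findtext('title'))}</li>" for a in root.iter("activity"))
    return f"<ul>{items}</ul>"

def budget_table(root: ET.Element) -> str:
    """View 2: country and budget for each activity, plus a computed total row."""
    rows, total = [], 0
    for a in root.iter("activity"):
        amount = int(a.findtext("budget"))
        total += amount
        rows.append(f"<tr><td>{escape(a.findtext('country'))}</td><td>{amount}</td></tr>")
    rows.append(f"<tr><th>Total</th><th>{total}</th></tr>")
    return "<table>" + "".join(rows) + "</table>"

root = ET.fromstring(DATA)
print(title_list(root))
print(budget_table(root))
```

The data is parsed once and never modified; only the view changes, which is the separation of content from presentation that the style sheets provided.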

Mr. Besemer then demonstrated a Micro-CDS/ISIS version of the CARIS database that he had made available over the Web, and had adapted to provide information in the format specified by the DML Version 0.01 DTD. This was done in part to discover how difficult it was to adapt an existing database to the DML format, i.e. whether essential elements were missing, whether DML had elements not in CARIS, and whether CARIS had elements not in DML.

His conclusions included a caution that installing the IE 5 beta 2 browser could cause problems. He also explained that XML requires more consistent data than HTML, because XML is less forgiving: a missing or incorrect element could result in no information being displayed. Some reformatting of the CARIS data was required, and though the reformatting was not difficult, the results were not perfect. For example:

street name was in some cases transferred as part of the <city> element
<end_date> was a mandatory element in CEFDA (and the DML DTD) but not in CARIS
the Local Descriptor element (<local_desc>) seemed inappropriate for a thesaurus as well-established and well-known as the FAO's Agrovoc
CARIS had no information on the funding of research activities, either in terms of budget or funding organization
the DML DTD had no element corresponding to persons associated with an activity (i.e. researchers), who were an important information element in CARIS
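The kind of gap analysis that came out of this exercise can be sketched as a field mapping that reports problems instead of silently dropping data. The CARIS-side field names (`institution_city`, `ending`, `researchers`) and the mapping are hypothetical; only `<end_date>` and `<city>` come from the list above, and `end_date` is treated as mandatory as it was in CEFDA and the DML DTD.

```python
# Hypothetical mapping from CARIS-style field names to DML element names.
FIELD_MAP = {"title": "title", "institution_city": "city", "ending": "end_date"}
MANDATORY = {"title", "end_date"}  # end_date was mandatory in CEFDA and the DML DTD

def map_record(source: dict):
    """Map one source record to DML element names and report the gaps found."""
    target, unmapped = {}, []
    for field, value in source.items():
        if field in FIELD_MAP:
            target[FIELD_MAP[field]] = value
        else:
            unmapped.append(field)  # e.g. researchers: no DML element to receive it
    missing = sorted(MANDATORY - set(target))
    return target, missing, unmapped

record = {"title": "Soil fertility trials", "researchers": "J. Doe; A. N. Other"}
print(map_record(record))
# ({'title': 'Soil fertility trials'}, ['end_date'], ['researchers'])
```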

In short, displaying XML was not difficult; the really important work lay in converting the data into an XML-compatible format.

Mr. McCann and Mr. Roberts then demonstrated how information in XML format could be imported into another database, using XML to support the data interchange. Software developed by Mr. McCann handled data acquisition in three phases. First, the XML data was retrieved from an external source (a document or a database). Second, the records were validated by Mr. McCann's software (since this capability was not available in the ColdFusion tool used in the demonstration), and any errors were highlighted for the end user. Third, if the information was valid, the user could choose to add it to (in this case) an Access database through a ColdFusion script.
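The three phases can be sketched in Python. This is not Mr. McCann's ColdFusion software: Python's built-in `sqlite3` stands in for Access, the validation step only checks for assumed required elements (it is not full DTD validation), and valid records are loaded automatically instead of waiting for the user's choice. The element names and the required-element list are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

REQUIRED = ("title", "end_date")  # assumed required elements for this sketch

def acquire(text: str) -> list:
    """Phase 1: parse XML obtained from an external document or database."""
    return list(ET.fromstring(text).iter("activity"))

def validate(activity: ET.Element) -> list:
    """Phase 2: return human-readable errors (an empty list means the record is OK)."""
    return [f"missing <{name}>" for name in REQUIRED
            if not (activity.findtext(name) or "").strip()]

def load(conn: sqlite3.Connection, activity: ET.Element) -> None:
    """Phase 3: insert a valid record into the local database."""
    conn.execute("INSERT INTO activity (id, title, end_date) VALUES (?, ?, ?)",
                 (activity.get("id"), activity.findtext("title"),
                  activity.findtext("end_date")))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id TEXT PRIMARY KEY, title TEXT, end_date TEXT)")
feed = """<activities>
  <activity id="A1"><title>Water supply</title><end_date>2000-06-30</end_date></activity>
  <activity id="A2"><title>Teacher training</title></activity>
</activities>"""
for act in acquire(feed):
    errors = validate(act)
    if errors:
        print(act.get("id"), errors)  # reported to the user instead of being loaded
    else:
        load(conn, act)
print(conn.execute("SELECT id FROM activity").fetchall())  # [('A1',)]
```

Because `acquire` returns every `<activity>` in the input, the same loop handles one record or a batch.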

First, importing data from a single file containing a DML activity description was demonstrated. Then several searches were performed on the CARIS database that Mr. Besemer had demonstrated earlier, and the retrieved records were loaded into the local Access database. Once loaded, these records could (for example) be indexed, searched or otherwise manipulated. Mr. McCann indicated that the software took about two days to develop.

Questions and Discussion

Mr. Rose asked if you needed XSL in order to use XML data. Mr. Roberts said no, you could take your marked up XML data, import it into a database, and then display it using HTML.

Mr. Steinberg asked if currency conversions could be done in the stylesheet. Mr. Roberts indicated that IE Version 5 beta 2 did not support all XSL capabilities, but that the W3C draft document on XSL described its full capabilities, under which a variety of database, spreadsheet and Boolean functions would be possible.
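Conversion at display time, as Mr. Steinberg suggested, leaves the stored figures in their original currency and applies a rate only when rendering. The sketch below is written in Python rather than XSL; the rates are made up for illustration, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up rates for illustration only: US dollars per unit of each currency.
RATES_TO_USD = {"USD": Decimal("1"), "CAD": Decimal("0.65")}

def display_in_usd(amount: str, currency: str) -> str:
    """Convert a stored budget figure for display; the source data is left untouched."""
    usd = (Decimal(amount) * RATES_TO_USD[currency]).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"USD {usd:,.2f}"

print(display_in_usd("250000", "CAD"))  # USD 162,500.00
```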

Mr. Temm asked about the use of WDDX (Web Distributed Data Exchange), specifically whether meeting participants were interested in sharing tools used to produce or load XML records with different proprietary technologies such as ColdFusion.

Mr. Kenney asked how difficult it was to use XSL. Mr. Roberts replied that it was difficult, because as yet very little documentation existed. Mr. Kenney asked if stylesheets were likely to represent the evolution of the Web from HTML. Mr. Roberts replied that this was not yet so, but it would likely become so in five or six years. He noted that a W3C committee was defining a DTD for HTML.

Mr. Faye asked if XML records could be added to a database only one-by-one. Mr. McCann replied that groups of records could be imported; it was only that his demonstration software operated on one record at a time.

Mr. McKenzie asked if the ColdFusion scripting language was complex. Mr. McCann replied that it was quite easy to use, that it looked like HTML, and was in fact easier to use than Perl. Mr. Temm asked what version of ColdFusion was used. Mr. Roberts indicated it was Version 4.

Mr. White asked whether it was necessary to output a file of information, given that agencies such as AID had over 10,000 records. Mr. Roberts replied that you could load single records or small numbers of records, one very large file containing a large number of records, or retrieve specific records from a database for loading.
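For the "one very large file" option, a loader does not need to hold all 10,000+ records in memory at once. A minimal sketch using `xml.etree.ElementTree.iterparse` follows; the `<activity>` and `<title>` element names are hypothetical, and the in-memory test file stands in for a real export.

```python
import io
import xml.etree.ElementTree as ET

def stream_activities(source):
    """Yield (id, title) pairs one record at a time from a large XML file or stream."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "activity":
            yield elem.get("id"), elem.findtext("title")
            elem.clear()  # release the finished record's contents to keep memory use low

# A simulated export of 10,000 records.
big = io.BytesIO(b"<activities>" +
                 b"".join(b'<activity id="%d"><title>T%d</title></activity>' % (i, i)
                          for i in range(10000)) +
                 b"</activities>")
count = sum(1 for _ in stream_activities(big))
print(count)  # 10000
```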

Ms. Ball asked what would change if the current tag-data format used in offline, batch transfers of CEFDA information to the INDIX DAI database were replaced with the XML-based DML. Mr. Davies replied that there would in fact be very little change in this specific type of transfer, and that the benefits of XML would be found in other uses and re-uses of the technology.

Session 4 Chair: Linda Schieber

11:00 - 12:30 Future directions for the DML Initiative: Brainstorming

The rest of the morning was spent on brainstorming ideas about the future of the DML Initiative. The objectives of the session were to:

Explore the level of interest in sharing information and in establishing a Development Markup Language
Map out a workplan for the next 6-12 months
Document current and anticipated commitments to participation by each organization or individual

Questions for discussion in the agenda were:

What have we learned from previous experiences?
Is there a future for DML?
What is involved in such an undertaking: buy-in from organizations, commitment from individuals, resources (human and financial), methodologies, marketing, etc.
What would be the scope of DML?
How can it be brought forward: an Advisory Committee, a Working Group, both?
Who are the key people and organizations for this endeavor?
What has to happen in order to increase participation?
What should go into the workplan: other meetings and workshops (such as the European meeting), a more sophisticated pilot involving more organizations, more Document Type Definitions, etc.?

A total of 59 different suggestions were made during the brainstorming. During the lunch break, a team of five volunteers (Alison Ball, Lucie Lamoureux, David Steinberg, Terry Gavin, and Linda Schieber) grouped these suggestions into five broad categories and three sub-categories. In the afternoon, the meeting participants further clarified, discussed and revised these points. The revised list of categories is as follows:

1. Organizational

scope (formulation of problem/mission statement):
- bibliographic records, documents and databases; development activities; people or organizational directories; lessons learned; evaluations; country strategies; NGO case stories; private sector involvement; analytical documents/background documents
- focus activities on value added
- definition of audience (tools for those audiences); e.g., developing country participation
- involvement with pilot projects
- analytic documents, i.e. sector/country analysis
- commitment to collaboration
- KISS (Keep It Simple, Stupid) and manage expectations
other
- definition of DML
- the name of this group
- definition of organizational structure (steering group, working groups...)
- assess risks
- who owns DML?
- link to existing body to continue initiative; e.g., INDIX, Bellanet
- best points to gain buy-in and who needs to be convinced (of what, by whom...)
- funding
- time line--plans, technologies, and implementation aspects
- cost-benefit analysis and other benefits

2. Standards-setting

field definitions
authority lists / thesauri
pilot projects (e.g., DAI)
link to full text documents
agreement on data format
addition of new types of development activity information

3. Communication

internal
- listservs
- DML web site
external
- listservs
- DML web site (FAQ)
- outreach and marketing
- best points to gain buy-in and who needs to be convinced (of what, by whom...)
- think of fundamentals of information sharing within the development community
liaison / networking
- aware of other XML initiatives, other related initiatives and other standards related initiatives
- investigate joining with OASIS, CUSHRID, HURIDOCS
- W3C

4. Technical

pilot projects (e.g., DAI)
tools for target groups
identification of data manipulation tools
search tools
display tools
timeline for technical developments

5. Training

including consulting

6. Monitoring and evaluation of the Activity

12:30 - 2:00 Lunch

2:00 - 5:00 Future Direction for the DML Initiative: Concrete Decisions / Next Steps

In the afternoon session, a number of issues relating to these points were discussed generally.

Mission statement

One of the first things must be the formulation of a mission statement that everyone can agree on. We should establish general criteria or principles for the existence of a group, e.g. commitment to open standards, to sharing information. One of the principles should be that the group is open to the entire development community. Even the private sector could be included.

FAQ (Frequently Asked Questions)

The suggestion was made that the group should develop an FAQ. Others felt that while we have a variety of useful documents on XML to share, it might be premature to have an FAQ. One participant offered to draft an FAQ.

European Meeting (Maastricht, May 1999)

A European meeting to discuss DML is scheduled for Maastricht in May. EADI is interested in exploring information delivery. The Europeans may join this group to help move the agenda forward. Other standard-setting groups have shown that such groups can be very open and decentralized. We need to create greater awareness of XML in Europe. Bellanet has agreed to participate in the Maastricht meeting.

Organizational commitments

The DML Initiative falls within Bellanet's mission of sharing information to promote collaboration. Bellanet is not interested in continuing in a leadership or coordination role. However, Bellanet will "keep the ball rolling" through activities such as maintaining the Web site, managing the list servers, participating in pilot projects, and contributing small amounts of money and staff time. Bellanet's mission is collaboration, not standard-setting.

INDIX wants to participate and has a role, especially on the content side, but does not want a leadership role in this kind of activity. Updating CEFDA will be a priority for INDIX, but soliciting input from partners takes time. INDIX could work on a standard for exchanging evaluation information in cooperation with the DAC, and could do a small pilot project with the DAI database, assuming funds were available.

Pilot projects

Some of the questions raised during the meeting will be resolved with time as pilots get underway. The suggestion of Mr. White to use the DAI database for a pilot should be taken up. Others should start doing what they can with the information under their control. The DAC Evaluation database would be another good candidate database. Bellanet could be involved in active pilots using the GK-AIMS database.

A list could be drawn up of people who want to start experimenting with information sharing, especially where the people are using different software/hardware combinations. It's certainly feasible to start using the markup, but organizations shouldn't invest too heavily in the details of the draft DTD, which we know will change. We know it will take some months to experiment with data and work out organizational issues.

A variety of tests will yield rich results, but it would be useful to have a framework for the experiments that establishes their purpose, intended benefits and lessons learned. Several participants agreed to draft a framework for the pilot projects and discuss it on the listserv. The tests at this stage would really be proofs of concept; revisions to the DTD should be published formally, but through a streamlined process so that each revised DTD can be implemented rapidly.

Communication

Should participants become advocates for XML within the development community? Some felt that we have an obligation to help others in the development community. Others felt that this type of information was not suitable for wide broadcast within the community, and that Bellanet should instead disseminate DML information to MIS and IT departments through a newsletter. Information about the meeting would be included in the next INDIX newsletter.

We should use the Bellanet Website for continued interaction and learning, particularly in working towards the Maastricht meeting. Some consideration should be given to getting a separate domain name for the DML initiative to give it a separate identity. Bellanet could also create separate lists for different working groups as required.

There was some discussion of the group's name, and whether "XML for development" might be preferable to "Development Markup Language". Europeans had also expressed concern that the name implied a special language for international development, suggesting that development does not conform to general standards. In addition, DML implies that there is only one DTD, when there might be several.

Training

Is training really a separate issue, or is it part of liaison and networking? Training of senior management can be considered awareness building and marketing activity. However even if it is a separate category, training may not be an immediate priority, since we have to have more concrete DML tools and processes before we can begin to train people in these tools and techniques.

Before the close of the meeting, each participant spoke in turn about what he or she would personally do to follow up on the meeting. Sign-up lists for the different listservs and working groups were posted.
