GedML: Genealogical Data in XML

[Snapshot of document at 2002-12-27; please see the official/canonical URL and content if possible.]

These pages describe GedML, a way of encoding genealogical data sets in XML. It combines the well-established GEDCOM data model with the XML standard for encoding complex information. The result is a representation that can easily be converted to and from GEDCOM, but can be manipulated much more easily using standard tools.
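
As a rough illustration of the idea (not the actual GedML converter), the sketch below turns the level-numbered lines of a GEDCOM file into nested XML elements. The function name `gedcom_to_xml`, the `GED` root element and the `ID`/`REF` attribute names are assumptions made for this example; the GedML specification itself is the authority on the real element and attribute names.

```python
import re
import xml.etree.ElementTree as ET

# A GEDCOM line is: level, optional @xref@ identifier, tag, optional value.
LINE = re.compile(r"^(\d+)\s+(?:@([^@]+)@\s+)?(\w+)(?:\s(.*))?$")

SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 SEP 1850
1 FAMS @F1@
"""


def gedcom_to_xml(text):
    """Convert GEDCOM text into an ElementTree (hypothetical element naming)."""
    root = ET.Element("GED")  # assumed name for the root element
    stack = [root]            # stack[n] is the parent of a level-n line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level = int(m.group(1))
        xref, tag, value = m.group(2), m.group(3), m.group(4)
        if level >= len(stack):
            raise ValueError(f"level jumps too far: {raw!r}")
        del stack[level + 1:]  # close any elements deeper than this level
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref)  # assumed attribute for record identifiers
        if value:
            pointer = re.fullmatch(r"@([^@]+)@", value)
            if pointer:
                elem.set("REF", pointer.group(1))  # assumed pointer attribute
            else:
                elem.text = value
        stack.append(elem)
    return root


print(ET.tostring(gedcom_to_xml(SAMPLE), encoding="unicode"))
```

Because the nesting is derived purely from the level numbers, the same tree can be walked to write the GEDCOM lines back out, which is what makes conversion in both directions straightforward.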

Updated 12 September 1999: This is a major overhaul, effectively GedML Mark 2. The principal changes are:

I have withdrawn the software temporarily from the site because the build was wrong. I hope to reissue it after the next version of SAXON is out, in the next week or so. (8 Oct 1999)



I want your feedback, both on the principles and on the current software.



12 May 1999

Software updated to work with SAXON 4.2

16 February 1999

I've updated the software to work with SAXON 4.0, and made a few updates to the accompanying text.

Now is a good opportunity to take stock. There's been a fair amount of feedback, and XML is now much better understood.

Many people have given encouragement for the ideas behind GedML, and there is widespread acceptance in principle that XML is a good way forward for genealogy, but there have been two frequently-expressed reservations:

  1. What is really wrong with GEDCOM is not the encoding, it's the data model. To achieve better data interchange, we need a better data model; the encoding could be improved, but it's not the real problem.
  2. GEDCOM is too well entrenched: no product vendor is interested in being the first to support some alternative standard.

I accept both these points. Producing a better data model is a hard problem, which is why I haven't attempted it. Sticking with the GEDCOM data model helps with point (2): the software provided here can freely convert between the traditional GEDCOM and GedML encodings, so a product needs to support only one of them natively.

The position I totally reject is that the idea of an interchange format is intrinsically flawed. Bob Velke has been pushing an alternative approach called GenBridge, which is proprietary technology to convert between a number of database formats used by popular American software packages. This can produce good results for a finite number of products, but it is a closed approach: it doesn't allow the shareware author to write new useful tools that can import their data from anywhere.

8 Jun 1998

Since I announced GedML in April 1998 there has been a fair trickle of feedback, which I have responded to privately. I've been working on the software and have some exciting developments in progress [which I still haven't got round to publishing - 16 Feb 1999]. For the present, though, I'm just releasing an improved spec and an improved version of the converters. Still nothing that does anything very useful, but that will come.

As far as the spec is concerned, most of the changes were already anticipated:

25 May 1998

Since I first wrote these pages, a paper has been published on Future Directions for GEDCOM. This paper suggests a considerable advance in the GEDCOM information model, but it does very little to improve GEDCOM at the encoding level. I have produced a comment on the proposals.

In some ways it is a disappointment that the paper makes no progress on the encoding level at all, concentrating entirely on improving the GEDCOM data model itself. In fact the encoding proposals are very messy indeed, displaying a poor understanding of character coding standards and other such issues. But I prefer to see this as an opportunity: since no work has been done in this area, the field is wide open. My response to the Future Directions paper is here.

The software

As well as converters to and from GedML, there is now a utility to generate CSV files for loading data into a relational database or spreadsheet.
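
To give a feel for what such a CSV export involves, here is a minimal sketch that pulls one row per individual out of GEDCOM text. It is not the utility distributed here: the function name `individuals_to_csv` and the choice of columns (`id`, `name`, `birth_date`) are assumptions made purely for illustration.

```python
import csv
import io

SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 SEP 1850
0 @I2@ INDI
1 NAME Mary /Jones/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
"""


def individuals_to_csv(gedcom_text):
    """Return CSV text with one row per INDI record (hypothetical columns)."""
    rows = []
    current = None   # the individual being read, or None outside INDI records
    context = None   # the level-1 tag we are currently inside, e.g. BIRT
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        level = int(parts[0])
        if level == 0:
            current, context = None, None
            # Record headers look like "0 @I1@ INDI": identifier, then tag.
            if len(parts) == 3 and parts[2] == "INDI":
                current = {"id": parts[1].strip("@"), "name": "", "birth_date": ""}
                rows.append(current)
        elif current is not None:
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if level == 1:
                context = tag
                if tag == "NAME":
                    current["name"] = value
            elif level == 2 and context == "BIRT" and tag == "DATE":
                current["birth_date"] = value
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "name", "birth_date"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(individuals_to_csv(SAMPLE))
```

A fuller export would probably write one file per record type (individuals, families, events) so that each can be loaded into its own table, with the GEDCOM identifiers serving as keys.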

I've improved the converters considerably:

Michael H. Kay
16 February 1999