The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: December 30, 2005
Genealogical Data and XML


This document provides references for some prominent initiatives which have proposed the use of XML for storing and processing genealogical information. A separate document on Markup Languages for Names and Addresses contains information on abstract models and markup models for person, family, name, and related concepts. See also prosopographical research: "an independent science of social history embracing genealogy, onomastics and demography."


GedML: [GEDCOM] Genealogical Data in XML

GedML provides "a way of encoding genealogical data sets in XML. It combines the well-established GEDCOM data model with the XML standard for encoding complex information. The result is a representation that can easily be converted to and from GEDCOM, but can be manipulated much more easily using standard tools: notably, by using an XSLT processor such as Saxon."

On 12-May-1999, the software was updated to work with SAXON 4.2.

Software provided by Michael Kay (as of 2002-12-27) included four Java classes in source and compiled form:

  • GedcomParser: This class implements the SAX2 XMLReader interface, so it pretends to be an XML parser, but actually it is parsing GEDCOM files...
  • AnselInputStreamReader: GEDCOM files use a rather unusual character encoding which is not supported by most Java VMs; this class performs the conversion from ANSEL characters to Unicode...
  • GedcomOutputter: This is the reverse of GedcomParser; it acts as a SAX2 ContentHandler which serializes a SAX event stream in the form of a GEDCOM file...
  • AnselOutputStreamWriter: This is the reverse of AnselInputStreamReader: it converts Unicode characters to ANSEL, and is used to write the output file by the GedcomOutputter...
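
The kind of conversion AnselInputStreamReader performs can be illustrated with a minimal Python sketch. It is not Kay's Java code. It assumes the ANSEL convention that a combining diacritic precedes its base letter (Unicode places it after), and it maps only a small illustrative subset of diacritic bytes. The table `ANSEL_COMBINING` and the function `ansel_to_unicode` are hypothetical names, and the byte values should be checked against the ANSEL (ANSI Z39.47) table before any real use.

```python
import unicodedata

# Illustrative subset of ANSEL combining-diacritic bytes (assumed values;
# verify against the ANSI Z39.47 table before relying on them).
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE2: "\u0301",  # acute accent
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut / diaeresis
}


def ansel_to_unicode(data: bytes) -> str:
    """Decode ASCII plus the diacritics above; other bytes become U+FFFD."""
    out = []
    pending = []  # diacritics seen before their base letter
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte))
            out.extend(pending)  # Unicode puts combining marks after the base
            pending.clear()
        else:
            out.append("\ufffd")  # byte not covered by this sketch
            pending.clear()
    # Compose base + mark pairs into precomposed characters where possible.
    return unicodedata.normalize("NFC", "".join(out))
```

The reordering step is the essential part: a decoder that maps bytes one-to-one without moving the diacritic would attach the accent to the wrong letter.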

Kay also provides stylesheets:

  • GedcomToXml.xsl performs an identity transformation; if GedcomParser is used as the input parser, the effect is to convert from GEDCOM encoding to XML...
  • XmlToGedcom.xsl also performs an identity transformation, but this time it is configured to use GedcomOutputter to produce the output in GEDCOM format...
  • GedcomToHtml.xsl produces an HTML rendition of the GEDCOM file; use this as a starting point to display your GEDCOM files in whatever way you want...
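
The core idea behind GedcomParser, turning GEDCOM's level-numbered lines into nested elements, can be sketched in Python. This is a simplified illustration, not Kay's implementation. It assumes each line has the form `LEVEL [@ID@] TAG [VALUE]`, ignores CONC/CONT continuation lines and character-set handling, and uses hypothetical names throughout: the function `gedcom_to_xml`, the `GED` root element, and the `ID` and `REF` attributes.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(text: str) -> ET.Element:
    """Build a nested element tree from level-numbered GEDCOM lines."""
    root = ET.Element("GED")
    stack = [root]  # stack[n + 1] is the open element at GEDCOM level n
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or too-short lines
        level = int(parts[0])
        attrs = {}
        if parts[1].startswith("@"):
            # Record line such as "0 @I1@ INDI": the cross-reference id comes first.
            if len(parts) < 3:
                continue  # id without a tag; nothing to build
            attrs["ID"] = parts[1].strip("@")
            rest = parts[2].split(" ", 1)
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
        else:
            tag, value = parts[1], (parts[2] if len(parts) > 2 else "")
        if value.startswith("@") and value.endswith("@"):
            attrs["REF"] = value.strip("@")  # pointer to another record
            value = ""
        del stack[level + 1:]  # close elements at this level or deeper
        elem = ET.SubElement(stack[-1], tag, attrs)
        elem.text = value or None
        stack.append(elem)
    return root
```

Calling `ET.tostring(gedcom_to_xml(text), encoding="unicode")` serializes the result, after which ordinary XML tools such as an XSLT processor can take over.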


GEDCOM (Genealogical Data Communication)

GEDCOM (GEnealogical Data COMmunication) is designed "to provide a flexible, uniform format for exchanging computerized genealogical data... GEDCOM has evolved over 15 years... Although GEDCOM XML is different from traditional GEDCOM both in syntax and underlying logical structure, it is still considered as an evolution of GEDCOM... An important part of GEDCOM is its ability to link records according to family lineage and other data relationships. XML's standard linkage method, using the ID and IDREF attributes, is equivalent to traditional GEDCOM's linkage method and will be used in its place. In traditional GEDCOM, links are bi-directional. For example, a CHIL tag in the FAM record connects a family to a child, and a FAMC tag in the INDI record connects a child to a family. Also, HUSB and WIFE tags in the FAM record connect to INDI records, and in the opposite direction, FAMS tags in the INDI record handle both spouses' connection to a FAM record. To specify a link in both directions is, of course, redundant and unnecessary. Some programs produce traditional GEDCOM with links in one direction, some the other, and some give both. That makes processing GEDCOM from a variety of sources difficult, and where both directions are specified, they may be inconsistent. In GEDCOM XML, all links are unidirectional and can be specified in only one way... In the past, ANSEL has been specified as the preferred character set for GEDCOM; in GEDCOM XML, the UNICODE character set is used."
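
The inconsistency risk described above, where FAM and INDI records disagree about a bidirectional link, can be sketched with a small Python check. The record layout (plain dictionaries keyed by record id) and the function name `find_inconsistent_links` are assumptions made for illustration; only the tag names CHIL and FAMC come from the text.

```python
def find_inconsistent_links(families, individuals):
    """Return (family_id, individual_id) pairs linked in one direction only.

    families:    {fam_id: {"CHIL": [indi_id, ...]}}
    individuals: {indi_id: {"FAMC": [fam_id, ...]}}
    """
    # Links as stated by FAM records (family -> child).
    forward = {
        (fam_id, child)
        for fam_id, fam in families.items()
        for child in fam.get("CHIL", [])
    }
    # Links as stated by INDI records (child -> family).
    backward = {
        (fam_id, indi_id)
        for indi_id, indi in individuals.items()
        for fam_id in indi.get("FAMC", [])
    }
    # Symmetric difference: present on one side, missing on the other.
    return sorted(forward ^ backward)
```

A format that stores each link in one direction only, as GEDCOM XML does, has nothing for such a check to find, since there is no second copy of the link to disagree with.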

GEDCOM was developed by the Family and Church History Department of The Church of Jesus Christ of Latter-day Saints.


  • Genealogy
  • GEDCOM 6.0 DTD as of 2005-12-30. Said to be '30 December, 2002'. [source, 2005-12-30]
  • GEDCOM XML Specification, Release 6.0. Beta Version. 55 pages.
  • GEDCOM XML Specification Release 6.0. Snapshot December 28, 2002.
  • GEDCOM 6.0 Beta XML DTD, initially published only in the PDF document. Apparently no official plain-text version as of 2003-01. Note: The December 06, 2002 prepended note to GEDCOM developers asserts that the published "beta" version of the GEDCOM XML Specification, Release 6.0 "is a complete specification and is suitable for use..." Since the principal "use" of a DTD is in a machine, one would expect the providers to supply an official valid XML DTD in a file that can be sent to an XML processor; the PDF document does not satisfy this requirement. Here are some unofficial XML DTDs:
    • XML DTD. Draft version from Lee Brown. Updated version of DTD originally posted to the '' list. 2002-12-30 or later. "Comments describe the valid formats and values for certain elements and attributes that are described in the spec but not enforced by the DTD..."
    • XML DTD. Draft version of the XML DTD for GEDCOM XML 6.0 Beta. Posted by Harry Erwin to the GenealogyXML List, 2002-12-27 ("I've posted a transcription of the GEDCOM 6.0's likely to have errors, so take a look and let me know...")
    • XML DTD. Draft version of the XML DTD for GEDCOM XML 6.0 Beta. Posted by Nolan Voight: "I copied (and removed the page refs from) the DTD from the pdf to a text file; it works okay when declared as the doctype for an xml document in jEdit, autocompletions & all..."
  • GEDCOM to DAML. "ged2daml is a program for converting selected GEDCOM information into DAML... Generated DAML can be viewed using DAML Viewer. ged2daml was initially developed for DAML Homework Assignment 3. Lessons learned and other information is available here. The DRC concurrently developed a very similar GEDCOM ontology..."
  • The GEDCOM Standard Release 5.5
  • GEDCOM FAQ document.
  • GENDEX - "an enterprise devoted to advancing the progress of family history and genealogy research on the World Wide Web"
  • Gedcom to XML Conversion Form
  • "GEDCOM Explained." By Dick Eastman. May 15, 2002.
  • "XML Version of GEDCOM." By Dick Eastman. April 24, 2002.


XGenML

Overview: "XGenML is a global consortium developing an XML-based framework for the reporting of genealogical information. Membership is intended to include all those interested in genealogical information, including genealogical information providers, genealogical associations, the LDS church, genealogical software developers, government agencies, and individuals interested in XML and genealogy. The initial goal of XGenML is to create an open XML specification for genealogical information, based on the work of others who have started to create genealogical models and XML standards, including the LDS church with GEDCOM XML, Gentech, GeniML and any others. This standard can be used in genealogical applications to easily create, exchange, provide and analyze this information. XGenML will maintain the standard and be available to be the guardian of a globally unique identification scheme for people and places..."



gdmxml

gdmxml is "an XML implementation of the Genealogical Data Model. Specifically, it is a RELAX NG Schema to validate XML documents with genealogical information according to the Genealogical Data Model put together by the Lexicon Group from GENTECH... The gdmxml schema is licensed under the Creative Commons Attribution-NoDerivs License. Supporting files are distributed under the Creative Commons Attribution License." [2005-12]

Hans Fugal is working on gdmxml as "an XML implementation of the Genealogical Data Model. [He says:] Specifically, I am writing an XML schema (in RELAX NG) to validate XML documents with genealogical information according to the Genealogical Data Model put together by the Lexicon Group from GENTECH. I solicit the help of any and all interested parties. To get involved, please join the email list and browse the email list archives..." The CVS archive contains draft schemas (W3C XML schema and RELAX NG).


Genealogical Information Markup Language (GeniML)

"GeniML (pronounced 'jeenie em el') is an XML vocabulary for recording and exchanging genealogical data. The specification was developed by Jerry Fitzpatrick of Software Renovation Corporation."


GENTECH Genealogical Data Model

"The GENTECH Data Modeling Project is an extension of the work done by GENTECH members on the Lexicon Project, an attempt to define genealogical data for the purpose of facilitating data exchange among genealogists. After some work on the Lexicon, the group recognized that it is difficult to define genealogical data out of context because of the various ways people interpret common genealogical terms. The group decided that the effort would be better served by defining genealogical data in the context of a logical data model, which is a systems engineering methodology used to define data in an automated data processing system... the group is simply using this methodology to define genealogical data; the group is not designing software... We used data modeling as a means to define genealogical data and the relationships between that data in an effort to bring greater understanding to the genealogical community about data issues... As a practical matter, we expect this explicit definition of genealogical data to foster discussion of genealogical data and perhaps in the future to help the genealogical community better exchange data by understanding the limitations of various subsets of genealogical data that may be implemented in automated or manual systems. Appendix A: 'Principles of Logical Data Modeling' contains a discussion of data modeling concepts for readers who may not be familiar with the terminology used in entity relationship diagrams..."

GENTECH Data Modeling Project Sponsor organizations, listed in the order that they were able to join with GENTECH in the project: GENTECH (Charter sponsor), Federation of Genealogical Societies - FGS (Charter sponsor), New England Historic Genealogical Society (NEHGS), National Genealogical Society (NGS), American Society of Genealogists (ASG), The Association of Professional Genealogists (APG), The Board for Certification of Genealogists (BCG).


Genealogical Data Models in the Unified Modeling Language (GDMUML)

GDMUML is a project of Stanley Mitchell.

"GDMUML is a representation of the GENTECH Genealogical Data Model using the UML. It takes the GENTECH model as a starting point and aims to preserve the semantics of the original model. The GENTECH model is represented as an Entity-Relationship model... The UML can also be used to model logical database designs. However, GDMUML does not do that. Instead, it focuses on system object modeling. An example will help differentiate the perspectives. The research-objective entity in the GENTECH model specifies a primary and foreign key, indicating that it is represented as records in a database table, each with a matching record in a different (foreign) database table, the project table. By contrast, GDMUML defines two classes ResearchObjective and Project. It does not specify how these might be implemented and thus avoids introducing database terminology. However, it does preserve the relationship between the objects, by indicating that the classes are associated and that one Project may have one-or-more ResearchObjectives..."

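The ResearchObjective/Project example can be sketched as two plain Python classes. This only illustrates the object-modeling perspective and is not part of GDMUML: the attribute names (`name`, `description`, `objectives`) are hypothetical, and the one-or-more multiplicity is enforced here by a simple constructor check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchObjective:
    description: str  # hypothetical attribute


@dataclass
class Project:
    name: str  # hypothetical attribute
    # Association: one Project has one-or-more ResearchObjectives.
    objectives: List[ResearchObjective] = field(default_factory=list)

    def __post_init__(self):
        if not self.objectives:
            raise ValueError("a Project needs at least one ResearchObjective")
```

The association is held as direct object references, so no primary or foreign keys appear; how the objects are stored is left open, which is the distinction the passage draws.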

GenXML

"GenXML is a file format for exchange of data between genealogy programs. It is based on XML and defined by a XML schema. It is not intended to be used as an internal format of any genealogy programs, although it may be possible. It is an alternative to Gedcom 5.5. The idea of GenXML is that: (1) it shall be easy to read by most genealogical programs; (2) it shall be easy to write by most genealogical programs; (3) it shall be easy to manipulate by third party programs; (4) all kinds of information shall fit into one and only one place... GenXML is mainly inspired by the theoretical Gentech Data Model and Gedcom Future Directions, which is an unfinished replacement of Gedcom 5.5."


GRAMPS Project

"GRAMPS is a genealogy program for Linux and other UNIX-like systems. The GRAMPS name stands for Genealogical Research and Analysis Management Programming System. GRAMPS helps you track your family tree. It allows you to store, edit, and research genealogical data. GRAMPS attempts to provide all of the common capabilities of other genealogical programs, but, more importantly, to provide an additional capability of integration not common to these programs. This is the ability to input any bits and pieces of information directly into GRAMPS and rearrange/manipulate any/all data events in the entire data base (in any order or sequence) to assist the user in doing research, analysis and correlation with the potential of filling relationship gaps..."



FamilyML

"FamilyML is a standard data format based on XML which allows the easy exchange of genealogical data. Also based on GedML and GEDCOM, but in a more human-readable format. The definition is divided up into three files. family-tree is the main structure and contains references to the remaining two documents. family contains information about each family unit. Finally, person contains the information about the individual person..."


General Resources and References: Mailing Lists, Articles, Papers, News

  • National Genealogical Society Conference 2003. May 28-31, 2003. David Lawrence Convention Center, Pittsburgh, PA, USA. On Wednesday, May 28, 2003: "An XML Implementation of the GENTECH Lexicon Genealogical Data Model," by Hans Fugal. A report on the gdmxml project (accomplishments, current status, and future directions).

  • GenealogyXML List for Genealogy XML Schema. A YahooGroups mailing list: "The purpose of this group is to discuss the application of XML to genealogical projects. Potential topics include XML, specifications on how XML should be used for genealogical reporting, vocabularies like GedML, GeniML, and GEDCOM 6.0, conversion of GEDCOM to XML, genealogy visualization tools, advice on XML and XSLT, and so on. Although this group is not a genealogy standards committee, discussion of current/proposed standards and models are viable topics."

  • Summary of XML/Genealogy Projects [GDMXML et al.]. By Hans Fugal. Posted 2002-08-15 to the GDMXML mailing list.

  • List of Genealogy Specifications. From Easytopicmaps. See also: Topic Maps For Genealogy and Source Centric Approach to Genealogical TopicMaps. Robert McKinnon, Lars Marius Garshol, and others.

  • XML for Genealogists. Website maintained by Michael McGinnis.

  • "Lexicon Data Model." Article on the GENTECH Genealogical Data Model. Originally from Eastman's Online Genealogy Newsletter, copyright (c) 1998 by Richard W. Eastman and Ancestry, Inc. "... The Data Model isn't simple but it does have some basic elements that relate to all genealogists, whether they are involved with software development or not. It is a schema that uses 'entities' to describe the genealogical process, including how all data is defined, recorded and related to all other data... Keep in mind that the Lexicon Data Model is a logical model and not a physical model. It is not a set of instructions for software developers; it is a description of the genealogical process written in terms that developers can understand... The heart of the Data Model is the ASSERTION. For example, the will of a fictitious John Smith, executed on 1 May 1850, may say '...and to my daughter Polly Adams, I give $100.' From this I could assert that: John had a daughter called Polly. She was alive on 1 May 1850. And that she married an Adams. From another source, my knowledge of genealogy, I could also assert that this Polly was the same person as his daughter Mary (Polly being a common nickname for Mary.) Note that these ASSERTIONs will have different levels of surety, and when combined with other ASSERTIONs that address Polly/Mary SMITH/ADAMS, will document an ancestor. ASSERTION is linked to many other entities: SOURCE, EVENT, GROUP, PERSONA, PLACE, CHARACTERISTIC, SURETY, RESEARCHER and another ASSERTION. One can see that software that implements the Data Model would be very powerful. With all this detail recorded (or at least recordable) and linked at the lowest level, we can audit (or backtrack) on all of the hundreds of decisions we make when we enter data into our software. We can also find the decisions of those from whom we import data. Using just the links I've listed above, there are nine relationships other data may have to an ASSERTION. 
With the important recursive power of ASSERTIONs (that is ASSERTIONs about ASSERTIONs) this becomes the fundamental tool for documenting the genealogical process and in strengthening genealogy software..." See GENTECH Genealogical Data Model above.

  • Lexicon Working Group (LWG). GENTECH and FGS (Federation of Genealogical Societies) created the Lexicon Working Group in 1995.

  • The GENTECH Genealogical Data Model
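
The ASSERTION idea from the "Lexicon Data Model" article above, including assertions about assertions, can be sketched in Python. The field names (`subject`, `claim`, `source`, `surety`, `based_on`) and the numeric surety scale are assumptions for illustration; the GENTECH model defines the entities, not this representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assertion:
    subject: str   # persona the claim is about (hypothetical field)
    claim: str     # what is asserted
    source: str    # where the evidence comes from
    surety: int    # assumed scale: higher means more certain
    based_on: List["Assertion"] = field(default_factory=list)

    def audit_trail(self) -> List[str]:
        """List every source this assertion ultimately rests on."""
        sources = [self.source]
        for parent in self.based_on:
            sources.extend(parent.audit_trail())
        return sources


# The John Smith will example from the article, with assumed surety values.
will = "Will of John Smith, 1 May 1850"
daughter = Assertion("Polly Adams", "daughter of John Smith", will, 3)
alive = Assertion("Polly Adams", "alive on 1 May 1850", will, 3)
same_person = Assertion(
    "Polly Adams",
    "same person as Mary Smith",
    "researcher's knowledge of nicknames",
    1,
    based_on=[daughter, alive],
)
```

Here `same_person.audit_trail()` walks back through the supporting assertions, which is the backtracking the article describes: every conclusion can be traced to the sources and earlier decisions behind it.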

Robin Cover, Editor