Genealogical Data and XML
This document provides references for some prominent initiatives which have proposed the use of XML for storing and processing genealogical information. A separate document on Markup Languages for Names and Addresses contains information on abstract models and markup models for person, family, name, and related concepts. See also prosopographical research: "an independent science of social history embracing genealogy, onomastics and demography."
GedML provides "a way of encoding genealogical data sets in XML. It combines the well-established GEDCOM data model with the XML standard for encoding complex information. The result is a representation that can easily be converted to and from GEDCOM, but can be manipulated much more easily using standard tools: notably, by using an XSLT processor such as Saxon."
On 12-May-1999, the software was updated to work with SAXON 4.2.
Software provided by Kay (as of 2002-12-27) included four Java classes in source and compiled form:
- GedcomParser: This class implements the SAX2 XMLReader interface, so it pretends to be an XML parser, but actually it is parsing GEDCOM files...
- AnselInputStreamReader: GEDCOM files use a rather unusual character encoding which is not supported by most Java VMs; this class performs the conversion from ANSEL characters to Unicode...
- GedcomOutputter: This is the reverse of GedcomParser; it acts as a SAX2 ContentHandler which serializes a SAX event stream in the form of a GEDCOM file...
- AnselOutputStreamWriter: This is the reverse of AnselInputStreamReader: it converts Unicode characters to ANSEL, and is used to write the output file by the GedcomOutputter...
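The ANSEL-to-Unicode step that AnselInputStreamReader performs can be illustrated with a minimal Python sketch (not Kay's Java code). It assumes input restricted to ASCII plus six ANSEL combining diacritics, with byte values assumed from the ANSI Z39.47 table; the function name `decode_ansel_subset` is hypothetical. The point it shows is ordering: ANSEL places a combining diacritic before the letter it modifies, whereas Unicode places combining marks after the base letter, so the decoder holds pending marks and then NFC-normalizes the result.

```python
import unicodedata

# Assumed subset of ANSEL (ANSI Z39.47) combining diacritics,
# mapped to the corresponding Unicode combining characters.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # combining grave accent
    0xE2: "\u0301",  # combining acute accent
    0xE3: "\u0302",  # combining circumflex
    0xE4: "\u0303",  # combining tilde
    0xE8: "\u0308",  # combining diaeresis (umlaut)
    0xF0: "\u0327",  # combining cedilla
}


def decode_ansel_subset(data: bytes) -> str:
    """Decode ASCII plus the diacritics above into NFC-normalized Unicode."""
    out: list[str] = []
    pending: list[str] = []  # marks seen before their base letter
    for byte in data:
        if byte in ANSEL_COMBINING:
            # ANSEL: the diacritic precedes the letter it modifies.
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            # Unicode: combining marks follow the base letter.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's subset")
    out.extend(pending)  # keep any trailing marks that had no base letter
    return unicodedata.normalize("NFC", "".join(out))


print(decode_ansel_subset(b"Jos\xe2e M\xe8uller"))  # José Müller
```

A full reader would also need ANSEL's spacing special characters (such as Ł and Ø) in the 0xA1 to 0xC8 range, which this sketch deliberately rejects.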
Kay also provides stylesheets:
- GedcomToXml.xsl performs an identity transformation; if GedcomParser is used as the input parser, the effect is to convert from GEDCOM encoding to XML...
- XmlToGedcom.xsl also performs an identity transformation, but this time it is configured to use GedcomOutputter to produce the output in GEDCOM format...
- GedcomToHtml.xsl produces an HTML rendition of the GEDCOM file; use this as a starting point to display your GEDCOM files in whatever way you want...
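The idea behind GedcomParser and GedcomToXml.xsl, treating GEDCOM's level numbers as element nesting, can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a simplified, hypothetical mapping rather than GedML's exact conventions: the root element name `GED`, the `ID` and `REF` attribute names, and the helper names are assumptions, and CONT/CONC continuation lines and character-set handling are ignored.

```python
import xml.etree.ElementTree as ET

GEDCOM = """\
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 MARR
2 DATE 1 MAY 1850
"""


def gedcom_to_xml(text: str) -> ET.Element:
    """Nest GEDCOM lines by level number (simplified, GedML-like mapping)."""
    root = ET.Element("GED")
    stack = [root]  # stack[n] is the parent for a line at level n
    for line in text.splitlines():
        if not line.strip():
            continue
        level_str, second, *rest = line.split(" ", 2)
        level = int(level_str)
        remainder = rest[0] if rest else ""
        xref = None
        if second.startswith("@"):  # "0 @I1@ INDI": cross-reference label
            xref = second.strip("@")
            tag, _, value = remainder.partition(" ")
        else:
            tag, value = second, remainder
        del stack[level + 1:]  # close elements deeper than this line
        if level >= len(stack):
            raise ValueError(f"level jumps too far: {line!r}")
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref)
        if len(value) > 2 and value.startswith("@") and value.endswith("@"):
            elem.set("REF", value.strip("@"))  # pointer to another record
        elif value:
            elem.text = value
        stack.append(elem)
    return root


def xml_to_gedcom(root: ET.Element) -> str:
    """Write the elements back out as GEDCOM lines (the reverse direction)."""
    lines: list[str] = []

    def walk(elem: ET.Element, level: int) -> None:
        parts = [str(level)]
        if elem.get("ID"):
            parts.append(f"@{elem.get('ID')}@")
        parts.append(elem.tag)
        if elem.get("REF"):
            parts.append(f"@{elem.get('REF')}@")
        elif elem.text:
            parts.append(elem.text)
        lines.append(" ".join(parts))
        for child in elem:
            walk(child, level + 1)

    for record in root:
        walk(record, 0)
    return "\n".join(lines) + "\n"


root = gedcom_to_xml(GEDCOM)
print(ET.tostring(root, encoding="unicode"))
print(xml_to_gedcom(root) == GEDCOM)  # True
```

Because the mapping keeps levels, cross-reference labels, and pointers, the reverse function (playing the role of GedcomOutputter) reproduces this sample exactly; once the data is XML, any XSLT processor can work on it.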
Earlier references of possible historical value (some URLs are broken):
GEDCOM (GEnealogical Data COMmunication) is designed "to provide a flexible, uniform format for exchanging computerized genealogical data... GEDCOM has evolved over 15 years... Although GEDCOM XML is different from traditional GEDCOM both in syntax and underlying logical structure, it is still considered as an evolution of GEDCOM... An important part of GEDCOM is its ability to link records according to family lineage and other data relationships. XML's standard linkage method, using the ID and IDREF attributes, is equivalent to traditional GEDCOM's linkage method and will be used in its place. In traditional GEDCOM, links are bi-directional. For example, a CHIL tag in the FAM record connects a family to a child, and a FAMC tag in the INDI record connects a child to a family. Also, HUSB and WIFE tags in the FAM record connect to INDI records, and in the opposite direction, FAMS tags in the INDI record handle both spouses' connection to a FAM record. To specify a link in both directions is, of course, redundant and unnecessary. Some programs produce traditional GEDCOM with links in one direction, some the other, and some give both. That makes processing GEDCOM from a variety of sources difficult, and where both directions are specified, they may be inconsistent. In GEDCOM XML, all links are unidirectional and can be specified in only one way... In the past, ANSEL has been specified as the preferred character set for GEDCOM; in GEDCOM XML, the UNICODE character set is used."
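The redundancy problem described above can be made concrete with a short Python consistency check for traditional GEDCOM. It derives the INDI-side FAMC and FAMS links implied by each FAM record's CHIL, HUSB, and WIFE links, then reports every disagreement in either direction. Only level-1 pointer lines are read, and the function names are hypothetical. The derivation step also illustrates why a one-direction representation suffices: the inverse links can be computed rather than stored.

```python
from collections import defaultdict

SAMPLE = """\
0 @I1@ INDI
1 FAMS @F1@
0 @I2@ INDI
1 FAMC @F1@
0 @I3@ INDI
0 @F1@ FAM
1 HUSB @I1@
1 CHIL @I2@
1 CHIL @I3@
"""

Records = dict[str, tuple[str, dict[str, set[str]]]]


def parse_links(text: str) -> Records:
    """Map record ID -> (record type, {tag: pointer targets}) from level-1 pointers."""
    records: Records = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "0":
            current = None
            if len(parts) >= 3 and parts[1].startswith("@"):
                current = parts[1].strip("@")
                records[current] = (parts[2], defaultdict(set))
        elif parts[0] == "1" and current and len(parts) == 3 and parts[2].startswith("@"):
            records[current][1][parts[1]].add(parts[2].strip("@"))
    return records


def inconsistent_links(records: Records) -> list[str]:
    """List every INDI-side FAMC/FAMS link that disagrees with the FAM side."""
    # Derive the individual-side links implied by the family records.
    implied: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for rec_id, (rtype, links) in records.items():
        if rtype != "FAM":
            continue
        for child in links.get("CHIL", set()):
            implied[child]["FAMC"].add(rec_id)
        for spouse in links.get("HUSB", set()) | links.get("WIFE", set()):
            implied[spouse]["FAMS"].add(rec_id)

    problems = []
    for rec_id, (rtype, links) in records.items():
        if rtype != "INDI":
            continue
        for tag in ("FAMC", "FAMS"):
            declared = links.get(tag, set())
            expected = implied[rec_id][tag]
            for fam in sorted(expected - declared):
                problems.append(f"{rec_id}: {tag} @{fam}@ missing; FAM side links it")
            for fam in sorted(declared - expected):
                problems.append(f"{rec_id}: {tag} @{fam}@ has no matching FAM link")
    return problems


print(inconsistent_links(parse_links(SAMPLE)))
# ['I3: FAMC @F1@ missing; FAM side links it']
```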
GEDCOM was developed by the Family and Church History Department of The Church of Jesus Christ of Latter-day Saints.
- GEDCOM 6.0 DTD, as of 2005-12-30; the file is said to be dated '30 December, 2002'. [source, 2005-12-30]
- GEDCOM XML Specification, Release 6.0. Beta Version. 55 pages. Send comments to: email@example.com.
- GEDCOM XML Specification Release 6.0. Snapshot December 28, 2002.
- GEDCOM 6.0 Beta XML DTD, initially published only in the PDF document; apparently there was no official plain-text version as of 2003-01. Note: The December 06, 2002 prepended note to GEDCOM developers asserts that the published "beta" version of the GEDCOM XML Specification, Release 6.0 "is a complete specification and is suitable for use..." Since a DTD is principally "used" by machine processing, one would expect the providers to supply an official, valid XML DTD in a file that can be passed to an XML processor; the PDF document does not satisfy this requirement. Here are some unofficial XML DTDs:
- XML DTD. Draft version from Lee Brown. Updated version of the DTD originally posted to the 'GenealogyXML@yahoogroups.com' list. 2002-12-30 or later. "Comments describe the valid formats and values for certain elements and attributes that are described in the spec but not enforced by the DTD..." [source]
- XML DTD. Draft version of the XML DTD for GEDCOM XML 6.0 Beta. Posted by Harry Erwin to the GenealogyXML List, 2002-12-27 ("I've posted a transcription of the GEDCOM 6.0 DTD...it's likely to have errors, so take a look and let me know...") [source]
- XML DTD. Draft version of the XML DTD for GEDCOM XML 6.0 Beta. Posted by Nolan Voight: "I copied (and removed the page refs from) the DTD from the pdf to a text file; it works okay when declared as the doctype for an xml document in jEdit, autocompletions & all..." [source]
- GEDCOM to DAML. "ged2daml is a program for converting selected GEDCOM information into DAML... Generated DAML can be viewed using DAML Viewer. ged2daml was initially developed for DAML Homework Assignment 3. Lessons learned and other information is available here. The DRC concurrently developed a very similar GEDCOM ontology..."
- The GEDCOM Standard Release 5.5
- GEDCOM FAQ document. From Familysearch.org
- GENDEX - "an enterprise devoted to advancing the progress of family history and genealogy research on the World Wide Web"
- Gedcom to XML Conversion Form
- "GEDCOM Explained." By Dick Eastman. May 15, 2002.
- "XML Version of GEDCOM." By Dick Eastman. April 24, 2002.
Overview: "XGenML is a global consortium developing an XML-based framework for the reporting of genealogical information. Membership is intended to include all those interested in genealogical information, including genealogical information providers, genealogical associations, the LDS church, genealogical software developers, government agencies, and individuals interested in XML and genealogy. The initial goal of XGenML is to create an open XML specification for genealogical information, based on the work of others who have started to create genealogical models and XML standards, including the LDS church with GEDCOM XML, Gentech, GeniML and any others. This standard can be used in genealogical applications to easily create, exchange, provide and analyze this information. XGenML will maintain the standard and be available to be the guardian of a globally unique identification scheme for people and places..."
gdmxml is "an XML implementation of the Genealogical Data Model. Specifically, it is a RELAX NG Schema to validate XML documents with genealogical information according to the Genealogical Data Model put together by the Lexicon Group from GENTECH... The gdmxml schema is licensed under the Creative Commons Attribution-NoDerivs License. Supporting files are distributed under the Creative Commons Attribution License." [2005-12]
Hans Fugal [WWW] is working on gdmxml as "an XML implementation of the Genealogical Data Model. [He says:] Specifically, I am writing an XML schema (in RELAX NG) to validate XML documents with genealogical information according to the Genealogical Data Model put together by the Lexicon Group from GENTECH. I solicit the help of any and all interested parties. To get involved, please join the email list and browse the email list archives..." The CVS archive contains draft schemas (W3C XML schema and RELAX NG).
"GeniML (pronounced 'jeenie em el') is an XML vocabulary for recording and exchanging genealogical data. The specification was developed by Jerry Fitzpatrick of Software Renovation Corporation."
"The GENTECH Data Modeling Project is an extension of the work done by GENTECH members on the Lexicon Project, an attempt to define genealogical data for the purpose of facilitating data exchange among genealogists. After some work on the Lexicon, the group recognized that it is difficult to define genealogical data out of context because of the various ways people interpret common genealogical terms. The group decided that the effort would be better served by defining genealogical data in the context of a logical data model, which is a systems engineering methodology used to define data in an automated data processing system... the group is simply using this methodology to define genealogical data; the group is not designing software... We used data modeling as a means to define genealogical data and the relationships between that data in an effort to bring greater understanding to the genealogical community about data issues... As a practical matter, we expect this explicit definition of genealogical data to foster discussion of genealogical data and perhaps in the future to help the genealogical community better exchange data by understanding the limitations of various subsets of genealogical data that may be implemented in automated or manual systems. Appendix A: 'Principles of Logical Data Modeling' contains a discussion of data modeling concepts for readers who may not be familiar with the terminology used in entity relationship diagrams..."
GENTECH Data Modeling Project Sponsor organizations, listed in the order that they were able to join with GENTECH in the project: GENTECH (Charter sponsor), Federation of Genealogical Societies - FGS (Charter sponsor), New England Historic Genealogical Society (NEHGS), National Genealogical Society (NGS), American Society of Genealogists (ASG), The Association of Professional Genealogists (APG), The Board for Certification of Genealogists (BCG).
GDMUML is a project of Stanley Mitchell.
"GDMUML is a representation of the GENTECH Genealogical Data Model using the UML. It takes the GENTECH model as a starting point and aims to preserve the semantics of the original model. The GENTECH model is represented as an Entity-Relationship model... The UML can also be used to model logical database designs. However, GDMUML does not do that. Instead, it focuses on system object modeling. An example will help differentiate the perspectives. The research-objective entity in the GENTECH model specifies a primary and foreign key, indicating that it is represented as records in a database table, each with a matching record in a different (foreign) database table, the project table. By contrast, GDMUML defines two classes ResearchObjective and Project. It does not specify how these might be implemented and thus avoids introducing database terminology. However, it does preserve the relationship between the objects, by indicating that the classes are associated and that one Project may have one-or-more ResearchObjectives..."
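The object-modeling perspective can be illustrated with Python dataclasses: a `Project` refers directly to its `ResearchObjective` objects, with no primary or foreign keys, and the one-or-more multiplicity is checked when a `Project` is created. The `name` and `description` attributes are hypothetical placeholders; GDMUML's actual class attributes are not given here.

```python
from dataclasses import dataclass


@dataclass
class ResearchObjective:
    description: str  # hypothetical attribute


@dataclass
class Project:
    name: str  # hypothetical attribute
    # Association with multiplicity one-or-more. The Project holds the
    # objective objects themselves; no primary or foreign keys appear.
    objectives: list[ResearchObjective]

    def __post_init__(self) -> None:
        if not self.objectives:
            raise ValueError("a Project must have one or more ResearchObjectives")


project = Project(
    "Smith family of 1850",
    [ResearchObjective("Identify the children of John Smith")],
)
print(len(project.objectives))  # 1
```

How these objects are stored (tables, files, or anything else) is left open, which matches the GDMUML stance of avoiding database terminology.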
"GenXML is a file format for exchange of data between genealogy programs. It is based on XML and defined by a XML schema. It is not intended to be used as an internal format of any genealogy programs, although it may be possible. It is an alternative to Gedcom 5.5. The idea of GenXML is that: (1) it shall be easy to read by most genealogical programs; (2) it shall be easy to write by most genealogical programs; (3) it shall be easy to manipulate by third party programs; (4) all kinds of information shall fit into one and only one place... GenXML is mainly inspired by the theoretical Gentech Data Model (see www.gentech.org) and Gedcom Future Directions, which is an unfinished replacement of Gedcom 5.5."
"GRAMPS is a genealogy program for Linux and other UNIX-like systems. The GRAMPS name stands for Genealogical Research and Analysis Management Programming System. GRAMPS helps you track your family tree. It allows you to store, edit, and research genealogical data. GRAMPS attempts to provide all of the common capabilities of other genealogical programs, but, more importantly, to provide an additional capability of integration not common to these programs. This is the ability to input any bits and pieces of information directly into GRAMPS and rearrange/manipulate any/all data events in the entire data base (in any order or sequence) to assist the user in doing research, analysis and correlation with the potential of filling relationship gaps..."
"FamilyML is a standard data format based on XML which allows the easy exchange of genealogical data. Also based on GedML and GEDCOM, but in a more human-readable format. The definition is divided up into three files. family-tree is the main structure and contains references to the remaining two documents. family contains information about each family unit. Finally, person contains the information about the individual person..."
National Genealogical Society Conference 2003. May 28-31, 2003. David Lawrence Convention Center, Pittsburgh, PA, USA. On Wednesday, May 28, 2003: "An XML Implementation of the GENTECH Lexicon Genealogical Data Model," by Hans Fugal. A report on the gdmxml project (accomplishments, current status, and future directions).
GenealogyXML List for Genealogy XML Schema. Post to GenealogyXML@yahoogroups.com. A YahooGroups mailing list: "The purpose of this group is to discuss the application of XML to genealogical projects. Potential topics include XML, specifications on how XML should be used for genealogical reporting, vocabularies like GedML, GeniML, and GEDCOM 6.0, conversion of GEDCOM to XML, genealogy visualization tools, advice on XML and XSLT, and so on. Although this group is not a genealogy standards committee, discussion of current/proposed standards and models are viable topics."
Summary of XML/Genealogy Projects [GDMXML et al.]. By Hans Fugal. Posted 2002-08-15 to the GDMXML mailing list.
List of Genealogy Specifications. From Easytopicmaps. See also: Topic Maps For Genealogy and Source Centric Approach to Genealogical TopicMaps. Robert McKinnon, Lars Marius Garshol, and others.
XML for Genealogists.
Website maintained by Michael McGinnis.
"Lexicon Data Model." Article on the GENTECH Genealogical Data Model. Originally from Eastman's Online Genealogy Newsletter, copyright (c) 1998 by Richard W. Eastman and Ancestry, Inc. "... The Data Model isn't simple but it does have some basic elements that relate to all genealogists, whether they are involved with software development or not. It is a schema that uses 'entities' to describe the genealogical process, including how all data is defined, recorded and related to all other data... Keep in mind that the Lexicon Data Model is a logical model and not a physical model. It is not a set of instructions for software developers; it is a description of the genealogical process written in terms that developers can understand... The heart of the Data Model is the ASSERTION. For example, the will of a fictitious John Smith, executed on 1 May 1850, may say '...and to my daughter Polly Adams, I give $100.' From this I could assert that: John had a daughter called Polly. She was alive on 1 May 1850. And that she married an Adams. From another source, my knowledge of genealogy, I could also assert that this Polly was the same person as his daughter Mary (Polly being a common nickname for Mary.) Note that these ASSERTIONs will have different levels of surety, and when combined with other ASSERTIONs that address Polly/Mary SMITH/ADAMS, will document an ancestor. ASSERTION is linked to many other entities: SOURCE, EVENT, GROUP, PERSONA, PLACE, CHARACTERISTIC, SURETY, RESEARCHER and another ASSERTION. One can see that software that implements the Data Model would be very powerful. With all this detail recorded (or at least recordable) and linked at the lowest level, we can audit (or backtrack) on all of the hundreds of decisions we make when we enter data into our software. We can also find the decisions of those from whom we import data. Using just the links I've listed above, there are nine relationships other data may have to an ASSERTION. 
With the important recursive power of ASSERTIONs (that is ASSERTIONs about ASSERTIONs) this becomes the fundamental tool for documenting the genealogical process and in strengthening genealogy software..." See GENTECH Genealogical Data Model above. [cache]
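The ASSERTION idea, including assertions about assertions, can be sketched with a few Python dataclasses built around the Polly/Mary example above. This is a simplified, hypothetical rendering, not the Lexicon Data Model's own definitions: only SOURCE, PERSONA, SURETY, and links to other ASSERTIONs are modeled, and the integer surety scale is an assumption. The `trail` method shows the kind of backtracking the article describes, walking from a conclusion back to the assertions it rests on.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str


@dataclass
class Persona:
    name: str  # a person as named in one particular source


@dataclass
class Assertion:
    claim: str
    surety: int  # hypothetical scale: 0 = guess ... 3 = certain
    source: Source
    personas: list[Persona] = field(default_factory=list)
    # Recursive link: this ASSERTION rests on other ASSERTIONs.
    based_on: list["Assertion"] = field(default_factory=list)

    def trail(self, depth: int = 0) -> list[str]:
        """Backtrack from this assertion through those it is based on."""
        lines = ["  " * depth + f"{self.claim} [surety {self.surety}; {self.source.title}]"]
        for earlier in self.based_on:
            lines.extend(earlier.trail(depth + 1))
        return lines


will = Source("will of John Smith, 1 May 1850")
knowledge = Source("knowledge of genealogy")
polly, mary = Persona("Polly Adams"), Persona("Mary Smith")

daughter = Assertion("John had a daughter called Polly", 3, will, [polly])
married = Assertion("Polly married an Adams", 2, will, [polly])
same_person = Assertion(
    "Polly is the same person as his daughter Mary",
    1,
    knowledge,
    [polly, mary],
    based_on=[daughter, married],
)
print("\n".join(same_person.trail()))
```

Each PERSONA here is a name as it appears in one source; the identification of Polly with Mary is itself an ASSERTION with its own source and surety, so it can be audited or withdrawn without touching the evidence beneath it.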
Lexicon Working Group (LWG). GENTECH and FGS (Federation of Genealogical Societies) created the Lexicon Working Group in 1995.
The GENTECH Genealogical Data Model
Receive daily news updates from Managing Editor, Robin Cover.