SGML: Journals of the Basque Administration

From Thu Mar 13 12:03:06 1997
From: Joseba Abaitua <>
Organization: Universidad de Deusto
Subject:      Re: DTD for legal documents


It might interest you to know about the LEGEBIDUNA project.

We've created a corpus of administrative (not legal) documentation:
the Official Bilingual Journals of the Basque Administration (almost 10
million words in each language, Basque and Spanish). We are now tagging
the texts, i.e. recognizing administrative and legal formulae and
terminology and their distribution within the structure of the texts.
Our DTDs are deduced from the tagged corpora: first we tag the text,
then we construct the DTDs.
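The tag-first, DTD-second workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LEGEBIDUNA tooling: it uses Python's standard XML parser as a stand-in for an SGML parser, the element names (journal, decree, article, ...) are invented, and its only generalization is a crude heuristic that rewrites a run of repeated sibling elements as `name+`. For each element type it records the distinct child sequences found in the tagged sample and emits one XML-style `<!ELEMENT ...>` declaration (an SGML DTD would normally also carry tag-minimization flags such as `- -`).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import groupby


def collapse_runs(names):
    """Crude generalization (an assumption, not a published method):
    a run of 2+ identical sibling elements becomes 'name+'."""
    out = []
    for name, run in groupby(names):
        count = sum(1 for _ in run)
        out.append(name + "+" if count > 1 else name)
    return tuple(out)


def observe(root):
    """For every element type, record the distinct (collapsed) child
    sequences seen in the corpus and whether any instance holds text."""
    sequences = defaultdict(set)
    has_text = defaultdict(bool)
    for elem in root.iter():
        sequences[elem.tag].add(collapse_runs(child.tag for child in elem))
        chunks = [elem.text] + [child.tail for child in elem]
        if any(chunk and chunk.strip() for chunk in chunks):
            has_text[elem.tag] = True
    return sequences, has_text


def content_model(seqs, text):
    """Turn the observed child sequences into a content model string."""
    children = sorted({name.rstrip("+") for seq in seqs for name in seq})
    if text:
        # Mixed content has to take the (#PCDATA|a|b)* form.
        return "(#PCDATA|" + "|".join(children) + ")*" if children else "(#PCDATA)"
    nonempty = sorted(seq for seq in seqs if seq)
    if not nonempty:
        return "EMPTY"
    alts = ["(" + ",".join(seq) + ")" for seq in nonempty]
    model = alts[0] if len(alts) == 1 else "(" + "|".join(alts) + ")"
    # If some instances had no children at all, the whole model is optional.
    return model + "?" if () in seqs else model


def infer_dtd(tagged_text):
    """Deduce one <!ELEMENT ...> declaration per element type in the sample."""
    root = ET.fromstring(tagged_text)
    sequences, has_text = observe(root)
    return "\n".join(
        f"<!ELEMENT {tag} {content_model(sequences[tag], has_text[tag])}>"
        for tag in sorted(sequences)
    )


# Hypothetical tagged sample; the element names are invented for illustration.
SAMPLE = """<journal>
<decree><title>Decree 1/1997</title><article>Text</article><article>Text</article><signature>Signed</signature></decree>
<decree><title>Order 2/1997</title><article>Text</article><article>Text</article><article>Text</article><signature>Signed</signature></decree>
<notice>Announcement</notice>
</journal>"""

if __name__ == "__main__":
    print(infer_dtd(SAMPLE))
```

For this sample the sketch yields `(title,article+,signature)` for `decree`, `(decree+,notice)` for `journal` and `(#PCDATA)` for the text-only elements. Because it simply lists observed sequences as alternatives, it can produce ambiguous content models such as `((a,b)|(a,c))`, which SGML does not allow; a real inference tool has to merge or further generalize such alternatives.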

Similar experiments have been reported by Helena Ahonen <> in
"Automatic generation of SGML content models", Electronic Publishing,
vol. 8, pp. 195-206.

You can also have a look at Keith Shafer's Fred parser for automatic
DTD creation at:

For our project, we have a page in Spanish at:

Joseba Abaitua
Facultad de Filosofia y Letras,  Universidad de Deusto,  Apartado 1
E-48007 Bilbao || Tel: +34-4-4139092  (Ext. 2292) || Fax: +34-4-4458916