Subject: Re: DTD for legal documents (long)

>I've been asked to develop an application of SGML to an archive of legal
>documents (mostly sentences).
>Does anybody on this list knows about the existence of related projects
>(I already know about Corpus Legis Project), and the availability of any
>public DTD in this field?
>Thanks in advance for the help.
>Fabio Ciotti

Mr. Ciotti:

Here at the Center for Electronic Text in the Law at the University of
Cincinnati College of Law (CETL) we have been working with TEI SGML for
some time.  Our ultimate goal is to develop a TEI-conformant DTD with
which to encode legal documents.  We have been encoding documents that
might be described as quasi-legal, such as Reports from the United
Nations Human Rights Center and the Resolutions and Recommendations of
the Chiefs of State and Heads of Government of the Organization of
African Unity.  Our experience with these documents (which if not
"legal" in some formal or systematic sense are at least highly "legalistic"
in format and attitude) has helped us a lot in clarifying just what we
want to do.  As a result of our work to date, I can at least pose you
the following questions that might help in your search:

1.  What is your goal in encoding the text?

If you intend to use SGML primarily as a publishing vehicle, there are
projects, such as the one at the Centre de recherche en droit public at
the University of Montreal: 
(This URL goes to the English page, there is also a French one.  Most of
the material on this site is available in French or English, but some of
their papers are available only in French) that are using SGML primarily
as a vehicle for Internet publication of current primary legal materials
(in this case court opinions and statutes).  In keeping with this goal
the CRDP folks have decided to use several DTDs which they have
developed (using ISO 12083 as a base, I believe).  The use of SGML
elements (which they carry over into the HTML that they deliver on the
Web) gives them the ability to do fielded searching of a very high
order.  If your goal is similar to theirs, you would do well IMHO to
review their DTDs.  There are several US legal publishers, including
Lawyers Coop in Rochester, N.Y., that have proprietary DTDs that are
aimed at facilitating paper and online publication in ways similar to
the CRDP.  There is also a proposal from Allette Systems in Australia
for a basic legal DTD that could be enhanced as desired.  I don't know
how this project is

If you intend to use SGML to facilitate justice administration, there
are a number of projects listed on the homepage of our National Center
for State Courts, which might prove of interest.
Many of these projects are in the planning stage.

There are also academic projects, such as the Studies in Scarlet project
from the Research Libraries Group, that are using SGML for a number of
tasks but are stopping short of trying to represent specifically legal

2.  What sort of elements would you find useful in a legal DTD?

Many projects for encoding legal texts seem to be translating the myriad
annotations found in most legal publications into special elements.  The
text itself is often left alone or marked up with paragraph/list
elements.  The main purpose of this approach seems to be to replicate
printed editions.  The merits of this approach in producing printed
text, or in producing electronic text that merely replicates the printed
text, are obvious.  Given the conservative (some would say hidebound)
attitudes of many lawyers, there is a market for this approach.

In some slight contrast to these goals, we at CETL want to learn how to
better represent legal knowledge by the encoding of legal text.  We feel
that encoding legal data in this fashion will produce electronic
versions of primary legal texts that will support better academic
research and also better legal practice applications.  My current work
is going on primarily with statutes and regulations, although I am also
interested in judicial opinions.   Our approach is governed by the
perception that to evaluate/understand a legal text, the first thing
one needs to know is the jurisdiction in which the text is/was effective
as a rule.  Other needful pieces of information are the dates associated
with the status of the text as a norm.  Dates of passage, effectiveness,
repeal, etc. are examples.   I tend to want to have a set of global
attributes that encompass these things.
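
To make the idea of global attributes a little more concrete, here is a
minimal sketch of how such a set might be declared once and then attached
to any element.  Everything named in it (the entity a.legal, the
attribute names, and the section, head and p elements) is hypothetical
and chosen only for illustration; none of it is taken from the TEI or
from any of the DTDs mentioned in this message.

```sgml
<!-- Hypothetical sketch only: all names below are illustrative. -->

<!-- Jurisdiction and status dates, gathered in one parameter entity
     so the same attributes can be added to every element. -->
<!ENTITY % a.legal '
     jurisdiction  CDATA  #IMPLIED
     passed        CDATA  #IMPLIED
     effective     CDATA  #IMPLIED
     repealed      CDATA  #IMPLIED' >

<!-- A made-up structural element that carries the shared attributes. -->
<!ELEMENT section  - -  (head?, p+) >
<!ATTLIST section
     id            ID     #IMPLIED
     %a.legal; >
```

An instance might then look like the following, where the attribute
values are invented placeholders rather than data from a real statute:

```sgml
<section id="s1" jurisdiction="US-OH" passed="1990-01-01"
         effective="1990-07-01">
<head>Definitions</head>
<p>...</p>
</section>
```

Declaring the attributes in a single entity means a change to the set
(say, adding a date of amendment) needs to be made in only one place.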

One of the things that I am quite aware of is the differences that exist
between civil law systems and our common law.  These differences have
implications for legal text markup.  I am working with a few like-minded
individuals to set up a TEI working group that would extend the TEI to
do a good job of encoding legal texts from either type of system.  I
would be happy to share further my thoughts on these things, but the
message grows too long.

I would enjoy talking further about this.

Nick Finke

[ed. note: In an earlier note recommending the CETL, there was a slight
error in the URL for CETL.  It should be:  ]

