[Mirrored from: http://www.epas.utoronto.ca:8080/~decaen/hsei/what.html]
What is HSEI?
At present there does not exist a freely available syntactic database
and corresponding search engine for the Biblical Hebrew and Aramaic
syntactician and discourse analyst. Further, there is no standard,
universally available database whereby the scientific community can
repeat and verify the results of such study. Such a database would
permit the researcher to make comprehensive statements about the
behaviour of Biblical Hebrew syntax and textgrammar. Since it is first
and foremost a research tool, a fundamental requirement is that the
data and analysis be completely accessible to, and configurable by,
the researcher, so that they can reflect varying theories and an
improved understanding of both the text and the theories used to
investigate it.
An ad hoc committee has been formed to extend the Westminster
Morphologically Analyzed Machine-Readable Hebrew Bible (MORPH) to the
syntactic level. The committee was formed loosely under the auspices
of the Computer-Assisted Research Group (CARG) of the Society of
Biblical Literature (SBL). At present there exists only an executive
committee (with one associated researcher) consisting of:
see also HSEI Members
To meet this need, the following is required:
- A morphologically and syntactically analyzed Hebrew Bible.
- A minimalist theory of syntax which will allow the construction
of a set of tags which does not constrain the researcher unduly.
- The analysis should be as theory-neutral as possible.
- Software which allows the researcher to query the data in many
different ways.
- The database and associated software should allow the researcher
to reconfigure both the syntactic analysis and manner of reporting.
- All data should be as unencumbered with intellectual property
rights as possible; ideally there should be no more than the copyright
and royalty already implicit in MORPH.
- All software tools used to create and manipulate the resulting
database will be licensed under the GNU General Public License
("copyleft").
- A new version of MORPH which includes a minimum set of syntactic
tags and hooks for external additions (to be supplied by the user).
- A search and query engine for accessing the data.
- A set of software tools allowing the researcher to modify the
tagging system.
- A report generating tool (e.g., to display syntactic trees and
other structures designed by the researcher).
- Documentation for the system.
- A set of technical and formal reports on the problems, issues,
theory and methodology involved in the project.
What we have is the Westminster Morphologically Analyzed
Machine-Readable Hebrew Text (MORPH). Our task is then to use this
base to bootstrap ourselves to the next level. What we need is a
linguistic, theory-neutral, syntactically tagged text. The problem
gets worse because we are dealing with two orders of ambiguity: the
ambiguity which comes from the inherent nature of language (i.e., a
sentence can be understood in more than one way), and the ambiguity
which comes from the fact that we are on the frontiers of our
knowledge and we simply do not know the best way to proceed.
The best software parsers we have are context-free. What we need is a
context-bound parser. This is a non-trivial task which, in other
(natural) languages and in the AI fields such as machine translation,
is accomplished through exceptions and human intervention. Formal
grammars become quite complex. The idea is to reduce the ambiguity and
complexity of parsing by preprocessing the text and giving it
syntactic tags in addition to the already-existing morphological ones.
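As a rough illustration of this preprocessing idea, the Python sketch
below shows a token that keeps its existing morphological tags and
gains an open-ended slot for syntactic tags. Every name, tag value and
rule here is hypothetical; none of it is taken from MORPH or from any
HSEI design, and the example words are simple transliterations chosen
for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word of text: surface form, morphological tags, syntactic tags."""
    form: str
    morph: dict                                 # tags already present (hypothetical keys)
    syntax: dict = field(default_factory=dict)  # filled by preprocessing or by the user


def tag_syntax(tokens):
    """Toy preprocessing pass: mark every finite verb as a predicate.

    The rule is a placeholder that only demonstrates the mechanism; it
    makes no claim about Biblical Hebrew syntax.
    """
    for tok in tokens:
        if tok.morph.get("pos") == "verb" and tok.morph.get("finite", False):
            tok.syntax["role"] = "predicate"
    return tokens


# Invented example tokens.
clause = [
    Token("wayyomer", {"pos": "verb", "finite": True}),
    Token("elohim", {"pos": "noun"}),
]
tag_syntax(clause)
roles = [tok.syntax.get("role") for tok in clause]
print(roles)
```

Keeping the syntactic tags in a separate, initially empty slot means
the morphological data is never overwritten, and a researcher can add
or replace syntactic tags without touching the base text.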
So the project entails:
- one or more partially tagged data files
- a front-end interface
- some sort of search engine
Right now, Kirk Lowery is looking at a solution that uses the Text
Encoding Initiative's SGML DTD (document type definition) to mark up
MORPH and then, by using and adapting an SGML parser (like SGMLS),
creates the syntactically tagged text on top of MORPH. The SGML parser
can then be used to create a query language (or maybe Tcl [Tool
Command Language] will serve) and attach it to the (probably GUI)
interface. Other SGML tools exist which can be used to build
applications quickly from a DTD. That is the theory.
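To make the idea of querying marked-up text concrete, here is a
minimal Python sketch. It is an illustration under several
assumptions: XML parsed with the standard library's
xml.etree.ElementTree stands in for SGML and SGMLS, and the element
and attribute names (text, clause, w, type, pos, role) are invented
for the example rather than drawn from TEI's DTD or from MORPH.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the words are transliterations used only as filler.
SAMPLE = """
<text>
  <clause type="narrative">
    <w pos="verb" role="predicate">wayyomer</w>
    <w pos="noun" role="subject">elohim</w>
  </clause>
  <clause type="speech">
    <w pos="verb" role="predicate">yehi</w>
    <w pos="noun" role="subject">or</w>
  </clause>
</text>
"""


def find_words(root, clause_type, role):
    """Return the words with the given role inside clauses of the given type."""
    path = f".//clause[@type='{clause_type}']/w[@role='{role}']"
    return [w.text for w in root.findall(path)]


root = ET.fromstring(SAMPLE)
subjects = find_words(root, "speech", "subject")
print(subjects)
```

A real query language would need far more than attribute matching
(word order, embedding, discourse relations), but the sketch shows how
tags in the data file can drive a search without a full parse.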
Why TEI's DTD? Because it is a standard. SGML DTDs are reconfigurable
either by changing the DTD itself (which changes the meaning of the
tags already present) or by changing a parser which would actually
reconfigure the data files in some arbitrary manner. This approach -
if successful - satisfies many of the requirements of HSEI.
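The second kind of reconfiguration, rewriting the data files
themselves, might look something like the following Python sketch. It
again uses XML and xml.etree.ElementTree as a stand-in for SGML
tooling, and the tag names and the mapping are hypothetical; the point
is only that a researcher supplies the mapping and the tool applies
it.

```python
import xml.etree.ElementTree as ET


def retag(root, attribute, mapping):
    """Rewrite attribute values in place using a user-supplied mapping.

    Returns the number of elements that were changed.
    """
    changed = 0
    for elem in root.iter():
        old = elem.get(attribute)
        if old in mapping:
            elem.set(attribute, mapping[old])
            changed += 1
    return changed


# Invented sample data and an invented alternative tag set.
doc = ET.fromstring(
    '<clause><w role="predicate">yehi</w><w role="subject">or</w></clause>'
)
count = retag(doc, "role", {"predicate": "head", "subject": "argument"})
new_roles = [w.get("role") for w in doc.iter("w")]
print(count, new_roles)
```

Because the mapping is plain data, two researchers working with
different theories could share one base file and each apply their own
tag set to it.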
Then we are ready to get back to the original task: researching the
nature of Biblical Hebrew syntax, syntax in general, and finally, of
discourse.
The process of establishing an archive linked to the home page is
under way. A content model of MORPH is also being created, which is
preliminary to modifying and applying TEI's DTD. For more information,
contact Kirk Lowery.
decaen@epas.utoronto.ca