[Mirrored from: http://www.epas.utoronto.ca:8080/~decaen/hsei/what.html]


What is HSEI?


Concept

At present there is no freely available syntactic database, with a corresponding search engine, for syntacticians and discourse analysts working on Biblical Hebrew and Aramaic. Nor is there a standard, universally available database that would let the scientific community repeat and verify the results of such study. Such a database would allow researchers to make comprehensive statements about the behaviour of Biblical Hebrew syntax and text grammar. Since it is first and foremost a research tool, a fundamental requirement is that both the data and the analysis be fully accessible to the researcher, and configurable to reflect differing theories and an improved understanding both of the text and of the theories used to investigate it.

Initiative

An ad hoc committee has been formed to extend the Westminster Morphologically Analyzed Machine-Readable Hebrew Bible (MORPH) to the syntactic level. The committee was formed loosely under the auspices of the Computer-Assisted Research Group (CARG) of the Society of Biblical Literature (SBL). At present the initiative consists only of an executive committee (with one associated researcher); see HSEI Members for the list of members.


Project Requirements

To meet this need, the following is required:

  1. A morphologically and syntactically analyzed Hebrew Bible.
  2. A minimalist theory of syntax which will allow the construction of a set of tags which does not constrain the researcher unduly.
  3. The analysis should be as theory-neutral as possible.
  4. Software which allows the researcher to query the data in many different ways.
  5. The database and associated software should allow the researcher to reconfigure both the syntactic analysis and manner of reporting.
  6. All data should be as unencumbered with intellectual property rights as possible; ideally there should be no more than the copyright and royalty already implicit in MORPH.
  7. All software tools used to create and manipulate the resulting database will be licensed under the GNU General Public License ("copyleft").


Desired Outcomes

  1. A new version of MORPH which includes a minimum set of syntactic tags and hooks for external additions (to be supplied by the user).
  2. A search and query engine for accessing the data.
  3. A set of software tools allowing the researcher to modify the tagging system.
  4. A report-generating tool (e.g., to display syntactic trees and other structures designed by the researcher).
  5. Documentation for the system.
  6. A set of technical and formal reports on the problems, issues, theory and methodology involved in the project.


Discussion

What we have is the Westminster Morphologically Analyzed Machine-Readable Hebrew Text (MORPH). Our task is to use this base to bootstrap ourselves to the next level: what we need is a linguistic, theory-neutral, syntactically tagged text. The problem is compounded because we are dealing with two orders of ambiguity: the ambiguity inherent in language itself (a sentence can be understood in more than one way), and the ambiguity that arises because we are at the frontiers of our knowledge and simply do not know the best way to proceed.

The best software parsers we have are context-free. What we need is a context-bound parser. This is a non-trivial task which, for other natural languages and in AI fields such as machine translation, is handled through exceptions and human intervention, and the formal grammars involved become quite complex. The idea here is to reduce the ambiguity and complexity of parsing by preprocessing the text, adding syntactic tags to the morphological ones that already exist.
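To illustrate the preprocessing idea, here is a toy sketch in Python: a greedy pattern chunker that wraps runs of morphologically tagged tokens into phrase-level tags, so that a later parser works on a handful of phrases rather than on individual words. The part-of-speech codes, the tag names (PP, NP, VP, OBJ, UNTAGGED) and the rules are all hypothetical; they are not MORPH's coding or an HSEI tag set, and greedy chunking is a swapped-in technique, not the project's method.

```python
from dataclasses import dataclass

@dataclass
class Token:
    form: str  # simplified transliteration of the surface form
    pos: str   # hypothetical part-of-speech code, NOT the actual MORPH coding

# Hypothetical chunk rules, longest patterns first: a sequence of POS codes
# that the preprocessor wraps into a single phrase-level syntactic tag.
RULES = [
    (("prep", "art", "noun"), "PP"),
    (("obj", "art", "noun"), "OBJ"),
    (("prep", "noun"), "PP"),
    (("art", "noun"), "NP"),
    (("noun",), "NP"),
    (("verb",), "VP"),
]

def chunk(tokens):
    """Greedy left-to-right chunker: at each position apply the first rule
    whose POS pattern matches, and emit (tag, tokens) for the matched span."""
    out = []
    i = 0
    while i < len(tokens):
        for pattern, tag in RULES:
            span = tokens[i:i + len(pattern)]
            if tuple(t.pos for t in span) == pattern:
                out.append((tag, span))
                i += len(pattern)
                break
        else:
            # No rule matched: pass the token through for a human
            # (or a later, context-aware parser) to resolve.
            out.append(("UNTAGGED", [tokens[i]]))
            i += 1
    return out

# Opening words of Genesis 1:1, with prefixes split off as separate tokens.
SENTENCE = [
    Token("be", "prep"), Token("reshit", "noun"), Token("bara", "verb"),
    Token("elohim", "noun"), Token("et", "obj"), Token("ha", "art"),
    Token("shamayim", "noun"),
]

for tag, span in chunk(SENTENCE):
    print(tag, " ".join(t.form for t in span))
```

Run on the sample, this groups the seven tokens into four units (PP, VP, NP, OBJ). Anything the rules cannot handle is passed through as UNTAGGED rather than guessed at, which leaves room for the human intervention mentioned above.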

So the project entails:

  1. one or more partially tagged data files
  2. a front-end interface
  3. some sort of search engine

Right now, Kirk Lowery is investigating a solution that uses the Text Encoding Initiative's SGML DTD (document type definition) to mark up MORPH, and then uses and adapts an SGML parser (such as SGMLS) to create the syntactically tagged text, with MORPH as the starting point. The SGML parser can then be used to build a query language (or perhaps Tcl [Tool Command Language] could be used instead) and attach it to the interface, which will probably be graphical. Other SGML tools exist that can be used to build applications quickly on the basis of a DTD. That is the theory.
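A minimal sketch of the kind of query such markup could support, assuming a hypothetical clause-and-phrase markup. Python's standard library has no SGML parser, so XML (a restricted form of SGML) and `xml.etree.ElementTree` stand in for the SGML toolchain here. The element and attribute names are illustrative only, not those of the TEI DTD. The query asks a typical word-order question: in which clauses does the predicate precede the subject?

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: XML stands in for SGML, and the element and attribute
# names (verse, clause, phrase, w, func, morph) are illustrative only, not
# taken from the TEI DTD or from MORPH.
DOC = """
<text>
  <verse ref="Gen 1:1">
    <clause id="c1">
      <phrase func="adjunct"><w morph="prep">be</w><w morph="noun">reshit</w></phrase>
      <phrase func="pred"><w morph="verb">bara</w></phrase>
      <phrase func="subj"><w morph="noun">elohim</w></phrase>
      <phrase func="obj"><w morph="obj">et</w><w morph="art">ha</w><w morph="noun">shamayim</w></phrase>
    </clause>
  </verse>
  <verse ref="Gen 1:2">
    <clause id="c2">
      <phrase func="conj"><w morph="conj">ve</w></phrase>
      <phrase func="subj"><w morph="art">ha</w><w morph="noun">arets</w></phrase>
      <phrase func="pred"><w morph="verb">hayetah</w></phrase>
      <phrase func="compl"><w morph="noun">tohu</w><w morph="conj">va</w><w morph="noun">vohu</w></phrase>
    </clause>
  </verse>
</text>
"""

def phrase_order(clause):
    """Return the sequence of phrase functions in a clause, in text order."""
    return [p.get("func") for p in clause.findall("phrase")]

def predicate_before_subject(root):
    """Yield (verse ref, clause id) for clauses whose predicate precedes the subject."""
    for verse in root.iter("verse"):
        for clause in verse.iter("clause"):
            order = phrase_order(clause)
            if "pred" in order and "subj" in order and order.index("pred") < order.index("subj"):
                yield verse.get("ref"), clause.get("id")

root = ET.fromstring(DOC.strip())
print(list(predicate_before_subject(root)))
```

Only the Genesis 1:1 clause is reported; in the Genesis 1:2 clause the subject comes first. Because the query reads only the markup, a researcher who re-tags the text under a different analysis gets different answers without changing the query code.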

Why TEI's DTD? Because it is a standard. SGML DTDs are reconfigurable either by changing the DTD itself (which changes the meaning of the tags already present) or by modifying a parser so that it actually reconfigures the data files in whatever manner the researcher requires. If successful, this approach satisfies many of the requirements of HSEI.
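The second route, rewriting the data files with a program, can be sketched as a small transformation that maps one theory's phrase-function labels onto another's. This is a hypothetical illustration: the markup, the label names and the mapping are invented for the example, XML again stands in for SGML, and the sketch does not check the result against any DTD, as a real reconfiguration would have to.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (illustrative names, not TEI or MORPH).
SOURCE = """<clause id="c1">
  <phrase func="pred"><w>bara</w></phrase>
  <phrase func="subj"><w>elohim</w></phrase>
  <phrase func="obj"><w>et</w><w>ha</w><w>shamayim</w></phrase>
</clause>"""

# A hypothetical alternative analysis: one theory's labels mapped onto another's.
RELABEL = {"pred": "V", "subj": "S", "obj": "O"}

def reconfigure(root, mapping):
    """Rewrite the func attribute of every phrase in place.
    Labels missing from the mapping, and phrases without a func, are left unchanged."""
    for phrase in root.iter("phrase"):
        old = phrase.get("func")
        if old in mapping:
            phrase.set("func", mapping[old])
    return root

root = reconfigure(ET.fromstring(SOURCE), RELABEL)
print([p.get("func") for p in root.iter("phrase")])
print(ET.tostring(root, encoding="unicode"))
```

The words themselves are untouched; only the analysis layered on them changes, which is the separation between text and theory the requirements ask for.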

Then we are ready to get back to the original task: researching the nature of Biblical Hebrew syntax, syntax in general, and finally, of discourse.


Current Status

The process of establishing an archive linked to the home page is under way. A content model of MORPH is also being created, which is preliminary to modifying and applying TEI's DTD. For more information, contact Kirk Lowery.
decaen@epas.utoronto.ca