The Center for Humanities Information Technology at the University of Bergen has announced support for a two-year research project focused on markup for "complex" documents. The main goal of the MLCD Project (Markup Language for Complex Documents) is to "lay the theoretical foundation for a better system of representation for complex textual phenomena than can be found in today's SGML- and XML-based systems; the project will also lay a foundation for software development, with an eye to web-based delivery." Among the issues addressed by MLCD is the well-attested phenomenon of "overlap" (non-hierarchical structures), which SGML and XML handle by various methods, e.g., 'store both structures, filter into SGML; use the SGML CONCUR Feature; use marked sections and entity declarations; use [TEI] milestones [= empty, asynchronous elements]; model as fragmentation; use stand-off markup'; etc.
The MLCD project builds upon published research by C. M. Sperberg-McQueen and Claus Huitfeldt, especially their work on GODDAG, MECS, and TexMECS ("an experimental markup meta-language for complex documents"). MECS (Multi-Element Code System) "was developed by Claus Huitfeldt in connection with the work of the Wittgenstein Archive at the University of Bergen. MECS has many similarities to SGML-based systems, but distinguishes itself from them in that it has a simpler notation and a well-defined concept of well-formedness as a property separate from that of validity. MECS thus anticipates many of the ways in which XML has modified the rules of SGML; in addition, MECS allows non-hierarchical structures in the form of overlapping elements. The MLCD project intends to define a system which combines the best of SGML/XML and MECS. A notation for such a system has already been designed, and a data structure has been sketched out. The project will work to complete the specification of the data structure and to develop some method of specifying document grammars."
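To make the overlap problem concrete, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It encodes one short text with two structures (verse lines `l` and sentences `s`) where a sentence crosses a line boundary, shows that plain XML rejects the overlapping tags, and then recovers the sentences from two of the workarounds listed above: milestones and fragmentation. All element names, the milestone pair `sb`/`se`, and the `part` values `I`/`F` (in the spirit of TEI's `part` attribute) are illustrative assumptions, not the MLCD project's notation.

```python
import xml.etree.ElementTree as ET

# Two hierarchies over one text: verse lines (l) and sentences (s).
# The second sentence begins in line 1 and ends in line 2, so the two
# structures overlap. All element names here are illustrative only.
OVERLAPPING = (
    "<poem><l><s>Stay.</s> <s>Here the </l>"
    "<l>night falls.</s></l></poem>"
)

# Milestone encoding: lines stay real elements; sentence boundaries
# become empty elements (hypothetical names: sb = start, se = end).
MILESTONES = (
    "<poem><l><sb/>Stay.<se/> <sb/>Here the </l>"
    "<l>night falls.<se/></l></poem>"
)

# Fragmentation: the overlapping sentence is split into two s elements
# marked with a part attribute (assumed values: I = initial, F = final).
FRAGMENTS = (
    "<poem><l><s>Stay.</s> <s part='I'>Here the </s></l>"
    "<l><s part='F'>night falls.</s></l></poem>"
)


def is_well_formed_xml(text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def events(elem):
    """Yield ('start', tag), ('text', str) and ('end', tag) in document order."""
    yield "start", elem.tag
    if elem.text:
        yield "text", elem.text
    for child in elem:
        yield from events(child)
    yield "end", elem.tag
    if elem.tail:  # text after the end tag belongs to the parent's content
        yield "text", elem.tail


def sentences_from_milestones(text: str) -> list:
    """Collect the text between each sb milestone and the next se milestone."""
    sentences, buf, inside = [], [], False
    for kind, value in events(ET.fromstring(text)):
        if kind == "start" and value == "sb":
            inside, buf = True, []
        elif kind == "start" and value == "se":
            sentences.append("".join(buf))
            inside = False
        elif kind == "text" and inside:
            buf.append(value)
    return sentences


def sentences_from_fragments(text: str) -> list:
    """Join s fragments: one with no part, or with part='F', ends a sentence."""
    sentences, pending = [], []
    for s in ET.fromstring(text).iter("s"):
        pending.append("".join(s.itertext()))
        if s.get("part") in (None, "F"):
            sentences.append("".join(pending))
            pending = []
    return sentences


print(is_well_formed_xml(OVERLAPPING))        # False: the end tags cross
print(is_well_formed_xml(MILESTONES))         # True
print(sentences_from_milestones(MILESTONES))  # ['Stay.', 'Here the night falls.']
print(sentences_from_fragments(FRAGMENTS))    # ['Stay.', 'Here the night falls.']
```

Both workarounds keep the document a single XML tree by demoting one hierarchy (here, sentences) to a secondary encoding that software must reassemble; that asymmetry is part of what a system allowing overlapping elements directly, as MECS does, is meant to avoid.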
From the project summary and background description:
"The prevailing standard for text encoding today is SGML, which is the basis of HTML, the text format most commonly used on the World Wide Web. SGML is also the basis of XML, a simplified subset of SGML that has been used to reformulate HTML (as XHTML) and is in the process of becoming the standard form for Web publishing as well as for other applications. All SGML- and XML-based systems, however, have problems representing a variety of phenomena that are essential for the acceptable representation and processing of text. These problems are in large part solved by MECS, a coding system developed by Claus Huitfeldt in work at the Wittgenstein Archive at the University of Bergen. MECS, however, has no well-defined data structure and no notion of document grammar.
A system which combined the best features of SGML and MECS, providing a simple notation combined with a powerful grammatical formalism and a data model capable of representing non-hierarchical structures in a natural way, would represent a considerable step forward. Laying the groundwork for such a system has been the goal of a collaboration between Huitfeldt and C. M. Sperberg-McQueen which began in 1997-98, when Dr. Sperberg-McQueen was a visiting researcher at the historical-philosophical faculty in Bergen. Some results of that collaborative work have been published in the form of two articles; a third is in press. In brief, the work indicates that a notation and well-formedness rules based on MECS should be usable for the purpose. A suitable data model for MECS has also been successfully sketched. The relationship between SGML and this new system has been studied, with emphasis on possibilities of automatic conversion. What is lacking is a grammatical formalism which allows the expression of validity conditions... MLCD thus consists, for now, in the direct continuation of the theoretical work already begun with the establishment of a notation and a data structure. The next stage will be the working out of a grammatical formalism. In connection with that work, it will be desirable to implement a system prototype in the form of a validating parser and simple experimental conversion and analysis tools..."
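The data model mentioned in the project description is not spelled out here. One way to picture the idea, loosely following published descriptions of GODDAG (General Ordered-Descendant Directed Acyclic Graph), is a graph in which a text leaf may have more than one parent element, so overlapping elements need not nest. The Python sketch below is a simplification under that assumption: the classes `Leaf`, `Element`, and `Goddag` and their methods are hypothetical names, not the project's API, and the sketch ignores validation and the finer distinctions drawn in the published model.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Leaf:
    text: str


@dataclass(eq=False)
class Element:
    name: str
    children: List[Union["Element", Leaf]] = field(default_factory=list)


@dataclass
class Goddag:
    leaves: List[Leaf]  # the text leaves, in document order
    root: Element       # dominates every element and leaf

    def dominated(self, node) -> List[Leaf]:
        """Leaves reachable from node, deduplicated and in document order."""
        found, stack = set(), [node]
        while stack:
            n = stack.pop()
            if isinstance(n, Leaf):
                found.add(id(n))
            else:
                stack.extend(n.children)
        return [leaf for leaf in self.leaves if id(leaf) in found]

    def text(self, node) -> str:
        return "".join(leaf.text for leaf in self.dominated(node))

    def parents(self, target) -> List[Element]:
        """Every element that has target as a direct child (may be several)."""
        result, stack, seen = [], [self.root], set()
        while stack:
            n = stack.pop()
            if isinstance(n, Leaf) or id(n) in seen:
                continue
            seen.add(id(n))
            if any(child is target for child in n.children):
                result.append(n)
            stack.extend(n.children)
        return result

    def overlaps(self, a, b) -> bool:
        """True if a and b share leaves but neither contains the other."""
        la = {id(x) for x in self.dominated(a)}
        lb = {id(x) for x in self.dominated(b)}
        return bool(la & lb) and not (la <= lb or lb <= la)


# A verse-line / sentence example: the leaf "Here the " has two parents.
stay, gap, here, night = Leaf("Stay."), Leaf(" "), Leaf("Here the "), Leaf("night falls.")
l1, l2 = Element("l", [stay, gap, here]), Element("l", [night])
s1, s2 = Element("s", [stay]), Element("s", [here, night])
doc = Goddag([stay, gap, here, night], Element("poem", [l1, l2, s1, s2]))

print(doc.text(s2))                                # Here the night falls.
print(sorted(p.name for p in doc.parents(here)))   # ['l', 's']
print(doc.overlaps(l1, s2), doc.overlaps(l1, s1))  # True False
```

Because leaves are shared rather than copied, each element's content is simply the ordered set of leaves it dominates, and "overlap" becomes a relation that can be computed (shared leaves, no containment) instead of a syntax error.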
- MLCD Overview
- MLCD Project web site
- MLCD Project Description [EN] (also in Norwegian)
- "TexMECS: An experimental markup meta-language for complex documents." By Claus Huitfeldt and C. M. Sperberg-McQueen. 25-January-2001, draft/incomplete 17-February-2001 (2001-05-10).
- Humanities Information Technologies Research Programme [Forskningsprogram for humanistisk informasjonsteknologi], University of Bergen
- Contact: Claus Huitfeldt (HIT Director of Research)
- "Markup Language for Complex Documents (Bergen MLCD Project)" - Main reference page.
- "SGML/XML and (Non-) Hierarchy" - Main reference section.