[Mirrored from: http://www.hum.port.ac.uk/Users/ralph.cleminson/mss/report.htm]
The conference, which included papers by thirty-five Slavists and information science specialists from ten countries, was organized and co-directed by David Birnbaum (Department of Slavic Languages and Literatures, University of Pittsburgh), Anisava Miltenova (Institute of Literature, BAN), Milena Dobreva (Institute of Mathematics, BAN) and Andrej Bojadzhiev (Ivan Dujchev Center for Slavo-Byzantine Studies, University of Sofia), and was sponsored by grants from IREX, the ACLS Joint Committee on Eastern Europe, and the Open Society Fund.
At the introductory session, David Birnbaum set the tone of the conference by calling for the creation of texts that meet the criteria of multiple use, portability, and preservation, and by stressing the need for free software and standardization of character sets, in order to make computer text collations and other programs accessible over the Internet worldwide to all Slavists, free of charge, regardless of the type of computer, fonts, and word-processing software they use. The conference included sessions on encoding problems, text processing, fonts and image processing, data bank management systems, and computer support for specific manuscript projects in progress.
One striking feature of the conference was the extent to which computers are already being used by Slavists in both Eastern and Western Europe for research projects involving the analysis of early Slavic texts; indeed, judging from the papers presented, it appears that American Slavists as a whole are well behind our European colleagues in this area. Representative papers included "Informational and Presentational Units in Early Cyrillic Writing" (David Birnbaum), on the decisions involved in the encoding of orthographic vs. paleographic units; "A Church Slavonic Alphabet for Reprinting Old Manuscripts Using a Microcomputer" (Mimoza Majstorska, Skopje), demonstrating the application of an experimental optical character recognition system; "Computer Processing of Old Church Slavonic Manuscripts: Results and Prospects" (Zdenka Ribarova, Institut za Makedonski Ezik, Skopje, and Kiril Ribarov, Charles University, Prague), on issues of standardization, lemmatization and semantic analysis in the integration of the lexicon of Old Church Slavonic manuscripts into a dictionary; and discussions of computer-supported research projects in progress, including, among others, "Computer-Aided Analysis of the Macrostructure and Typology of Medieval Slavic Miscellanies" (Anisava Miltenova), and "The Application of Text Encoding Initiative (TEI) Guidelines to Encoding a Fourteenth-Century Serbian Church Slavonic Psalter" (Mary MacRobert, Oxford University). Special workshop sessions also provided demonstrations of collation and concordance software, including Collate, TUSTEP, and KLEIO.
The focal point of the conference was an outstanding afternoon-long workshop, led by David Birnbaum, Anisava Miltenova, Harry Gaylord (Groningen University), Winfried Bader (University of Tübingen) and Nicholas Finke (University of Cincinnati), on the guidelines set by the international Text Encoding Initiative project (TEI) for a standard format for the preparation and interchange of machine-readable texts for humanities research. (TEI was established in 1987 in Poughkeepsie, New York, and is sponsored by the Association for Literary and Linguistic Computing, the Association for Computational Linguistics, and the Association for Computers and the Humanities.) A major part of the workshop was a minicourse on the basic features of Standard Generalized Markup Language (SGML), which has been adopted by TEI for the encoding of texts. The conference participants later had an opportunity to practice encoding a sample text at the hands-on SGML Editor computer workshop which concluded the conference.
Of course, as was stressed throughout the conference, computer encoding of manuscripts is not intended to replace firsthand examination of the physical manuscript, both because manuscripts inherently contain codicological, paleographic and other information that cannot easily be described, and because any sort of encoding or transcription necessarily involves subjective interpretation and decision-making on the part of the encoder. The purpose of computer encoding of manuscripts is not to replicate the manuscript per se, but rather to facilitate the analysis both of individual manuscripts and of manuscript corpora by tagging textual, structural and other features of each manuscript, in order to link them to categories that will then define specific types of computer searches of the database. Once a standardized program has been created for the analysis of a particular text corpus, an advantage of making it available to other scholars over the Internet is that it can be continually refined and expanded as other scholars working in similar areas contribute further data and amendments to it.
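The idea of tagging features and then searching by category can be illustrated with a minimal sketch. It uses XML, a simplified profile of SGML, only because Python's standard library can parse it. The element and attribute names (ms, item, genre, title) and the sample content are hypothetical, chosen for illustration; they are not taken from the TEI guidelines or from any project presented at the conference.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded description of one miscellany.
# All element names, attribute names and titles are illustrative only.
SAMPLE = """
<ms id="MS1">
  <item genre="homily" folios="1r-12v"><title>Homily on the Nativity</title></item>
  <item genre="apocryphon" folios="13r-20v"><title>Vision of Isaiah</title></item>
  <item genre="homily" folios="21r-30r"><title>Homily on Palm Sunday</title></item>
</ms>
"""


def titles_by_genre(xml_text, genre):
    """Return the titles of all items tagged with the given genre."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title")
        for item in root.iter("item")
        if item.get("genre") == genre
    ]


print(titles_by_genre(SAMPLE, "homily"))
```

The search function knows nothing about the manuscript itself; it relies only on the categories the encoder has assigned, which is why the encoder's interpretive decisions determine what a search can find.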
The conference was an important event for Slavists, and, as its title suggests that it is the first in a series, I look forward to a follow-up conference in the near future, at which Slavists and computer specialists might be able to meet to establish concrete guidelines on the standardization issues for encoding Slavic material that were raised last year. I strongly recommend that all American Slavists who work with medieval texts take advantage of the opportunity to attend future conferences, in order to gain hands-on practice with state-of-the-art text encoding methods and software, to exchange information with other Slavists working on similar research projects with computer support, and to work together with our European colleagues and information science specialists in establishing standardization guidelines for the computer processing of medieval Slavic texts, as it will be much to our benefit to contribute to these discussions.
The conference abstracts are published in D. J. Birnbaum, A. T. Bojadzhiev, M. P. Dobreva, and A. L. Miltenova, eds., Computer Processing of Medieval Slavic Manuscripts: First International Conference, Blagoevgrad 1995. Abstracts, Sofia: Institute of Literature, BAN, 1995 (ISBN 954-8712-02-4). Conference papers will appear in a volume scheduled for publication in early 1996.
In September 1987, a conference was to have taken place at the Catholic University of Nijmegen concerning the International Data Base for Medieval Manuscripts studies. Unfortunately, it did not take place. Participants' papers were published in Polata knigopisnaja, December 1987, 17-18.
Earlier in the same year, efforts were being made by the Centre international d'information sur les sources de l'histoire balkanique et méditerranéenne (CIBAL) in Sofia, Bulgaria, to enhance the PCC software with a data base system. The proposed software was described thoroughly in the above-mentioned volume of Polata knigopisnaja. Despite the expectations of paleoslavists, this theoretical platform has never been fully applied in practice. The data base system used in CIBAL is compiled in the framework of ISIS library software and restricts the size of the information in each rubric. Another disadvantage of this tool is that it cannot provide different types of linking between a segment in the same file and external information. That is why the analytic description of Slavic manuscripts has never really taken place. Over the last two decades the Institute of Russian Language has taken the initiative of creating the so-called "Mashinnyi fond russkogo jazyka" (Electronic archive of Russian language), electronic files of Russian language made mostly for lexicographic purposes. It is not popular elsewhere and is based on a local platform. Nowadays there is a wide range of commercial software used directly or with some modifications by many paleoslavists. A completely commercial tool is offered by Reinhard Lampe (TSAR, Heidelberg, Germany). It is based on the old word processing software T3, with bitmap fonts included for old and new Cyrillic and for Glagolitic. Unfortunately, this modification is not only very expensive but also not user-friendly. The other data base product (IST), created by R. Lampe and used by T. Chertorickaja in 1994 (Vorläufiger Katalog Kirchenslavischer Homilien des beweglichen Jahreszyklus. Aus Handschriften des 11-16 Jahrhunderts vorwiegend ostslavischer Provenienz. 1994), is not related to T3. Its shortcomings are that 1) Old Cyrillic and Greek cannot be used, 2) Old Russian texts must be transliterated into Modern Russian (without superscripts, abbreviations and titlos), and 3) the system offers only minimal means of encoding a repertoire of medieval Slavic texts.
Thus, the situation in the field of medieval Slavic manuscript studies is quite mixed. There exist different hardware platforms as well as a wide range of software, not to mention the plethora of terminology and traditional topics of manuscript description used by Slavists from different countries. There is no full coordination between Slavists and specialists in the fields of Latin, Greek and Hebrew paleography and codicology. This lack of coordination has led scholars to overlook the necessity of developing systems for Slavic studies that are compatible with already existing systems for these other fields. The project "Computer Supported Processing of Old Slavic Manuscripts" by Prof. David Birnbaum (University of Pittsburgh), Andrej Bojadzhiev (University of Sofia), Milena Dobreva (Institute of Mathematics - Bulgarian Academy of Sciences) and Anisava Miltenova (Institute of Literature - BAS) is based on the following principles: 1. Standardization of document file formats; 2. Multiple use (data should be separated from processing); 3. Portability of electronic texts (independence of local platforms); 4. Necessity of preservation of manuscripts in electronic form; 5. Orientation to the well-structured divisions of data according to the Slavic traditions of orthography, textology, paleography, etc. The team deemed that the Standard Generalized Markup Language (SGML) and its applications provided by the Text Encoding Initiative (TEI) correspond most adequately to these principles. The conference in the framework of this project (held in Blagoevgrad, 24-28 July, 1995) exceeded our expectations. Due to the efforts of Prof. David Birnbaum (including his two presentations, "How Slavic Philologists Should Use Computers" and "Informational and Presentational Units in Early Cyrillic Writing", as well as his organizational work and active participation in the TEI workshop), and the efforts of TEI instructors Dr. Harry Gaylord (Groningen), Dr. Nicholas Finke (Cincinnati), and Dr. Winfried Bader (Tübingen), most of the participants were in agreement with the principles outlined above. The philosophy of SGML helped to settle some well-known misunderstandings among paleoslavists concerning philological questions of terminology, inventory of units, character sets and data structure.
We think the conference was very successful for TEI too. A general agreement was reached to analyze and describe in the future all medieval Slavic manuscripts according to TEI and SGML methodology.
Andrej Bojadzhiev
Anisava Miltenova
Sofia, Bulgaria
I. Computer-supported research and teaching in the humanities has been growing at an increasing pace over the past two decades, with new methods for using computers to increase productivity in these areas, and new types of software appearing continuously. Unfortunately, the field of Slavic studies (and especially medieval Slavic studies) has lagged well behind other areas of the humanities in applying modern information technology to improving research, scholarship, and teaching. The absence of well-developed computational standards and resources in Slavic studies results largely from the peculiarities of our discipline, which have made it impossible simply to plug Slavic data into systems developed for studying other cultures. As this conference has demonstrated, these difficulties are not insurmountable. The difficulties in question can be addressed and resolved through the active collaboration of Slavic philologists and information science specialists who understand the requirements of humanities computing. While there is a real danger that the field of Medieval Slavic studies will continue to remain isolated from modern research and teaching methods, the present conference has demonstrated that an opportunity exists to bridge the gap between Slavic studies and developments elsewhere in humanities computing. The papers included in this volume provide evidence of substantial results both in standardizing the computational treatment of Slavic materials and in generating significant products suitable for distribution to researchers and students. Because the manuscript sources with which we work are basic tools in both the social sciences and the humanities, the results of our efforts ultimately will lead to increased access to research and teaching materials in both of these broad disciplinary areas.
One aspect of the explosive growth of computer-supported research and teaching in the humanities, aided particularly by the development of hypermedia technology, has been the collection of large text corpora for research purposes, including such significant, widely accessible, large-scale projects as the Greek Thesaurus Linguæ Græcæ and Perseus Project. Some of these developments are described by Linda W. Helgerson in "CD-ROM and Scholarly Research in the Humanities", Computers and the Humanities, vol. 22 (1988), 111-16, and in The Humanities Computing Yearbook 1989-90, ed. I. Lancashire, Oxford: Clarendon Press, 1991 (passim). However, as is noted above, the field of Slavic studies, and especially medieval Slavic studies, lags well behind other areas of the humanities in the application of modern information technologies to research, scholarship, and education. This is readily apparent from the modest "Slavic Languages" section of the aforementioned Humanities Computing Yearbook (by David J. Birnbaum and Harry Gaylord, pp. 300-29).
While Greek scholars, who also have to deal with alphabetic peculiarities, have produced the major projects mentioned above, there are no comparable resources in medieval Slavic studies. This disparity can be attributed to at least four sets of reasons:
II. Over seventy persons from eleven countries gathered in Blagoevgrad in the last week of July, 1995, in an effort to examine these problems and propose solutions to them. The present volume is intended to make the results of that conference accessible to a wider audience. In the "State of the Art" session, David J. Birnbaum (Pittsburgh) "How Slavic Philologists Should Use Computers" emphasized the importance of multiple use, structure, portability, preservation, and standardization in the development of document-encoding systems for Slavic philological research. Winfried Bader (Tübingen), "Bible and Computer: A Brief History of the Last Ten Years" contrasted one of the oldest uses of computer-based text studies (biblical scholarship) with one of the youngest (Slavistic scholarship). Elena Paskaleva and Milena Dobreva (Sofia) "New Tools for Old Language: Computer Processing of Bulgarian Texts" surveyed both the problems involved in the use of computers to process early Slavic texts and some of the attempts that have been made to resolve these problems, particularly in Bulgaria.
In the "Encoding Problems" session, David J. Birnbaum (Pittsburgh) "Informational and Presentational Units in Early Cyrillic Writing" discussed theoretical and methodological prerequisites for developing character- and glyph-encoding systems that are flexible and powerful enough to satisfy the varying requirements of different Slavic researchers, while simultaneously providing a standardized framework capable of generating portable documents. Karsten Grünberg (Heidelberg) "Transcription Rules for Old Church Slavonic Writing" examined transcription rules for representing the information in early Slavic manuscripts, and demonstrated some of the results achieved through the use of TUSTEP to process transcribed material in a practical research environment. Sebastian Kempgen (Bamberg) "Implementing a Medieval Script on a Personal Computer" surveyed both general theoretical considerations and concrete implementation problems of font development in current environments.
In the "Text Processing: Problems, Methods, and Tools" session, Mary MacRobert (Oxford) reported on her application of the Text Encoding Initiative (TEI) guidelines to encoding a fourteenth-century Serbian Church Slavonic Psalter, drawing particular attention to such problems as concurrent hierarchies and overlapping elements. Aleksandr Moldovan (Moscow) "O sozdanii fonda russkogo jazyka XI-XII vv. v Institute russkogo jazyka RAN" stressed the importance of coordination to avoid duplication of effort, and reported on theoretical and practical issues affecting large-scale encoding projects currently under way at the Russian Language Institute of the Russian Academy of Sciences. Irina Azarova (St. Petersburg) "Problemy i metodika izdanija slavjanskogo perevoda Biblii" reported on corpus-based research on fifteen hundred early Slavic biblical manuscripts currently being conducted by the Russian Bible Society.
In the "Manuscript Collation" session, Michael Bakker (Amsterdam) "Computer Collation of Manuscript Transcriptions" discussed theoretical and practical issues involved in using Collate, a commercial Macintosh-based program developed in England for collating manuscript witnesses, in a large-scale Slavistic project. Tanja Ivanova, Nina Shojleva, and Irina Bolcheva (Sofia) "Advanced Searching" demonstrated the use of Advanced Searching, a shareware program developed in Sofia to provide Collate-like capabilities in a Microsoft Windows environment, in their research on the textual transmission of Chernorizec Xrabr's O pismenex.
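For readers unfamiliar with collation, the basic operation is to align two transcriptions of the same text and report where they diverge. The sketch below is an illustration only: it uses Python's standard difflib module, it does not reproduce how Collate or Advanced Searching work internally, and the function name collate and the transliterated sample readings are invented for the example.

```python
from difflib import SequenceMatcher


def collate(base, witness):
    """Align two transcriptions word by word and list the variant places.

    Each result is (operation, reading in base, reading in witness),
    where operation is 'replace', 'delete' or 'insert'.
    """
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Invented sample readings, not quotations from actual manuscripts.
base = "v nachale be slovo i slovo be u boga"
witness = "v nachale be slovo i slovo be ot boga"

for variant in collate(base, witness):
    print(variant)
```

A real collation tool must also decide which differences count as variants at all (spelling, abbreviation, punctuation), and that decision depends on the normalization applied to the transcriptions before comparison.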
In the "Fonts and Image Processing" session, Mimoza Majstoroska (Skopje) "A Church Slavonic Alphabet for Reprinting Old Manuscripts Using a Microcomputer" discussed the graphic analysis of early Cyrillic writing and demonstrated the application of original optical character recognition software based on this analysis to medieval manuscript materials. Ljupcho Mitrevski (Skopje) "Fonts, Character Sets, and Church Slavonic Graphemes" discussed keyboard layout and other user-interface issues involved in editing early Cyrillic texts on a computer. Ivelin Stojanov (Sofia) also discussed the development of optical character recognition systems for early Slavic manuscript materials. William R. Veder's "Transcription and Edition" emphasized the importance of developing portable file formats and encoding methods to support collaborative work in Slavic philology.
In the "Problems in Running a Concrete Project: Computer-Supported Processing of Medieval Slavic Manuscripts" session, Anisava Miltenova (Sofia) "Computer-Aided Analysis of the Macrostructure and Typology of Medieval Slavic Miscellanies" discussed both philological and informatic aspects of studying early Slavic manuscripts on a computer, with particular attention to the structural analysis of mixed-content miscellanies. Andrej Bojadzhiev (Sofia) "Paleographic and Orthographic Features in the Description of Slavic Manuscripts" addressed the identification and encoding of parameters for describing codicological, paleographic, orthographic, and linguistic features of early Cyrillic manuscript materials. Both papers in this session were presented within the framework of the Text Encoding Initiative (TEI) application of Standard Generalized Markup Language (SGML).
In the "Applications of Data Base Management Systems" session, Stana Jankoska, Dragan Mixajlov, and Ljupcho Josifoski (Skopje) "A Database for Ancient Slavic Manuscripts in Macedonia" outlined the parameters used for describing early Slavic manuscripts in conformity with UNIMARC and other international cataloguing standards. Zdenka Ribarova (Skopje) and Kirill Ribarov (Prague) "Computer Processing of Old Church Slavonic Manuscripts: Results and Prospects" described the powerful STINO system, which is used in the Prague Old Church Slavonic dictionary project for database management and lexical analysis. Reinhard O. Lampe (Heidelberg) "Problems Involved in Setting Up a Unified Comprehensive Catalogue of Slavic Texts" described the management of multilingual data from a computational perspective. Jurij Labyncev (Moscow) "Mezhdunarodnyj issledovatelskij proekt 'Slavia Orthodoxa et Slavia Romana. Vzaimodejstvie slavjanskih mirov: Duhovnaja kul'tura Podljash'ja'" outlined the plans for a broad-ranging encyclopedic study of materials in all languages that pertain to both Orthodox and Roman culture. This report on the general intellectual organization of the project set the background for Larisa Shchavinskaja's technical report on the use of relational and source-oriented databases (such as Paradox and KLEIO) and statistical analysis software in its implementation.
In the "Case Studies: Types of Manuscripts" session, Monia Camuglia (Pisa) "The Psalter: From the Oral Tradition to the Informatic One" described the application of the Textual Data Base (dbt) program to an analysis of textual variation of fifteen psalms in seven psalters. This report addressed issues involved in configuring dbt to work with early Cyrillic materials, and also the use of the dbt query system to analyze and navigate the text. Cynthia Vakareliyska (Eugene) "Medieval Slavic Menologies On Line" outlined information retrieval desiderata for a menological database. Emilija Gergova (Sofia) "Computational Analysis of Hymnographic Compendia" discussed both philological and computational problems involved in studying the multilayered structure of hymnographic sources, with particular attention to the October Menaion. Marija Schnitter (Plovdiv) "A System for Encoding Euchological Texts within the Parameters of the IST Computer Program" examined criteria for assigning cataloguing numbers to euchological texts, based on her analysis of some eighty-five mostly South Slavic records from the eleventh through eighteenth centuries. Francis R. Salter (Bracebridge) "Some Suggestions Arising from a Personal Computer Analysis of the Svetostefanska Xrisovulja" reported on a project undertaken during the early stages of development of computer support for Church Slavonic writing, and addressed issues of programming and data structures required to encode and manipulate the text both at that time and from a modern perspective.
In the "Multimedia" session, Dragomir Petrov and Elena Koceva (Sofia) reported on the development of the UNESCO-supported St. Sofia CD-ROM project (in collaboration with Velina Bratanova), which provides text, images (including medieval manuscripts), and audio material concerning all aspects of the development of early Bulgarian culture.
The final conference session was devoted to KLEIO and included presentations by both historians and philologists. Viktorija Tjazhel'nikova (Moscow) "A Russian Version of the Source-Oriented Software KLEIO: New Opportunities for the Treatment of Cyrillic Sources" outlined the features of KLEIO that support the encoding of structure and semantics together with physical data. Vladimir Tixonov (Moscow) "Interactive Full-Text Processing in KLEIO: Scanning Criteria of Basic Terms (On Appeals by Disenfranchised People)" reported on the practical application of KLEIO to the scanning and content-based analysis of appeals by people whose civil rights had been revoked in early Soviet Russia (lishentsy). Igor Jushin "A Model of Historical Sources in KLEIO as a Basis for Integrated Social Classification" outlined several traditional problems involved in the computational analysis of systems of social classification, and discussed how a source-oriented system such as KLEIO addresses them. Nadezhda Romankova (Smolensk) discussed the use of statistical analysis to resolve the authorship of works possibly attributable to Kliment Oxridski. Adelina Angusheva (Sofia) "A Computer Investigation of Medieval Prognostic Books by Means of KLEIO" discussed the use of KLEIO to compare kalendologia in Byzantine, Latin, and Slavic written culture. Margaret Dimitrova (Sofia) "Greek and Latin Loanwords in the New York Missal and the KLEIO Computer Program" demonstrated how KLEIO can be used to generate indices of loan words and lists of their phonetic and morphological peculiarities, which can then be subjected to further study.
III. Some of the most interesting and promising parts of the conference are not reflected in this volume because they took the form of demonstration workshops, rather than formal reports. Winfried Bader (Tübingen) demonstrated the capabilities of TUSTEP for philological research, and William Veder and Michael Bakker (Amsterdam) demonstrated Collate and other Macintosh-based tools. Hands-on demonstrations were also held for Advanced Searching and two CD-ROM products, the St. Sofia Project and the Bulgarian National Library's "Unique Balkan Manuscripts" album of Slavic and other manuscript folios. Winfried Bader (Tübingen), David J. Birnbaum (Pittsburgh), Nicholas Finke (Cincinnati), Harry Gaylord (Groningen), and Anisava Miltenova (Sofia) conducted a half-day Text Encoding Initiative (TEI) workshop, followed by a separate hands-on session that provided the participants with an opportunity to use SGML editing software.
IV. A general consensus emerged among many participants at the conference that SGML, particularly in its TEI implementation, provides a framework that is potentially capable of satisfying the varying needs of different researchers, while at the same time affording a common, standardized model for document architecture and character and glyph encoding. Several conference participants engaged in significant text-encoding projects agreed to combine their efforts to adapt the TEI guidelines to the specific needs of Slavic philological research. Other concrete results of the conference were the immediate establishment of an anonymous ftp archive for early Cyrillic texts and related electronic files (ftp.pitt.edu in dept/slavic, maintained by David J. Birnbaum (Pittsburgh)) and plans for publishing a World Wide Web page (maintained by Ralph Cleminson (Portsmouth)). Colleagues from several countries volunteered to contribute electronic texts to the ftp archive, and we hope that this archive and World Wide Web page will provide long-term, stable, centralized locations for the collection and dissemination of relevant information and materials.
V. Publication costs required that the volume of conference papers be printed from camera-ready copy supplied by the individual authors, without further editing, which has led to considerable variation in terminology, style, systems of transliteration, and other editorial details. The editors consider this inconsistency a valuable illustration of the importance of developing standardized methods and systems for document encoding and processing. Certain papers presented at the conference are not included in the volume because camera-ready copy was not received in time; these papers are represented instead by reprints of abstracts prepared by the authors as part of the original conference program.
VI. The publication of the conference papers, and the costs of the conference itself, were supported by grants from the International Research and Exchanges Board, with funds provided by the U.S. Department of State (Title VIII) and the John D. and Catherine T. MacArthur Foundation, by the Joint Committee on Eastern Europe of the American Council of Learned Societies, by the Open Society Fund, and by the Soros Center for the Arts (Sofia). None of these organizations is responsible for the views expressed.
The Organizing Committee (David J. Birnbaum (Pittsburgh), Andrej Bojadzhiev (Sofia), Milena Dobreva (Sofia), and Anisava Miltenova (Sofia)) would also like to acknowledge the generous support of the Bulgarian Academy of Sciences, the University of Pittsburgh, the University of Sofia, and the American University of Blagoevgrad. Author/Editor Software for the SGML hands-on session was contributed by SoftQuad, Inc. (info@sq.com).