The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: October 25, 2000
Electronic Text Corpus of Sumerian Literature (ETCSL)

[October 21, 2000] ETCSL project description: "The aim of this project is to produce a 'collected works' of over 400 poetic compositions of the classical [Sumerian] literature, equipped with translations. This standardised, electronically searchable SGML corpus, which is based to a large degree on published materials, comprises some 400 literary compositions of the Isin/Larsa/Old Babylonian Period, amounting to approximately 40,000 lines of verse (excluding Emesal cult songs, literary letters, and magical incantations). The full catalogue can be found elsewhere at this site. The compositions are presented in single-line composite text format (in a standardised transliteration) with newly-prepared English prose translations, and a full bibliographical database, thereby making available for the first time a collected works of Sumerian literature. The corpus is freely available to anyone who wishes to use it via this World Wide Web site... The literature written in Sumerian is the oldest human poetry that can be read, dating from approximately 2100 to about 1650 BC. The main 'classical' corpus can be very roughly estimated at 50,000 lines of verse, including narrative poetry, praise poetry, hymns, laments, prayers, songs, fables, didactic poems, debate poems and proverbs. The majority of this has been reconstructed during the past fifty years from thousands of often fragmentary clay tablets inscribed in cuneiform writing. Relatively few compositions are yet published in satisfactory modern editions. Much is scattered throughout a large number of journals and other publications. Several important poems must still be consulted in twenty-year-old unpublished doctoral dissertations, some with translations which have now become unusable because of progress in our knowledge of the language. Major compositions have not yet been edited at all. 
The slow progress of research, with little organised collaboration until recently, means that Sumerian literature has [hitherto] remained inaccessible to the majority of those who might wish to read or study it, and virtually unknown to a wider public."

"The principal task is the preparation of a complete electronic corpus of reconstructed texts. These are being encoded in Standard Generalized Markup Language (SGML), which will ensure the widest accessibility of the material in the future. The corpus comprises: (1) an information database; (2) transliterations of 13 ancient literary catalogues; (3) composite texts of 409 literary compositions; (4) new translations of all the composite texts. The emphasis is on coherent, readable English prose. These will enlarge immensely the accessibility and usability of the corpus by scholars not conversant with Sumerian, including both archaeological and art-historical specialists within the field and comparative and cultural historians from outside it." The principal researchers on this project are Dr Jeremy Black, Dr Graham Cunningham, Dr Gábor Zólyomi, and Dr Eleanor Robson.

"SGML, the Standard Generalised Markup Language, is an international standard (ISO 8879: 1986) for writing tagging languages which describe the structure, rather than the visual appearance, of texts. SGML works by means of Document Type Definitions (DTDs) which prescribe the order, hierarchy and frequency of the elements of a text, and the writing system used. It is particularly useful for ensuring structural consistency throughout a large body of material, and for systematically tagging noteworthy or interesting features of those texts. Because it is an international standard and not a proprietary format, SGML is independent of platform, application and character-set and therefore extremely portable and durable. In short, it is ideal for encoding large language corpora which need to be searched, analysed and shared between projects over a long period of time. Hyper-Text Markup Language, in which Web documents are written, is probably the best-known SGML application; its current DTD is HTML 4.0. There are many other internationally or professionally standard DTDs but, not surprisingly, nothing quite suitable for marking up a corpus of Sumerian literature. The corpus project constructed a set of suitable DTDs; these were tested on a sample text, The lament for Nibru, which has a composite text of 323 lines, and 33 cuneiform sources... HTML is derived from SGML, and current trends are moving towards a Web based on XML -- Extensible Markup Language, a simplified form of SGML -- with XML-aware browsers already commercially available. In the next few months a parallel XML- and Unicode-based site will be developed, enabling much more sophisticated display and searching facilities. However, until their use becomes more widespread, the simple HTML- and ASCII-based pages which you are now reading will continue to be available... By the end of the project we also hope to offer a choice between HTML- and ASCII-viewing and XML- and Unicode-viewing." 
[from 'About ETCSL', dated May 19, 2000]
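
The description above says that a DTD prescribes the order, hierarchy and frequency of the elements of a text. As a rough illustration only, the Python sketch below imitates that idea with regular expressions over child element names. The element names (composition, title, line, damage) and the content models are hypothetical and are not taken from the ETCSL DTDs; the sample is written as XML rather than SGML because the Python standard library parses XML but does not validate against DTDs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, NOT the ETCSL DTDs. Each element name maps to a
# regular expression over its child element names, where every child name is
# written with one leading space (so two <line> children become " line line").
CONTENT_MODELS = {
    "composition": r" title( line)+",  # exactly one title, then one or more lines
    "title": r"",                      # no child elements allowed
    "line": r"( damage)*",             # zero or more damage markers
    "damage": r"",                     # no child elements allowed
}


def validate(element):
    """Return a list of error messages for element and all its descendants."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    child_names = "".join(" " + child.tag for child in element)
    if model is None:
        errors.append(f"undeclared element <{element.tag}>")
    elif re.fullmatch(model, child_names) is None:
        errors.append(
            f"<{element.tag}> has invalid children:{child_names or ' (none)'}"
        )
    for child in element:
        errors.extend(validate(child))
    return errors


# Invented sample documents, for illustration only.
GOOD = (
    "<composition><title>Sample</title>"
    "<line n='1'>text <damage/> text</line>"
    "<line n='2'>text</line></composition>"
)
BAD = "<composition><line n='1'>text</line><title>Sample</title></composition>"

if __name__ == "__main__":
    print(validate(ET.fromstring(GOOD)))
    print(validate(ET.fromstring(BAD)))
```

In this sketch the first document passes, while the second is rejected because its title follows a line, which the assumed content model for composition does not allow. A real SGML or XML toolchain would perform this check with a validating parser against the project's actual DTDs.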


  • ETCSL Web site

  • ETCSL catalogue of Sumerian literary compositions. In the typical case, the catalogue references an online composite text, a translation, and a bibliography. "The catalogue is thematically arranged and each composition has a number of which the first element reflects the broad area of the literature to which it belongs: ancient literary catalogues; narrative and mythological compositions; royal praise poetry and hymns to deities with prayers for rulers; hymns and cult songs (mostly hymns addressed to deities); scribal training literature; proverbs, fables and riddles."

  • Technical information: editing protocols and DTDs

  • Project Announcement. University of Oxford Centre for the Study of Ancient Documents, Newsletter No. 7 (Winter 1998/1999). With photo.

  • Conference presentation on 'The Electronic Text Corpus of Sumerian Literature', by Jeremy Black and Eleanor Robson. Saturday October 9, 1999. The presenters described their methodology, focusing on the creation of SGML document type definitions, the development of an Operating Procedure for the project, and issues of transliteration and translation. They also discussed some of the problems encountered and plans for the future. "Black presented the philological and pedagogical rationale for the project, while Robson discussed its operating procedure. This procedure involves the use of SGML tags and a simple wordprocessing macro interface for the entry and markup of transliterated texts by Sumerologists, and hence a minimum of custom software development. Robson showed the project's Web browser interface via an online Internet connection, emphasizing the project's use of basic HTML generated from the underlying SGML version of the texts. Because the intended audience goes beyond scholars at major research universities, users of the electronic Sumerian text corpus should not and do not need the latest version of Web browser software running on the fastest computers with high-speed Internet connections in order to use the texts effectively." See "Electronic Publication of Ancient Near Eastern Texts," by Charles E. Jones and David Schloen. In Ariadne [ISSN: 1361-3200] Issue 22 (December 1999). 'A report on a Chicago conference which explored XML tagging for Ancient Near Eastern Texts on the Web.'

  • "AHRB Funding for Sumerian Literature Online." - "Over the last three years, a group of researchers at the University's Oriental Institute have been creating a universally available textual corpus of Sumerian literature. This is the oldest human poetry that can be read, dating from approximately 2100 to 1650 BC. The group have now been awarded £472,000 by the Arts and Humanities Research Board to carry out a comprehensive literary and historical investigation using a wide range of corpus linguistics techniques...The project will enable researchers to define genres and personal styles in a literature which is largely anonymous, and facilitate cultural and historical research by enabling comparison of all the contexts where a particular word or phrase occurs, as an aid to exploration of the semantic content of basic social concepts. It also offers potential for interdisciplinary interest, as historically a rich stream of survivals have flowed on through Babylonian literature, mediated by translations into other languages and by oral transmission into ancient Indian, Arabic, and Greek civilisation and from there into the European tradition..."

  • Electronic Text Corpus of Sumerian Literature: project overview from a public relations document. With a photo of a clay cylinder (c. 2150 BC) "bearing the Sumerian hymn composed for Gudea, prince of Lagash: 'On the day when in heaven and earth the fates had been decided...'."

  • Encoding materials:

  • Contact:
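
The AHRB funding item in the list above mentions "enabling comparison of all the contexts where a particular word or phrase occurs". One common way to do this in corpus linguistics is a keyword-in-context (KWIC) listing. The Python sketch below is a minimal illustration of that general technique, not the project's own software; the function name kwic, the width parameter, and the English sample lines are all invented for the example.

```python
def kwic(lines, keyword, width=2):
    """Return a list of (line_number, left_context, keyword, right_context).

    Matching is case-insensitive on whole whitespace-separated words, and
    `width` is the number of context words kept on each side of a hit.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                hits.append((number, left, word, right))
    return hits


# Invented placeholder lines, not drawn from the corpus.
SAMPLE_LINES = [
    "the king built the temple",
    "the temple of the city was restored",
    "praise be to the king",
]

if __name__ == "__main__":
    for number, left, word, right in kwic(SAMPLE_LINES, "temple"):
        print(f"{number:>3}  {left:>12} [{word}] {right}")
```

Aligning every occurrence of a word with its neighbours in this way is what lets a reader compare usage across compositions. A production concordancer would also need to handle the transliteration conventions and markup of the actual texts, which this sketch ignores.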

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor