The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: April 28, 2004.

Delivering Classics Resources with TEI-XML, Open Source, and Creative Commons Licenses.

The Center for Hellenic Studies of Harvard University has adopted an innovative technological program for free online publication of books, articles, and databases designed to make resources in the classics more visible and accessible.

A second issue of the online Classics@: The Electronic Journal of the Center for Hellenic Studies of Harvard University features articles about "Ancient Mediterranean Cultural Informatics." It is published under the Creative Commons 'Attribution-NonCommercial-ShareAlike' license allowing others to copy, distribute, display, and perform the authored work, and to create derivative works in non-commercial settings.

The Harvard CHS publication process is based upon TEI-XML encoding and "uses open source tools to convert proprietary word-processing files to TEI-XML and to publish the result." Consistent with the intellectual mission of the CHS editorial group to expedite online publication and collaborative research, the team is developing a process and tools "that others can adopt or modify to produce online and print books rapidly, beautifully, and accurately." Erik Ray and Benn Salter have assisted the CHS technical team in the development of Perl and XSLT transformation tools to convert word-processor data into TEI-XML format; publication of the materials online involves the use of the open source Apache Cocoon web development framework.
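
The core idea of that conversion step, turning flat, style-tagged word-processor output into nested TEI structure, can be sketched in a few lines. This is only an illustration under stated assumptions: the CHS tools use Perl and XSLT, whereas the sketch below uses Python's standard library; the input format (`<doc>` with `<p style="...">` children) and the style names `Heading` and `Body` are hypothetical stand-ins, not Word's actual XML; and the output is a fragment that omits the `teiHeader` a valid TEI document requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input: every paragraph is a sibling, and only the
# style attribute says whether it is a heading or body text.
FLAT = """<doc>
  <p style="Heading">Preface</p>
  <p style="Body">First paragraph.</p>
  <p style="Body">Second paragraph.</p>
  <p style="Heading">Chapter One</p>
  <p style="Body">Opening of the chapter.</p>
</doc>"""


def flat_to_tei(flat_xml: str) -> ET.Element:
    """Group flat, style-tagged paragraphs into nested TEI-style <div>s."""
    source = ET.fromstring(flat_xml)
    tei = ET.Element("TEI.2")  # root element name used by TEI P4
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    current_div = None
    for para in source.findall("p"):
        if para.get("style") == "Heading":
            # A heading starts a new structural division.
            current_div = ET.SubElement(body, "div")
            ET.SubElement(current_div, "head").text = para.text
        else:
            # Body text before any heading still needs a container.
            if current_div is None:
                current_div = ET.SubElement(body, "div")
            ET.SubElement(current_div, "p").text = para.text
    return tei


tei_root = flat_to_tei(FLAT)
print(ET.tostring(tei_root, encoding="unicode"))
```

The grouping is driven entirely by the style names, so text that an author styled inconsistently would be nested wrongly, and a manual check of the converted output remains worthwhile.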

Classics@ Issue Two was published directly from source files encoded in TEI-conformant XML, using publication mechanisms available in the CHS TextServer protocol.

CHS is also "committed to experimental uses of online publication to complement print publication as well as innovative arrangements with traditional academic publishers in the interest of generalizing its goals to the academic community and of making creative classical scholarship available to the widest possible audience."


Contents of Classics@ Volume 02 (April 2004)

Bibliographic metadata and excerpts are provided here for articles published in Issue Two of Classics@, edited by Christopher Blackwell and Ross Scaife. The Center for Hellenic Studies of Harvard University, Casey Dué and Mary Ebbott, executive editors. April 2004.

  • Greg Nagy, "Preface." Gregory Nagy is Director of the Center for Hellenic Studies. Greg Nagy (Francis Jones Professor of Classical Greek Literature and Professor of Comparative Literature, Harvard University) and James O'Donnell (Provost, Georgetown University) provide leadership for the Classics@ editorial team. "The CHS has undertaken a series of ambitious new initiatives, especially in the realm of reforming existing practices in academic publishing — electronic as well as in-print. The CHS is committed to be in the forefront of devising new procedures and protocols..."

  • Christopher Blackwell and Ross Scaife, "Introduction: CHS Summer Workshop on Technology." "The first ever CHS Summer Workshop on Technology took place at the Center in Washington DC from June 23 through June 29, 2003. This workshop was designed to bring together a group of scholars interested in the possibilities afforded by the electronic manipulation of texts, and particularly how current standards — XML, XSLT, and Unicode, to name a few — can help us create, analyse, connect, and share the materials with which we work... There were three primary goals for the workshop: (1) Teaching the basics of marking up texts using TEI-conformant XML to anyone who did not already know it; this will include helping them set up a working environment; (2) Learning from some early pioneers in classics and technology, who have valuable datasets that could become even more valuable if they could interact with other current projects; (3) Providing space for people at all stages of technological skill and experience to share ideas, make connections, and build esprit de corps..."

  • Deborah Anderson, "Preliminary Guidelines to Using Unicode for Greek." "This article offers a concise introduction to the Unicode standard and attendant technologies, aimed specifically at students and scholars of classical Greek. It describes the (intentional) limits of the Unicode standard, gives some guidelines for using Unicode characters, answers some frequently asked questions, and includes a bibliography of useful resources... Unicode is the international character encoding standard and is fully synchronized with ISO 10646, its parallel International Standard maintained by the International Organization for Standardization. Character encoding refers to the assignment of a number to a letter or other symbol found in a text... In the scheme of multi-layered text representation, character encoding is on the bottom, above this is markup (HTML, XML, or TEI), which can convey the hierarchical structure of a document and the content it consists of, and metadata is on the top level. Metadata is structured data about data structure... The answer to the problem of reliably transmitting Greek text data is to use software and fonts that are based on the international character standard Unicode. As long as Hellenists are using products that are Unicode-compliant (both the sender of a document and the recipient), a Greek α should appear as an α in any electronic text document. As a result, Greek texts will be widely accessible to others on any platform and in any country and will help assure longevity to the data through time. Unicode is also now the default standard for XML..." [see Unicode and XML]

  • Michael Arnush, "The Epigraphic Database for Athenian Democracy (EDAD)." "The on-line Epigraphic Database for Athenian Democracy (EDAD) plans to make accessible to a broad audience the inscriptional evidence for the origins and development of democracy in ancient Athens. EDAD's underlying principle is to provide transparent access to texts, translations and commentaries in support of a larger collaborative effort for which transparency is an essential component: Dēmos: Classical Athenian Democracy. The nearly 10,000 inscriptions from late archaic and classical Athens (508-322 BCE) detail the day-to-day operations of the world's first democracy, yet they are accessible either in arcane publications with Greek text and commentary, or are translated in sourcebooks with little apparatus and no Greek. EDAD will include the essential features of both types of publication and thus provide the epigraphic sources for the scholar and student in a transparent and easily accessible manner. In essence, the database will present Greek texts and translations akin to works in the Perseus Project with all of the lexical and morphological tools, though with some dramatic differences: EDAD will include comprehensive bibliographies, and linguistic and historical commentaries, and will allow the user to tailor the database to specific needs and interests..."

  • Christopher W. Blackwell, "Dēmos: Challenges and Lessons." "Dēmos: Classical Athenian Democracy is a medium-sized digital library of texts aimed at inviting non-specialist readers to engage in a critical reading of primary and secondary sources for this ancient historical topic. This article describes benefits, and potential pitfalls, of building an XML-based collection of humanist texts... Dēmos is an online collection of articles about Athenian democracy in the 5th and 4th centuries BCE. As of this writing, Dēmos consists of 113 articles, totalling approximately 400,000 words of content, by at least 13 different authors. The 26 major articles are available as PDF files, and these alone total 785 pages of content. So the site is a medium-sized digital library, if we define small as a few documents and large as many thousands of documents... The content of Dēmos consists of TEI-conformant XML documents. These are dissected, merged, and transformed in various ways by a set of custom XSLT stylesheets, with Tomcat/Cocoon doing the work, before being delivered as HTML to a reader's browser, which formats them according to custom CSS stylesheets for attractive and intuitive reading and interaction... This project will stand on the TextServices protocol [...] so the Dēmos Ancient Texts collection will be fully integrated into a distributed digital library of humanist texts, with easy and fully automated sharing of data and metadata..."

  • Sandra Boero-Imwinkelried, "Vicus Unguentarius: Perfume, Epigraphy, and XML." "The Center for Hellenic Studies, The Stoa, and EpiDoc promote the use of XML for the encoding of classical texts and scholarly publications in general, following the guidelines of the Text Encoding Initiative. The TEI is an 'international and interdisciplinary standard that helps publishers and individual scholars represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.' The EpiDoc collaborative represents an effort to focus the TEI on the particular needs of epigraphists, as it 'sets up guidelines for the publication of Roman and Greek inscriptions'... Vicus Unguentarius is a project dedicated to the study of the epigraphic record pertaining to the scent industry in the ancient Roman world, presently limited to the Roman corpus. I have been developing Vicus with the sponsorship of the Stoa Consortium and the Center for Hellenic Studies. The texts in Vicus already utilize TEI-conformant XML markup; in time they should use the EpiDoc conventions as well..."

  • Hugh A. Cayless, "Directory Services for Classical Informatics." "This article describes the need and outlines a proposal for an infrastructure that will manage registries of uniquely identified entities, allowing scholarly projects to preserve, share, and link information with as little human intervention as possible... The directory service will be accessible via a web-based API, in which commands and requests for information are issued in the form of HTTP GETs. Responses will be in the form of XML documents containing the requested information or the status of the command's execution. The proposed methods for the DirectoryServer protocol may be grouped into three areas, discovery, resolution, and administration..."

  • Susan Guettel Cole, "From GML to XML." "Dionysos was to be found everywhere in the lands around the ancient Mediterranean, omnipresent in literature, ubiquitous in the visible culture, and a partner in such hazardous experiences as drinking wine, viewing mimetic representations on the stage, or facing death. A god who marked the dangerous boundary between self and other, his gifts held the promise of pleasures, but crossing that boundary without his protection could carry risks. Dionysos was perhaps the most popular divinity in all the lands where Greek influence was felt... A project to publish a collection of inscriptions about the cult of Dionysos straddles two different methods of recording and sorting evidence. Data originally collected on index cards in the early 1980's and encoded with structured markup in GML on an IBM mainframe in the twilight period of mainframe technology has hibernated until XML and Unicode have made possible a stable environment for preservation, retrieval, and dissemination of the material. The editors of The Stoa Consortium have overseen conversion of over 900 documents in this project to XML, and those files may now be viewed on a website for browsing, searching, and downloading texts, translations, and commentaries on the subject of Dionysos... New texts can easily be added as they are finished, and the entire collection can also eventually be formatted for a conventional publication in book form..."

  • Casey Dué and Mary Ebbott, "As Many Homers As You Please: An On-line Multitext of Homer." A team of scholars, working under the auspices of Harvard's Center for Hellenic Studies in Washington, D.C., and in cooperation with the Stoa consortium, is currently developing the tools and resources that will eventually comprise the CHS Multitext of Homer. Although electronic texts and translations of the Iliad and Odyssey are currently available in various places on the web, the CHS multitext will be much more than that. The multitext will be able to display known variants from papyri, scholia, medieval manuscripts, and ancient quotations, presented in a diachronic framework... The multitext will also be linked to supplementary materials, including translations, scholia, a modern commentary, and information about Alexandrian and Pergamene libraries, scholars, and scholarship. The project will eventually include multitexts of both the Iliad and Odyssey and Greek texts with English translations of the lives of Homer, Proclus's summaries of the Epic Cycle, the fragments, and the Homeric Hymns. A major component of the project is to offer unprecedented access to the scholia contained in the tenth century Venetus A manuscript of the Iliad... This article introduces the multitext editions of the Iliad and Odyssey that are currently being produced by a team of scholars in association with the Center for Hellenic Studies in Washington, D.C. and the Stoa Consortium.

  • Rebecca Frost Davis, "Collaborative Classics: Technology and the Small Liberal Arts College." "Sunoikisis, the collaborative program in classics of the Associated Colleges of the South, has created a digital infrastructure that enables a virtual classics community. Members of this community use technology for communications, information management, inter-campus courses, and research. Sunoikisis helps foster ancient Mediterranean cultural informatics through the creation and use of electronic materials for these activities. By educating students and faculty in electronic resources and making them comfortable in the web environment, Sunoikisis contributes to the growing digital culture in the world of classical studies... The Sunoikisis digital infrastructure also supports scholarly research that integrates students and faculty; the ACS Archaeology program, directed by Mark Garrison, Professor of Art History at Trinity University, consists of a spring one-hour ICC and a summer field school in Hacımusalar, Turkey... The primary architect of the system is Neel Smith, Professor of Classics at the College of the Holy Cross, who serves as IT and systems director for the Bilkent University, Hacımusalar excavation in southwest Turkey. The system is based on the Cocoon XML publishing framework, which integrates data from project databases, XML marked-up excavation notebooks and the GIS server. Unlike other archaeological projects, all data collected are entered into the system (mostly on site) and are integrated with all other data. These materials are then available for use in the course or for later research and publication. The entire system will be made freely available for use by other archaeological projects... Project members are also engaged in preparing the preliminary publication, which will be marked up in TEI-conformant XML..."

  • Michael Jones, "Making Electronic Publication Easier, Faster, and More Powerful With Hydra, a Drag-and-Drop TEI Publishing Environment." "Hydra is an experimental drag-and-drop electronic publishing environment for TEI-conformant XML texts. This article introduces the capabilities and limitations of the system... Hydra was created so that authors of TEI-conformant XML can simply drop their file into a folder and then view the text in HTML or PDF format immediately. Thus the goal is to encourage the use of TEI-conformant XML as a standard by making it far easier to transform that XML into more readable formats... Hydra is a custom distribution of Cocoon, an XML publishing framework. Hydra builds on the flexible publishing environment of Cocoon and its separation of concerns between content, logic, and style. Cocoon incorporates these concerns using components and pipelines where each component in the pipeline executes a particular function. In a typical example, a generator component reads and parses an XML file, a transformer component converts the XML markup into a different XML markup using XSLT, and a serializer component produces the resulting output. Within the base distribution, Cocoon includes a number of generators, transformers, and serializers that anyone can use in their application. Some of these included components are generators which can read from native filesystems and XML as well as serializers which can output PDF, RTF, SVG and other formats. In addition to using these components, it is also possible to build custom components. Hydra uses one such custom-built transformer, Transcoder, written by Hugh Cayless. The Transcoder transforms TLG Betacode-encoded Greek into a number of different Greek encodings, such as SPIonic, Sgreek, and Unicode..."

  • Martin Mueller, "Of Digital Serendipity and the Homeric Scholia." "... I sent a message to a new Homerica listserv announcing the New Scholiasts and very quickly got an interested response from René Nuenlist at Brown University, who is writing a book on the literary criticism contained in ancient scholia and has been talking with John Lundon (University of Cologne), Jessica Wissmann (Center for Hellenic Studies), and Eleanor Dickey (Columbia University) about translating Homeric and other scholia. We had some fruitful email discussions, and as a result of them the four have joined the New Scholiasts as managing editors and will in fact be largely responsible for the future direction of the project. This is a nice example of how the flexibility and speed of digital media help bring about a situation in which scholarly data thought too arcane or difficult to access are moved within easier ken of students and scholars struggling with this or that problem in Homer. With some luck and energy, and keeping in mind the success of Suda On Line, I can envisage a situation in which three years from now a non-trivial percentage of the more interesting Homeric scholia are accessible to the reader in ways in which they have not been since Byzantine days."

  • Bruce Robertson, "Improving Ancient History Online with Heml." "The Historical Event Markup and Linking Project (Heml) provides a markup language used to coordinate historical resources across the web, and computer programs that link such resources into historical timelines, maps and animations. Using sample events from the life of Alexander the Great, this paper demonstrates how such a system might aid the study of antiquity on the web... This paper illustrates Heml's work through sample documents outlining some events in the career of Alexander the Great. It should be made clear at this point that the Heml project does not aim to produce a 'database of history' or digital library. Its goal is to provide a common markup language that could facilitate and even inter-link any number of such databases or libraries. The Historical Event Markup and Linking Project is based upon standard XML technologies [...] and like the other projects mentioned here, its work is published openly and free of cost. Heml has three sub-projects: it defines how historical events, persons and places can be encoded in XML; it provides means of joining together disparate documents; and, most interestingly, it explores how encoded material can be visualized through computer graphics..." See also the Heml project description.

  • Neel Smith, "TextServer: Toward a Protocol for Describing Libraries." Many of the projects presented at the Technology and Classics meeting in June of 2003 illustrated "benefits of semantic markup for a variety of scholarly projects." In this paper the author considers "a question that follows directly from these observations: how can projects with similar material distributed across the internet interoperate? [...] Our desire to take advantage of information in TEI-conformant XML from scholars at a number of institutions is exactly the kind of problem that many businesses and government organizations, as well as academic institutions, are energetically working on. Conceptually, it corresponds closely to what Tim Berners-Lee, the creator of the World Wide Web, and the World Wide Web Consortium call 'the semantic Web.' [I describe] a set of conventions for providing these services. I will refer to the conventions themselves as 'the TextServer conventions' and will call a program implementing these conventions a TextServer. My goal in defining formal conventions for a TextServer is to meet the absolute minimum requirements of citing, retrieving and replicating an electronic publication. These requirements are not unique: digital libraries must generically provide some way to identify publications, to discover their citation schemes, and to retrieve pieces of on-line publications using those citation schemes... [Some] current efforts to define protocols for digital libraries fall short of our needs because their notion of citation focuses on what I would term documents, rather than texts in the sense that classicists often use that word. Focusing on documents is a legitimate design choice, but we as classicists need to be aware of its implications. When we refer to classical texts, we most often use a canonical reference system describing a logical hierarchical organization of the text independent of any specific physical version. This remarkable practice is so familiar that we often fail to recognize its consequences. Notably, it allows us to discuss a notional text at many different levels... In our distributed electronic library, we want the fundamental scholarly activity of citing a work to take account of this hierarchical organization of our notional text. A citation should be able to point either to specific versions (e.g., to contrast Nobbe and Müller's readings for this passage), or to refer only to a notional text of Ptolemy that one reader might prefer to lookup in English translation, another in German and a third in a Greek edition. We expect a citation form like 'Ptol. Geo. 1.2' to be valid at any of these levels..."

  • Lenny Muellner, "CHS Publishing Program and Goals." "The Center for Hellenic Studies (CHS) has embarked on an aggressive publication program with explicit technical goals. As part of the intellectual mission of our editorial group, we are committed to unfettered, free online publication of all books that we publish and to print publication of some books. Our print/online publication process, which is in development, is based on TEI-XML source and uses open source tools to convert proprietary word-processing files to TEI-XML and publish the result... Once the content of a book has been finalized by author and editors, online production begins. Currently, only the Windows XP version of MS Word 2003 can save tagged Word files in a format that Microsoft deems to be XML. It is a flat XML with a host of presentational tags and not much in the way of structure, but it can be the basis for conversion into TEI-XML since it preserves the CHS style tags embedded in the document by the author. CHS commissioned Erik Ray, the lead author of the O'Reilly books Learning XML and Perl and XML, and Benn Salter, a freelance Perl programmer, to develop a conversion tool that uses Perl and XSLT (also two Perl modules, XML::LibXML and XML::DOMHandler, and two open source C libraries, libxml2 and libxslt) to convert Word's version of XML into TEI-XML. Since word processors like MS Word are not structured editors of content, there is no guarantee that the document produced by this converter will parse, so there is a need for a manual editing and parsing pass to correct any errors. But our experience is that the documents we are converting are simple enough in structure to make this process largely automatic. The final step is to publish these documents with a style sheet on the CHS website using Cocoon. For a book of normal length, this process should take a part-time worker at most two weeks..."
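
Neel Smith's point about canonical citation, summarized in the list above, is that a reference such as 'Ptol. Geo. 1.2' should be valid both for a notional text and for a specific version of it. The sketch below illustrates that idea only; it is not the TextServer protocol. The `@version` suffix, the `Citation` class, the `LIBRARY` table, and the placeholder passage strings are all invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    author: str
    work: str
    passage: tuple          # hierarchical reference, e.g. ("1", "2")
    version: Optional[str]  # None means "the notional text, any version"


def parse_citation(text: str) -> Citation:
    """Parse 'Ptol. Geo. 1.2' or 'Ptol. Geo. 1.2@Nobbe'.

    The '@version' suffix is a convention made up for this sketch.
    """
    ref, _, version = text.partition("@")
    author, work, passage = ref.split()
    return Citation(author, work, tuple(passage.split(".")), version or None)


# Hypothetical library: placeholder strings stand in for real passages.
LIBRARY = {
    ("Ptol.", "Geo."): {
        "Nobbe": {("1", "2"): "[Greek text, Nobbe edition]"},
        "Müller": {("1", "2"): "[Greek text, Müller edition]"},
        "English": {("1", "2"): "[English translation]"},
    }
}


def resolve(citation: Citation) -> dict:
    """Return {version: passage} for every version the citation covers."""
    versions = LIBRARY[(citation.author, citation.work)]
    wanted = [citation.version] if citation.version else list(versions)
    return {name: versions[name][citation.passage] for name in wanted}


# The same reference works at both levels of the hierarchy.
print(resolve(parse_citation("Ptol. Geo. 1.2")))        # every version
print(resolve(parse_citation("Ptol. Geo. 1.2@Nobbe")))  # one edition only
```

Keeping the passage reference separate from the version is what lets one citation string serve a reader who wants the Greek and another who wants a translation.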

About Classics@ Issue Two: "Ancient Mediterranean Cultural Informatics"

The April 2004 issue of Classics@: The Electronic Journal of the Center for Hellenic Studies of Harvard University is the first edition of an ongoing project of publication aimed at documenting the emerging [Cultural Informatics] sub-discipline within the classics field — "the scholarship of creating, analyzing, and disseminating humanist learning electronically. The need for such a project emerged from a week long workshop hosted by the CHS in June of 2003...

The projects represented in that gathering of scholars included work focused on new electronic editions of primary texts, such as the Homer Multitext, edited by Casey Dué and Mary Ebbott, the collections of Dionysiac inscriptions edited by Susan Cole, or Vicus Unguentarius, a collection of inscriptions related to the ancient scent industry, edited by Sandra Boero Imwinkelried. Other work aimed at disseminating scholarly argument and analysis electronically, such as Josh Sosin's work as the new editor of Greek, Roman, and Byzantine Studies toward electronic publication of that journal, or Christopher Blackwell's Dēmos: Classical Athenian Democracy, an electronic resource aimed at inviting a wide audience to engage ancient history. Still other projects focused on fostering collaborative research, such as New Scholiasts, Martin Mueller's infrastructure for shared translation and commentary of the Erbse scholia, or Bruce Robertson's Historical Event Markup and Linking. And looking toward the future, the CHS Summer Workshop in Technology launched several new initiatives aimed at fostering further collaboration, both among scholars and automatically, among sites that host electronic texts or indexed data..." [adapted from the cover page]

"Classics@, edited by a team working for the Center for Hellenic Studies and headed by Gregory Nagy and James O'Donnell, is designed to bring contemporary classical scholarship to a wide audience on the World Wide Web."

About Creative Commons Licenses

Classics@ Issue Two was published under the Creative Commons Attribution-NonCommercial-ShareAlike 1.0 license, allowing liberal terms for non-commercial use of the content: permission is granted to copy, distribute, display, perform, and make derivative works from the original. Use of Creative Commons machine-readable licenses in digital media represents something opposite to traditional "digital rights management (DRM)," which seeks to restrict/deny access and (ostensibly) to prevent piracy and theft. With Creative Commons licenses there is no need to request permission within the structure of (sic!) "trusted computing" architectures: permission has already been granted.

The CC document describing Creative Commons Metadata illustrates the use of embedded markup in HTML, XML (RSS) and several other common media: Syndic8 tracks RSS feeds that are available under a Creative Commons license; see the CC schema which lets you describe copyright licenses in RDF. For non-web content such as files on peer-to-peer networks, the CC solution is to embed a link to a license info page that includes the license metadata; this allows associating licenses with MP3 and Ogg files. One may use Adobe applications to embed Creative Commons metadata in PDF and other XMP-supported file types. A CreativeCommons SMIL Module has been created for using CC licenses with the Synchronized Multimedia Integration Language. Members of the CC team are "working on procedures to tag additional video, music, image and text file formats..."
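
To make "machine-readable license" concrete, here is a minimal sketch of reading such metadata with Python's standard library. The sample RDF is modeled on the vocabulary Creative Commons published around this time (`permits`, `requires`, `prohibits`); the namespace URI and the exact terms listed for the Attribution-NonCommercial-ShareAlike 1.0 license are assumptions that should be checked against the CC schema itself. The function name `license_terms` is invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the Creative Commons schema.
CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative sample, not copied from any real page.
SAMPLE = """<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/1.0/"/>
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/1.0/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
    <permits rdf:resource="http://web.resource.org/cc/Distribution"/>
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks"/>
    <requires rdf:resource="http://web.resource.org/cc/Notice"/>
    <requires rdf:resource="http://web.resource.org/cc/Attribution"/>
    <requires rdf:resource="http://web.resource.org/cc/ShareAlike"/>
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse"/>
  </License>
</rdf:RDF>"""


def license_terms(rdf_xml: str) -> dict:
    """Collect what each License element permits, requires, and prohibits."""
    root = ET.fromstring(rdf_xml)
    terms = {"permits": [], "requires": [], "prohibits": []}
    for lic in root.findall(f"{{{CC}}}License"):
        for kind in terms:
            for node in lic.findall(f"{{{CC}}}{kind}"):
                uri = node.get(f"{{{RDF}}}resource")
                # Keep only the last path segment, e.g. "Attribution".
                terms[kind].append(uri.rsplit("/", 1)[-1])
    return terms


terms = license_terms(SAMPLE)
print(terms)
```

A tool such as a search engine or media player could use a lookup like this to decide whether a work is cleared for a given use without asking the author.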

According to an article from Andy Raskin:

... The [Creative Commons] licenses now come in machine-readable form, which means that smart CD players can display a song's license as it plays. There is also a plug-in for Adobe's Photoshop that recognizes licenses embedded in image files. The open-source Mozilla project plans to put a Creative Commons search tool alongside one for Google in its Firefox 1.0 browser, due out this summer, making it easy to search the Web for, say, photos of the Empire State Building that are cleared for noncommercial use... But what's really interesting is that as more and more artists use Creative Commons to tell the world that it's OK to copy, distribute, and build on their work, the first glimpses emerge of an economy based on the free exchange of digital content. The "sharing economy" is built on a supply-and-demand equation wholly alien to traditional media companies — the record labels, Hollywood studios, and publishing houses that support strict copyright enforcement. See: "Giving It Away (for Fun and Profit)," by Andy Raskin, in Business 2.0, May 2004.

The Creative Commons project uses "private rights to create public goods: creative works set free for certain uses. Like the free software and open-source movements, its ends are cooperative and community-minded, but its means are voluntary and libertarian. [The design team] works to offer creators a best-of-both-worlds way to protect their works while encouraging certain uses of them — to declare 'some rights reserved.' Thus, a single goal unites Creative Commons' current and future projects: to build a layer of reasonable, flexible copyright in the face of increasingly restrictive default rules..."

"Offering your work under a Creative Commons license does not mean giving up your copyright. It means offering some of your rights to any taker, and only on certain conditions. What conditions? The site will let you mix and match such conditions from the list of options below. There are a total of eleven Creative Commons licenses to choose from..."


About the Text Encoding Initiative (TEI)

Most of the articles published in Issue Two of Classics@ reference the TEI XML encoding guidelines. "Initially launched in 1987, the TEI is an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching, using an encoding scheme that is maximally expressive and minimally obsolescent." Over 100 archive and digital library projects use the Guidelines for Electronic Text Encoding and Interchange. The current version of the Guidelines is TEI P4 (The XML Version of the TEI Guidelines). The chief objective of this revision was to implement proper XML support in the Guidelines, while ensuring that documents produced to earlier TEI specifications remained usable with the new version. P5, "a substantial revision of the TEI Guidelines, is scheduled to be released by the end of 2004. Based upon RelaxNG, an XML Schema language, P5 offers new capabilities and advantages to digital project planners, digital libraries, and scholars & encoders in general. Digital project planners will find support for manuscript description, multimedia and graphics, standoff annotation, and XML namespaces. Digital Libraries will be able to use XML namespaces and the TEI namespace to use non-TEI tagsets in TEI documents, such as MARC records, or ecological metadata [and] use TEI headers and other structures inside other XML namespaces, such as METS and MODS..."


Robin Cover, Editor