Cover Pages: XML System for Textual and Archaeological Research (XSTAR)

[November 21, 2001] The XML System for Textual and Archaeological Research (XSTAR) was announced in November 2001. The XSTAR project is based at the Oriental Institute of the University of Chicago and is led by two University of Chicago faculty members: an archaeologist (David Schloen) and a linguist (Gene Gragg).

The goal of the XSTAR project "is to create a sophisticated Internet-based research environment for specialists in textual and archaeological studies. In particular, XSTAR is intended for archaeologists, philologists, historians, and historical geographers who work with ancient artifacts, documents, and geographical or environmental data. It will not only provide access to detailed, searchable data in each of these areas individually, but will also integrate these diverse lines of evidence as an aid to interdisciplinary research... consists of both a database structure and related interface software that will make it possible to view and query archaeological, textual, and linguistic information in an integrated fashion via the Internet. The XSTAR database structure is expressed in terms of hierarchies of interlinked data elements using the Extensible Markup Language (XML)... XSTAR's XML data format is based on and incorporates ArchaeoML (Archaeological Markup Language), an XML tagging scheme previously developed at the University of Chicago's Oriental Institute."

From the web site description:

In addition to archaeological and geographical descriptions, XSTAR describes the epigraphic and linguistic features of ancient texts (represented as text elements), whose physical contexts and description as artifacts, including photographs and drawings, are treated as archaeological data. The link between archaeological data and textual data is accomplished by linking the relevant archaeological unit element to the appropriate text element. Recursive tree structures (in which elements are nested within the same type of element) are used not just for the spatial hierarchy of archaeological units but also for text descriptions because: (1) tree structures are appropriate for both epigraphic and linguistic analyses of larger text components into subcomponents; (2) the number of different element types is kept to a minimum and recursive programming techniques can be used, simplifying software development; and (3) tree structures are easily implemented in XML.
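The recursive nesting and the unit-to-text link described above can be illustrated with a small sketch. It is not the XSTAR or ArchaeoML schema, which is not reproduced here: the element names `unit` and `text` and the attributes `id`, `name`, and `textref` are hypothetical, chosen only to show how one recursive routine can traverse both the spatial hierarchy and the text hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative
# assumptions, not taken from the actual XSTAR/ArchaeoML definitions.
SAMPLE = """
<xstar>
  <unit id="site" name="Site">
    <unit id="area-a" name="Area A">
      <unit id="tablet-1" name="Inscribed tablet" textref="text-1"/>
    </unit>
  </unit>
  <text id="text-1" name="Inscription">
    <text id="line-1" name="Line 1">
      <text id="word-1" name="Word 1"/>
    </text>
  </text>
</xstar>
"""


def walk(element, depth=0):
    """Yield (depth, id) for an element and every nested element with the same tag."""
    yield depth, element.get("id")
    # Only direct children with the same tag are visited, so the same
    # function serves both the unit tree and the text tree.
    for child in element.findall(element.tag):
        yield from walk(child, depth + 1)


def linked_text(root, unit):
    """Return the text element a unit points to via 'textref', or None."""
    ref = unit.get("textref")
    if ref is None:
        return None
    return root.find(f".//text[@id='{ref}']")


root = ET.fromstring(SAMPLE)

# Spatial hierarchy: site > area > inscribed artifact.
units = list(walk(root.find("unit")))

# Follow the link from the inscribed artifact to its text description.
tablet = root.find(".//unit[@id='tablet-1']")
text = linked_text(root, tablet)

# Text hierarchy: inscription > line > word, traversed by the same function.
text_parts = list(walk(text))
```

Because both hierarchies nest an element inside the same type of element, `walk` needs no knowledge of which tree it is traversing, which is the software simplification named in point (2) of the passage.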

The XSTAR software intended to work with the XSTAR database structure is currently under development... This software will interact with XSTAR data by means of the Tamino "native-XML" database management system, running on a University of Chicago server, which will deliver information as XML text over the World Wide Web. It will be possible to search large quantities of structured information using a combination of geographical, archaeological, and philological criteria.

The XML element hierarchies used in XSTAR represent the following types of information and the complex interrelationships among them:

Archaeological and geographical descriptions, consisting of observations about geographical regions (topography, climate, hydrology, vegetation), ancient landscapes (roads, canals, fields), settlement sites (architecture, stratigraphy, botanical and faunal remains), and artifacts (including the physical properties and contexts of inscribed artifacts). Archaeological and geographical descriptions include not just alphanumeric data but also visual resources such as photographs, video clips, drawings, maps, and 3-D models.

Text and script descriptions, consisting of the epigraphic and linguistic characteristics of ancient texts and scripts, including sign-by-sign transliterations, normalized transcriptions, grammatical analyses, and modern-language translations.

Language descriptions, presenting the phonology, morphology, syntax, and lexicon of each ancient language in the database.

Secondary literature and bibliographic references, consisting of technical reports and interpretive discussions of primary data, organized by author and by modern conceptual categories. The thematic organization of secondary material provides a framework within which archaeological, geographical, textual, and linguistic descriptions can be located.

Pilot Datasets: Four geographically and chronologically diverse sets of archaeological and textual information have been identified as pilot datasets to be used in testing and documenting XSTAR before making it more widely available. These datasets are derived from existing archaeological and philological projects at the University of Chicago. For each pilot dataset, both the data acquisition (digitization of material and XML markup) and the Java interface design will be conducted under the supervision of faculty members who have the responsibility and relevant expertise to publish the information. The Java software that is developed for creating, viewing, and searching these diverse datasets will have generic applicability to all kinds of ancient texts and sites. The pilot datasets and their supervisors are:

The Ashkelon Publication Project (David Schloen): Detailed information collected during the recent large-scale excavations at ancient Ashkelon, a major Levantine seaport on the Mediterranean coast (ca. 1900 b.c. to a.d. 1200).

Hattusa and the Chicago Hittite Dictionary (CHD) (Theo van den Hout, Harry Hoffner, and Gene Gragg): An electronic edition of the standard multivolume lexicon of the Hittite language, linked to Hittite cuneiform texts and their archaeological contexts at Hattusa, the capital of the Hittite empire in north-central Anatolia (ca. 1600 to 1200 b.c.).

Middle Egyptian Text Editions for Online Research (METEOR) (Janet Johnson and Karen Landahl): Critical editions and translations of important Middle Egyptian hieroglyphic texts, linked to relevant geographical, archaeological, and historical information (ca. 2100 to 1350 b.c.).

Persepolis and the Achaemenid Royal Inscriptions Project (ARI) (Matthew Stolper, Charles Jones, and Gene Gragg): Multilingual texts (written in Old Persian, Elamite, Akkadian, and Aramaic) and their archaeological contexts at Persepolis, a capital of the vast Achaemenid Persian empire based in southern Iran (ca. 550 to 330 b.c.).

References:

XSTAR web site
XSTAR: XML System for Textual and Archaeological Research. Technical Report by David Schloen. University of Chicago, Oriental Institute. November 2001. 23 pages.
[June 12, 2001] "Archaeological Data Models and Web Publication Using XML." By J. David Schloen (The Oriental Institute of the University of Chicago). In Computers and the Humanities (CHUM) Volume 35, Number 2 (May, 2001), pages 123-152. "An appropriate standardized data model is necessary to facilitate electronic publication and analysis of archaeological data on the World Wide Web. A hierarchical 'item-based' model is proposed which can be readily implemented as an Extensible Markup Language (XML) tagging scheme that can represent any kind of archaeological data and deliver it in a cross-platform, standardized fashion to any Web browser. This tagging scheme and the data model it implements permit seamless integration and joint querying of archaeological datasets derived from many different sources... see http://www-oi.uchicago.edu/ for the latest version of the XML Document Type Definition in which ArchaeoML is defined..." See also from "Electronic Publication of Ancient Near Eastern Texts": "David Schloen, an archaeologist in the University of Chicago's Oriental Institute, gave the final formal presentation on Saturday afternoon, entitled 'Texts and Context: Using XML to Integrate and Retrieve Archaeological Data on the Web.' Schloen noted that XML is as suitable for representing archaeological databases as it is for representing ancient texts. But whether the information is expressed in XML or in some other data format (e.g., a relational database), archaeologists need an appropriate data model that captures in a rigorous and consistent fashion the idiosyncrasies of units of archaeological observation, as well as the spatial and temporal interrelationships among them. Schloen proposes a hierarchical, 'item-based' data model, rather than the 'class-based' (tabular) data model which currently prevails. The item-based data model has the advantage of being straightforwardly represented in XML as a nested hierarchy of tagged elements with their attributes. Moreover, texts can be treated like any other type of artifact, as items in a spatial hierarchy with their own properties. Schloen concluded by presenting an XML tagging scheme dubbed ArchaeoML ('Archaeological Markup Language') which can represent any kind of archaeological data on any spatial scale, including the vector map shapes and raster images which belong to individual archaeological items..."
Contact: David Schloen, Assistant Professor, Oriental Institute, University of Chicago. 1155 East 58th Street, Chicago, IL 60637. Tel: +1 (773) 702-1382.
See also: "Encoding and Markup for Texts of the Ancient Near East" - Main reference page.

