The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: November 16, 2000
Computing Environment for Linguistic, Literary, and Anthropological Research (CELLAR)

[November 16, 2000] "CELLAR is an object-oriented database system that is being developed by the Academic Computing Department of SIL to meet the data management needs of its field workers. Two of CELLAR's special features are the ability to cope simultaneously with data in many languages, and a design that separates the conceptual model of a data set from multiple (interchangeable) views for display and encoding formats for import and export. While important aspects of the design were motivated by the needs of linguistic research, the system is fully programmable and can be used to develop text-related (as opposed to number crunching) applications for any discipline. CELLAR Version 1.0 was publicly released in September 1996 as part of a software application ('LinguaLinks') suitable for work in field linguistics. The LinguaLinks Workshops use CELLAR to implement applications for phonological analysis, interlinear text analysis, lexical database management, and other tasks." The CELLAR system is described more fully in several (online) project publications.
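As a rough illustration of keeping a conceptual model apart from its views, the Python sketch below holds one lexical entry as a plain object and renders it through two interchangeable views: an HTML display format and a simplified TEI-style export format. It is not CELLAR code. `LexEntry`, `to_html`, and `to_tei` are hypothetical names, and `MultiString` only borrows the name of the CELLAR type described in the LREC paper listed below (alternate renditions of one string in several languages).

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


# Hypothetical stand-in for CELLAR's MultiString: alternate renditions
# of the same string, keyed by language code.
@dataclass
class MultiString:
    renditions: dict[str, str] = field(default_factory=dict)

    def get(self, lang: str) -> str:
        # Returns an empty string when no rendition exists for `lang`.
        return self.renditions.get(lang, "")


# Hypothetical conceptual-model object: it holds data only and knows
# nothing about how it will be displayed or exported.
@dataclass
class LexEntry:
    headword: str
    gloss: MultiString


# View 1: HTML display format showing the gloss in one chosen language.
def to_html(entry: LexEntry, lang: str) -> str:
    return f"<p><b>{escape(entry.headword)}</b> {escape(entry.gloss.get(lang))}</p>"


# View 2: simplified TEI-style interchange format exporting every rendition.
def to_tei(entry: LexEntry) -> str:
    defs = "".join(
        f'<def xml:lang="{escape(lang)}">{escape(text)}</def>'
        for lang, text in sorted(entry.gloss.renditions.items())
    )
    return (
        f"<entry><form><orth>{escape(entry.headword)}</orth></form>"
        f"<sense>{defs}</sense></entry>"
    )


entry = LexEntry("kuma", MultiString({"en": "bear", "fr": "ours"}))
print(to_html(entry, "en"))
print(to_tei(entry))
```

Adding a third output format means writing one more view function. The entry object itself does not change, which is the point of separating the model from its views.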


  • CELLAR web site

  • See abstracts for several CELLAR-related papers in the main bibliography, under Simons.

  • "CELLAR: A Data Modeling System for Linguistic Annotation." By Gary F. Simons (SIL International). Presented at the LREC Workshop "Data Architectures and Software Support for Large Corpora," 30-May-2000, in conjunction with The Second International Conference on Language Resources and Evaluation (LREC-2000, Athens, Greece). "The paper illustrates how an annotation schema is expressed as an XML document that defines classes of objects, their properties, and the relationships between objects. The schema is then implemented via automatic conversion to a relational database schema and an XML DTD for data import and export. CELLAR ('Computing Environment for Linguistic, Literary, and Anthropological Research') is a data modeling system that was built specifically for the purpose of linguistic annotation. It was designed to model the following five fundamental aspects of the nature of linguistic data [...] CELLAR is not a particular annotation schema, but is a system for expressing and building annotation schemas. A particular annotation schema is called a conceptual model and is expressed as an XML document which defines classes of objects, their properties, and constraints on the values of properties and the relationships between objects. A simplified version of the DTD for expressing conceptual models is given [on page 3]. Modeling begins by identifying the classes of things in the world being modeled (i.e., the objects of object-oriented modeling or the entities of entity-relationship modeling). A class definition consists of documentation and a set of property definitions (which implement the third requirement above). Each class has a base class from which it also inherits properties; the ultimate base class is CmObject, for 'conceptually modeled object.' There are three kinds of properties: (1) Owning properties implement the part-whole relationships entailed by the second requirement that data are hierarchically structured. 
The cardinality attribute can specify that the owned objects form a sequence, thus supporting the first requirement. (2) Relationship properties implement the fourth requirement, that the model must support associative links between related data objects. The signature attribute constrains what classes of objects can be the targets of a link. (3) Basic properties store primitive data values like numbers, strings, Booleans, dates, and binary images (such as for graphics or sound). The fifth requirement that data are multilingual is supported by the fact that the primitive type String allows spans of characters to be identified as to language, and MultiString and MultiUnicode support alternate renditions of the same string in multiple languages. As a conceptual model is developed in XML, descriptions of the classes and properties are included right inside the definitions. As a result, an XSL stylesheet is able to render the conceptual model source code as hyperlinked documentation in a browser..."

  • "Mapping from objects to markup: a springboard for multiple-strategy electronic publishing." By Gary Simons. Pages 151 - 153 in ACH-ALLC '97. The 1997 Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. Conference Abstracts. ACH-ALLC '97. Queen's University at Kingston, Ontario, Canada. June 3 - 7, 1997. Compiled by Greg Lessard and Michael Levison. Ontario, Canada: Queen's University, 1997. ISBN: 0-88911-760-8. Author's affiliation: Summer Institute of Linguistics. [Extract:] "This paper reports on the experience of the Summer Institute of Linguistics in developing electronic publishing solutions for its LinguaLinks product (SIL 1996). LinguaLinks is an electronic performance support system designed to assist field workers with a wide range of tasks related to language learning, language analysis, and language development. The paper first introduces the LinguaLinks model of performance support and CELLAR -- the object-oriented database system that is used to implement it. Our approach to electronic publishing is to first build the information as a structure of objects in the database, and then to use multiple CELLAR stylesheets to map the information onto multiple markup schemes. The object database thus serves as a springboard that allows us to vault the information into any number of formats for publishing. The paper illustrates this approach to electronic publishing by focusing on one application area that LinguaLinks supports, namely, lexical database development. It first shows how the tutorial and reference documents that give help on how to build a dictionary are mapped onto different markup schemes for publication as a Folio Views infobase, a Windows help system, and an HTML Web document. 
It then shows how the dictionaries that are built by using LinguaLinks are mapped onto HTML markup to provide a display format on the Web and onto TEI markup to provide a richer format for information interchange and archiving."

  • "Importing SGML data into CELLAR by means of architectural forms." By Gary Simons. 15-December-1997. "This working paper documents a process for importing SGML data into the CELLAR database. The process, which requires no change to the SGML data and no special-purpose programming on the CELLAR side, is based on a relatively new SGML feature named architectural forms. The user writes a meta-DTD that maps the elements in the SGML data onto architectural forms that express the corresponding objects and attributes in CELLAR. Then an SGML parser uses this to create an 'architectural document' that an existing CELLAR parser reads to build the corresponding structure of objects in the CELLAR database." This electronic working paper gives the full details of work that has been presented in two conference papers.

  • "Multilingual data processing in the CELLAR environment." By Gary F. Simons and John V. Thomson. A paper presented at: Linguistic Databases, 23-24 March 1995, University of Groningen, Centre for Language and Cognition and Centre for Behavioural and Cognitive Neurosciences. "This paper describes a database system developed by the Summer Institute of Linguistics to be truly multilingual. It is named CELLAR -- Computing Environment for Linguistics, Literary, and Anthropological Research. After elaborating some of the key problems of multilingual computing (section 1), the paper gives a general introduction to the CELLAR system (section 2). CELLAR's approach to multilingualism is then described in terms of six facets of multilingual computing (section 3). The remaining sections of the paper describe details of how CELLAR supports multilingual data processing by presenting the conceptual models for the on-line definitions of multilingual resources." See also the paper in HTML format.
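The LREC paper's pipeline (a conceptual model written as an XML document, then converted automatically into an XML DTD for import and export) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CELLAR's converter. The element and attribute names in the sample model (`class`, `base`, `owning`, `relationship`, `basic`, `signature`, `cardinality`) are loosely based on the abstract but are hypothetical, since the paper's actual DTD for conceptual models is not reproduced here. The mapping rules are also assumptions: owning and basic properties become optional child elements, relationship properties become `IDREFS` attributes, properties are inherited along the base-class chain ending at `CmObject`, and every primitive type, including MultiString, is flattened to `#PCDATA`.

```python
import xml.etree.ElementTree as ET

# Hypothetical conceptual model; element/attribute names are illustrative only.
MODEL = """
<model>
  <class id="LexEntry" base="CmObject">
    <basic id="Headword" type="String"/>
    <owning id="Senses" signature="LexSense" cardinality="seq"/>
    <relationship id="SeeAlso" signature="LexEntry"/>
  </class>
  <class id="LexSense" base="CmObject">
    <basic id="Gloss" type="MultiString"/>
  </class>
</model>
"""


def collect_properties(classes, name):
    """Return property definitions of class `name`, inherited ones first."""
    cls = classes.get(name)
    if cls is None:  # CmObject (or an unknown base): no properties in this sketch
        return []
    return collect_properties(classes, cls.get("base")) + list(cls)


def model_to_dtd(model_xml: str) -> str:
    root = ET.fromstring(model_xml)
    classes = {c.get("id"): c for c in root.findall("class")}
    lines, declared = [], set()
    for name in classes:
        props = collect_properties(classes, name)
        # Owning and basic properties become optional child elements.
        children = [p.get("id") + "?" for p in props if p.tag in ("owning", "basic")]
        content = "(" + ", ".join(children) + ")" if children else "EMPTY"
        lines.append(f"<!ELEMENT {name} {content}>")
        # Every object gets an ID; relationship properties become IDREFS attributes.
        attrs = ["id ID #IMPLIED"] + [
            f"{p.get('id')} IDREFS #IMPLIED" for p in props if p.tag == "relationship"
        ]
        lines.append(f"<!ATTLIST {name} " + " ".join(attrs) + ">")
        for p in props:
            pid = p.get("id")
            if pid in declared:  # inherited property already declared once
                continue
            if p.tag == "basic":
                # Simplification: all primitive types are flattened to text.
                lines.append(f"<!ELEMENT {pid} (#PCDATA)>")
            elif p.tag == "owning":
                # A "seq" cardinality allows an ordered sequence of owned objects.
                rep = "*" if p.get("cardinality") == "seq" else "?"
                lines.append(f"<!ELEMENT {pid} ({p.get('signature')}{rep})>")
            declared.add(pid)
    return "\n".join(lines)


print(model_to_dtd(MODEL))
```

For the sample model this prints declarations such as `<!ELEMENT LexEntry (Headword?, Senses?)>` and `<!ELEMENT Senses (LexSense*)>`. Because inherited properties are collected from the base chain first, a subclass's content model lists its base class's properties before its own. Each property element is declared only once, so the generated DTD stays free of duplicate element declarations.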

Robin Cover, Editor