[February 11, 2000] A posting from Patrice Bonhomme (LORIA/CNRS) announces the Beta release of XCES, "which instantiates the Corpus Encoding Standard (CES) DTDs for linguistic corpora developed by the Expert Advisory Group for Language Engineering Standards (EAGLES). XCES was developed by the Department of Computer Science, Vassar College, and Equipe Langue et Dialogue, LORIA/CNRS." The Corpus Encoding Standard (CES) is "a part of the EAGLES Guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). The CES is designed to be optimally suited for use in language engineering research and applications, in order to serve as a widely accepted set of encoding standards for corpus-based work in natural language processing applications. The CES is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language) compliant with the specifications of the TEI Guidelines for Electronic Text Encoding and Interchange of the Text Encoding Initiative. The CES specifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive representation (marking of structural and typographic information) as well as general architecture (so as to be maximally suited for use in a text database). It also provides encoding specifications for linguistic annotation, together with a data architecture for linguistic corpora."
The XCES XML DTDs provide for: (1) language identification (an attribute xml:lang (CDATA) has been added to the CES global attributes); (2) XLink (support for the XLink specification, by including the sub-DTD xlink.ent for simple, extended, locator, and arc elements, is under development); (3) XPointer/XPath (the use of XPointers and XPaths for locator element types is currently being implemented); (4) XSL stylesheets. The XCES XML DTDs are available for download, as are the XSL stylesheets. The development team is in the process of developing stylesheets for cesAna [encoding conventions for annotated data] and cesAlign [encoding conventions for aligned data] documents. "Note that XCES is under development and subject to change. We are currently developing documentation to support XCES. However, the existing CES documentation supporting general encoding practices for linguistic corpora and tag usage is largely relevant to the XCES instantiation, and should be consulted in the meantime." Questions and comments concerning the XML DTDs may be sent to Nancy Ide or Patrice Bonhomme.
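As a rough illustration of the language identification point, the sketch below reads xml:lang values from a small XML fragment using Python's standard library. The fragment and its element names (text, p) are invented for this example and are not taken from the XCES DTDs; the helper name languages is likewise hypothetical. The only behavior assumed is the standard XML rule that xml:lang applies to an element's descendants unless overridden.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names are illustrative only,
# not drawn from the XCES DTDs.
SAMPLE = """<text xml:lang="en">
  <p>Good morning.</p>
  <p xml:lang="fr">Bonjour.</p>
</text>"""

# ElementTree reports the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages(elem, inherited=None):
    """Yield (tag, language) pairs, inheriting xml:lang from ancestors."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from languages(child, lang)


root = ET.fromstring(SAMPLE)
result = list(languages(root))
for tag, lang in result:
    print(tag, lang)
```

The second paragraph overrides the language set on its parent, while the first inherits it, which is the behavior a corpus tool would presumably rely on when a global xml:lang attribute is available on every element.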
Project Address:
Nancy Ide
Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988 Fax: +1 914 437-7498
ide@cs.vassar.edu
References:
[February 11, 2000] Announcement
Contacts: Nancy Ide or Patrice Bonhomme.