The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: March 16, 2000
XCES: Corpus Encoding Standard for XML

[February 11, 2000] A posting from Patrice Bonhomme (LORIA/CNRS) announces the Beta release of XCES, "which instantiates the Corpus Encoding Standard (CES) DTDs for linguistic corpora developed by the Expert Advisory Group for Language Engineering Standards (EAGLES). XCES was developed by the Department of Computer Science, Vassar College, and Equipe Langue et Dialogue, LORIA/CNRS." The Corpus Encoding Standard (CES) is "a part of the EAGLES Guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). The CES is designed to be optimally suited for use in language engineering research and applications, in order to serve as a widely accepted set of encoding standards for corpus-based work in natural language processing applications. The CES is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language) compliant with the specifications of the TEI Guidelines for Electronic Text Encoding and Interchange of the Text Encoding Initiative. The CES specifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive representation (marking of structural and typographic information) as well as general architecture (so as to be maximally suited for use in a text database). It also provides encoding specifications for linguistic annotation, together with a data architecture for linguistic corpora."

The XCES XML DTDs provide for: (1) language identification (an attribute xml:lang (CDATA) has been added to the CES global attributes); (2) XLink (support for the XLink specification, by including the sub-DTD xlink.ent for simple, extended, locator, and arc elements, is under development); (3) XPointer/XPath (the use of XPointers and XPaths for locator element types is currently being implemented); (4) XSL stylesheets. The XCES XML DTDs are available for download, as are the XSL stylesheets. The development team is developing stylesheets for cesAna [encoding conventions for annotated data] and cesAlign [encoding conventions for aligned data] documents. "Note that XCES is under development and subject to change. We are currently developing documentation to support XCES. However, the existing CES documentation supporting general encoding practices for linguistic corpora and tag usage is largely relevant to the XCES instantiation, and should be consulted in the meantime." Questions and comments concerning the XML DTDs may be sent to Nancy Ide or Patrice Bonhomme.
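
As a rough illustration of the language-identification feature, the sketch below reads xml:lang from a small corpus-like fragment using Python's standard xml.etree.ElementTree module. The fragment and its element names (cesDoc, text, body, p) are assumptions chosen for illustration and have not been validated against the XCES DTDs; the helper paragraphs_with_language is likewise hypothetical. It shows only the general XML behavior that an xml:lang value applies to an element and its descendants unless overridden.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative fragment only; element names are assumed, not taken from the XCES DTDs.
SAMPLE = """<cesDoc xml:lang="en">
  <text>
    <body>
      <p>A paragraph in English.</p>
      <p xml:lang="fr">Bonjour le monde.</p>
    </body>
  </text>
</cesDoc>"""


def paragraphs_with_language(xml_text):
    """Return (language, text) pairs for each <p>, inheriting xml:lang from ancestors."""
    root = ET.fromstring(xml_text)
    results = []

    def walk(elem, inherited):
        # An explicit xml:lang overrides the value inherited from the parent.
        lang = elem.get(XML_LANG, inherited)
        if elem.tag == "p":
            results.append((lang, (elem.text or "").strip()))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return results


for lang, text in paragraphs_with_language(SAMPLE):
    print(lang, text)
```

The inheritance is resolved by hand here because ElementTree does not track scoped attributes such as xml:lang on its own.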

Project Address:
Nancy Ide
Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988 Fax: +1 914 437-7498
ide@cs.vassar.edu

Document URI: http://xml.coverpages.org/xces.html
Robin Cover, Editor: robin@oasis-open.org