The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: February 12, 2001.
News: Cover Stories

XML Markup Technologies Featured at Corpus Linguistics 2001.

The Corpus Linguistics 2001 Conference at Lancaster University [30 March - 2 April 2001] will feature a special workshop on "XML Markup Technologies for Working with Linguistic Data," organized by Jean Carletta (University of Edinburgh) and Henry Thompson (University of Edinburgh and W3C). The workshop will highlight W3C XML-related standards relevant to corpus linguistics research as well as the "inventory of tools and technologies for the markup and analysis of language data expressed in XML developed by the Language Technology Group at the University of Edinburgh's Division of Informatics, with support from EPSRC, ESRC, and the EU." The CL2001 conference itself offers a special session on 'Markup And Tools', with papers on feature structures, text alignment (synchronization), and XML-based systems for corpora development.

From the workshop description: "The call for proposals encourages better cross-fertilization between language engineers and corpus linguists. The Language Technology Group at the University of Edinburgh's Division of Informatics, with support from EPSRC, ESRC, the EU, and other sources, has invested substantial effort over the last six years in building up an inventory of tools and technologies for the markup and analysis of language data expressed in XML. This includes tools which support hand annotation, methods for configurable, user-specified data display, and ways of specifying pattern matches over data so that areas of specific interest can be extracted or even annotated automatically. These tools have been used, for instance, for language corpus annotation, tokenization, exploring dialogue structure, and finding named entities, dates, and times in the data. The goal of this half-day workshop is to show how these technologies can be of use in corpus linguistics and to begin a dialogue in which we, as technology providers, come to understand the needs of corpus linguists better. The workshop will largely consist of lectures in which we will introduce the W3C XML-related standards at the heart of our work and the different tools available from both Edinburgh and elsewhere. The material presented will assume some facility with computers, but will introduce all of the necessary XML and data processing concepts. The workshop will end with a discussion about where these tools are headed and whether anything needs to be done to make them fit the needs of this community better."

Note also the conference session on 'Markup And Tools':

  • Kiril Simov, Zdravko Peev, Milen Kouylekov, Alexander Simov, Marin Dimitrov, Atanas Kiryakov: "CLaRK - An XML Based System for Corpora Development"
  • Julien Nioche & Benoît Habert: "Using Feature Structures as Representation: Format for Corpora Exploration"
  • Sylvie Porhiel: "Linguistic expressions as a tool to extract thematic information"
  • Hatem Ghorbel, Afzal Ballim: "ROSETTA: RhetOrical and Semantic Environment for TexT Alignment"

From the announcement posted by Andrew Wilson of Lancaster University:

Workshops At Corpus Linguistics 2001 - Lancaster University (UK), 29 March 2001.

In conjunction with the "Corpus Linguistics 2001" conference at Lancaster University (UK), four workshops have been organized for 29 March:

  1. Corpus-Based and Processing Approaches to Figurative Language. Organisers: John Barnden (University of Birmingham), Mark Lee (University of Birmingham), Katja Markert (University of Edinburgh)
  2. Corpus Linguistics, Ancient Languages, and Older Language Periods. Organiser: Andrew Wilson (Lancaster University)
  3. XML Markup Technologies for Working with Linguistic Data. Organisers: Jean Carletta and Henry Thompson (University of Edinburgh and W3C).
  4. Using [SGML-based] SARA to explore the BNC World Edition. Organiser: Lou Burnard (Oxford University Computing Services)

[RCC Note: BNC Sampler CD - four software systems are on the CD; one (Qwick) operates against a compressed XML version of the corpus: "...Qwick is a corpus browser that allows you to build up your own working corpus, retrieve concordance lines using a simple but powerful query language, and to compute collocation statistics using a variety of adjustable parameters. It is implemented in Java, and is thus platform independent. It has been tested on Windows and Solaris, and (according to Lou Burnard from OUCS) also runs on the Apple Macintosh. Qwick can handle markup in XML format..."]

Further details of these are available on the Web at

To register, please use the special workshops section of the CL2001 registration form at

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor: