LREC Post-Conference Workshop on XML-Based Richly Annotated Corpora

Workshop on XML-Based Richly Annotated Corpora

First Announcement and Call for Papers

LREC Post-Conference Workshop
29th May 2004
Centro Cultural de Belem, LISBON, Portugal
WWW: http://coli.lili.uni-bielefeld.de/forschung/xbrac/

Call for Papers

XML has become a de facto standard for the representation of corpus resources. It is used to represent speech and text corpora, multimodal and multimedia corpora and, in particular, integrated corpora that combine different modalities. XML-based representations make it easier to work with richly annotated corpora, which include annotations from different levels of linguistic description or from different modalities. Over the last few years, a number of tools have also become available for creating, managing, annotating, querying and statistically exploring such corpora.

Although XML is a useful representation language, its use alone does not settle every choice of representation style (e.g., stand-off vs. embedded annotation). These choices are in turn closely linked with questions about the architecture of richly annotated corpora, such as the following: Should information from different levels of linguistic description be represented in separate "layers" of the annotation? Should a given information type serve as a grounding for all or some of the others? How can interdependencies and interactions between phenomena from different levels of description be accounted for? How can concurrent annotation (one phenomenon, different analyses or theories/approaches) be accounted for? Such questions, and the corpus-architectural considerations they raise, interact with at least two further problem areas: on the one hand, the kinds of research questions and phenomena to be analysed in linguistic and natural interactivity research (which may call for certain architectural solutions); on the other hand, tools for the creation, annotation, manipulation and exploration of XML-based corpora.

The workshop will attempt to address the interplay between the following research areas:

1. Representation Techniques

XML techniques for corpus representation, i.e.,

Standoff annotation vs. embedded annotation
Use of XML linking standards for language data (XLink, XPointer, XPath); other ways of ensuring relationships between levels, e.g., through naming conventions
Concepts of layering in corpora annotated at several levels of linguistic description; types of information grouped together vs. distributed over different "packages"
Hierarchical vs. flat annotation
The grounding of annotations (e.g., in XML elements vs. in characters?) and its implications
Techniques for the manipulation of XML-based representations for massively annotated corpora
Usefulness and relevance of XQuery
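
As an illustration of the first item, the contrast between embedded and stand-off annotation can be sketched as follows. This is a minimal sketch and not a format proposed by the workshop: the element and attribute names (`w`, `span`, `start`, `end`, `pos`) and the sample sentence are assumptions made for the example, and the stand-off layer is grounded in character offsets.

```python
import xml.etree.ElementTree as ET

# Embedded annotation: markup is interleaved with the primary text.
EMBEDDED = (
    '<s><w pos="DT">The</w> <w pos="NN">corpus</w> <w pos="VBZ">grows</w></s>'
)

# Stand-off annotation: the primary text is left untouched, and a separate
# document points into it by character offsets (hypothetical attribute names).
PRIMARY_TEXT = "The corpus grows"
STANDOFF = """
<annotations layer="pos">
  <span start="0" end="3" pos="DT"/>
  <span start="4" end="10" pos="NN"/>
  <span start="11" end="16" pos="VBZ"/>
</annotations>
"""


def tokens_from_embedded(xml_text):
    """Read (token, tag) pairs directly from the inline markup."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("pos")) for w in root.iter("w")]


def tokens_from_standoff(primary, xml_text):
    """Resolve each offset pair against the primary text to recover the token."""
    root = ET.fromstring(xml_text)
    return [
        (primary[int(s.get("start")):int(s.get("end"))], s.get("pos"))
        for s in root.iter("span")
    ]


if __name__ == "__main__":
    print(tokens_from_embedded(EMBEDDED))
    print(tokens_from_standoff(PRIMARY_TEXT, STANDOFF))
```

Both functions recover the same token/tag pairs. The difference is that further stand-off layers can be added without touching the primary text or the existing layers.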

2. Description Levels

Levels of linguistic description and their interaction, i.e.,

Examples of richly annotated corpora: reasons for the choice of the annotated levels; linguistic and natural interactivity research questions which can (only) be solved with richly annotated data
Interaction between levels: new research questions in linguistics and natural interactivity research which can only be addressed because of observation across levels, across modalities, etc. An example is the use of clustering techniques across different levels, e.g., to identify relevant co-occurrences of phenomena from different levels
Use and usefulness of concurrent annotations in XML-based corpora; an example is concurrent flat and deep syntactic analysis
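
To make the idea of concurrent annotation concrete, the following minimal sketch stores a flat (chunk) analysis and a deep (dependency) analysis as two separate layers over the same tokens and then combines them in one query. The layer formats, the element and attribute names, and the token ids are all hypothetical and were chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Shared base layer: tokens with ids that both analyses refer to.
TOKENS = ET.fromstring(
    '<tokens><t id="t1">The</t><t id="t2">corpus</t><t id="t3">grows</t></tokens>'
)
# Flat analysis: chunks given as token ranges.
FLAT = ET.fromstring(
    '<chunks><chunk type="NP" from="t1" to="t2"/>'
    '<chunk type="VP" from="t3" to="t3"/></chunks>'
)
# Deep analysis: dependency relations between tokens.
DEEP = ET.fromstring(
    '<deps><dep rel="det" head="t2" child="t1"/>'
    '<dep rel="subj" head="t3" child="t2"/></deps>'
)


def chunk_relations(tokens, flat, deep):
    """For each chunk, list the relations linking it to a head outside the chunk."""
    ids = [t.get("id") for t in tokens.iter("t")]
    text = {t.get("id"): t.text for t in tokens.iter("t")}
    results = []
    for chunk in flat.iter("chunk"):
        first = ids.index(chunk.get("from"))
        last = ids.index(chunk.get("to"))
        members = ids[first:last + 1]
        # Keep only dependencies that leave the chunk (child inside, head outside).
        rels = [
            d.get("rel")
            for d in deep.iter("dep")
            if d.get("child") in members and d.get("head") not in members
        ]
        results.append(
            (chunk.get("type"), " ".join(text[i] for i in members), rels)
        )
    return results


if __name__ == "__main__":
    for row in chunk_relations(TOKENS, FLAT, DEEP):
        print(row)
```

Because neither analysis is embedded in the other, each layer can be revised or replaced independently, and a cross-level question (here, which chunks act as subjects) is answered by joining the layers on the shared token ids.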

3. Tools

Tools for handling richly annotated corpora: Software solutions for, e.g.,

corpus creation, transformation, exchange, and validation
interactive annotation
exploration: query and retrieval, statistical analysis
corpus management (e.g., with respect to metadata)

Tools presented should be positioned with respect to the questions of corpus architecture and with respect to the research directions discussed above under (1) and (2).

The workshop aims to bring together XML experts, both theorists and practitioners, as well as linguists and natural interactivity researchers working on the definition of corpus architectures, on annotation and resource exchange schemes, and on tools for the use of multilevel and/or multi-layer annotated corpora. It will provide a forum for defining requirements for corpus representations and the associated tools, while also discussing case studies from linguistics and natural interactivity research.

Organisers

Andreas Witt, Bielefeld University
Ulrich Heid, University of Stuttgart
Henry S. Thompson, University of Edinburgh
Jean Carletta, University of Edinburgh
Peter Wittenburg, MPI for Psycholinguistics Nijmegen

Program Committee

Jean Carletta, University of Edinburgh, UK
Ulrich Heid, University of Stuttgart, Germany
Henning Lobin, Justus-Liebig-Universität Gießen, Germany
Dieter Metzing, Bielefeld University, Germany
Joakim Nivre, Växjö University, Sweden
Vito Pirrelli, Istituto di Linguistica Computazionale del CNR, Pisa, Italy
Gary Simons, SIL International, Texas, USA
Henry S. Thompson, University of Edinburgh, UK
Jun'ichi Tsujii, University of Tokyo, Japan
Andreas Witt, Bielefeld University, Germany
Peter Wittenburg, MPI for Psycholinguistics Nijmegen, Netherlands

Submissions

Authors are invited to submit papers for oral presentation in any of the areas listed above. Only full papers will be accepted, and the length of the paper should not exceed 8 pages.

Requirements for Paper Submission

Submissions must be full papers, not extended abstracts.
It is highly recommended that authors submit papers in the LREC conference proceedings format (maximum of 8 pages).
Submissions in other formats will be accepted (font sizes of 11 or 12 point), but they may be no longer than eight (8) pages including figures, tables, and references, formatted for A4 paper with reasonable margins.
Electronic submission of manuscripts (details on the submission site) is required (PDF preferred; PostScript and ASCII accepted).
An additional title page should include the title, author(s), affiliation(s), contact email address, postal address, telephone, fax and URL as well as five keywords.

Submissions should be sent by email to andreas.witt@uni-bielefeld.de before 15th February 2004.

[Posted by Andreas Witt andreas.witt@uni-bielefeld.de]

Prepared by Robin Cover for The XML Cover Pages archive.
