SGML: SIGIR '95 Course on Text Encoding

----------------------------Original message----------------------------



                     Lou Burnard, Oxford University
                   Judith Klavans, Columbia University
        C. M. Sperberg-McQueen, University of Illinois at Chicago

                         A PRE-CONFERENCE COURSE
                     to be held in association with
                               SIGIR '95:
        18th International Conference on Research and Development
                        in Information Retrieval
                            Seattle, WA, USA
                         Saturday, July 8, 1995
                          8:30 a.m. - 3:30 p.m.

   SIGIR '95, an international research conference on
information retrieval theory, systems, practice and
applications, will be held in Seattle, WA, from July 9-13. On
the Saturday prior to the conference, a one-day course will be
offered covering the theory and practice of markup languages
for the representation of textual and other data, such as SGML
and the Text Encoding Initiative.  The course will be taught
by Lou Burnard, Judith Klavans, and C. M. Sperberg-McQueen.

   The representation of textual data has raised serious
problems since the early days of digital technology.
Incompatibilities between representations range from simple
formatting issues, such as word delimitation, to data encoding
schemes, such as 7-bit encoding for English, 8-bit for
accented languages, up to 32-bit for Asian languages.
Furthermore, the complications seem to be growing as the
amount of digital data increases.  Recognizing the predicament
these complications cause in the information age, a group of
researchers and practitioners, sponsored by the Association
for Computational Linguistics, the Association for Computers
and the Humanities, and the Association for Literary and
Linguistic Computing, joined in 1988 to explore ways to
resolve the serious emerging incompatibilities in the
representation of text.  The Text Encoding Initiative has
addressed these problems by developing detailed SGML Document
Type Definitions (DTDs) to achieve comprehensive and
generalizable encoding standards for a range of data types,
from verse to syntactic analyses, from spoken language to
hypertext, from terminological data to multilingual corpora.

   This one-day course will consist of three parts.  The first
will describe the challenges raised by the three ``abilities''
which concern effective text representation: reusability,
interchangeability, and compatibility.  The second will
present the types of data handled so far by the TEI encoding
scheme, some of the problems already solved, some ongoing
projects, and some unsettled questions.  In the third, if
hands-on work is possible, we will provide a session in which
participants can experience the strengths of using the TEI for
building intelligent text data bases from existing on-line
texts.  Otherwise, we will demonstrate widely available
software and discuss practical issues in using the TEI for
that purpose.

   The course will be of interest to computer scientists who
are building large test-beds of textual data, researchers who
must analyze and encode representational systems over such
data, practitioners who must solve the incompatibility problem
by choosing a standard encoding scheme for textual data, SGML
hackers who want to know more about TEI DTDs, and humanists
who want to learn more about the issues in text representation.
Since most of IR currently operates over textual data, the
indexing issues in the TEI are of particular and pressing
interest to the IR audience.

   Further information can be found at:
Questions re workshop content should be directed to C. M.
Sperberg-McQueen; addresses for
queries re registration and accommodation are given below.

   All participants will be provided with a printed
introductory summary guide to the TEI scheme and supporting
materials on PC disks, including full versions of the TEI
DTDs, public domain SGML software and sample TEI texts.  The
electronic version of the Guidelines will also be provided.

   Lou Burnard, of Oxford University Computing Services, is
the European editor of the TEI project.  He has degrees in
English literature from Oxford, and has worked in computers
since the seventies.  His areas of expertise are in the
applications of computing to linguistic and literary research,
particularly with reference to database and text retrieval
systems.  He has published and lectured widely on these and
related topics. His present responsibilities, aside from TEI
work, include management of the British National Corpus
project at OUCS, and the Oxford Text Archive, of which he is

   Judith Klavans is the Director of the Center for Research
on Information Access (CRIA) at Columbia University.  The
goals of the Center, established in January 1995, are to
integrate and coordinate the various digital library related
activities at Columbia University, to push forward research on
technologies related to information access, and to serve as a
source of information on the technological aspects of digital
library applications to external projects.  Dr. Judith Klavans
has a research career which combines aspects of computer
science and linguistics, including the automatic acquisition
of lexical knowledge, multilingual text analysis, and the
development of symbolic techniques for the presentation of
information within the context of digital libraries.

   C. M. Sperberg-McQueen is a senior research programmer at
the academic computer center at the University of Illinois at
Chicago; he currently works in the database group, on SGML
applications and the university library's information arcade.
Since 1988 he has been editor in chief of the ACH/ACL/ALLC
Text Encoding Initiative.

   The cost of the course is $50 before May 29 and $65 after
May 29; this includes a box lunch and course documentation.
The attached registration form covers this course only.

   Attendance at SIGIR '95 is not required for this course.
Those wishing to attend SIGIR as well should complete the
separate SIGIR registration form; a copy plus full information
on SIGIR '95, including descriptions of tutorials, workshops,
all technical sessions, and accommodation, etc. is available
from (\public\sigir95\program) by
anonymous ftp; or via WWW at URL:
sigir/conferences/SIGIR_95_adv.pgm.html; or request a copy of
the program by mail by contacting

   The course venue will depend on enrolment but at present it
is expected that it will be at the SIGIR conference hotel, the
Seattle Sheraton Hotel & Towers, 1400 Sixth Avenue, Seattle,
WA 98101.  Details of conference accommodation are available
from the ftp and WWW addresses above.

Cut here: >--------------------------------------------------

                      in conjunction with SIGIR '95
                     Seattle, WA, USA, July 8, 1995

Please use block letters or type, and tick where appropriate

 __ Mr.    __ Ms.    __ Dr.    __ Prof.     Other: ______

LAST NAME:________________ FIRST NAME:_______________________

BADGE NAME (if different): __________________________________



CITY:__________________   STATE:______   ZIP CODE: __________

COUNTRY:_______________   PHONE:  ( ___ )____________________

FAX:  ( ___ ) _______________ EMAIL: ________________________

($50 prior to May 29; $65 after May 29)   $ ________________


ARE YOU ALSO ATTENDING SIGIR '95?   ____  yes    ____ no

METHOD OF PAYMENT (US Currency only):

__ Check payable to ACM/SIGIR95
__ Credit card (Visa, MC, AMEX)
Credit card number, expiration date

Signature, date
(I authorize to charge my account fees indicated above)

Return Registration Form by May 29 to qualify for early
registration.  Use fax or email (credit card payment) or mail
(check or credit card) to:
   c/o Convention Services Northwest
   1809 Seventh Avenue, Suite 1414
   Seattle, WA 98101 USA
   Fax:  +1 206-292-0559
(Registration queries to: +1 206-292-9198 (Ask for Sarah