[This local archive copy mirrored from the canonical site: http://www.irisa.fr/ep98/tutorials.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Tutorial EP1 | Introduction to SGML and XML |
Tutorial EP2 | Advanced XML |
Tutorials EP3 and EP4 | DSSSL & XSL |
Tutorial EP5 | Encoding Information for Interchange: An Introduction to the TEI |
Tutorial EP6 | Mapping Websites and Creating Site Maps |
Tutorial EP7 | Website Information Architecture |
Tutorial EP8 | Document Image and Content Analysis |
Tutorial EP9 | From Lead to Pixels: 1) Punchcutting |
Tutorial EP10 | From Lead to Pixels: 2) Computer-Aided Type Design |
The first module introduces the concepts, components and syntax of the ISO standard Document Style Semantics and Specification Language (DSSSL - ISO/IEC 10179) and the related Extensible Stylesheet Language (XSL - W3C). It also covers how these standards relate to the Standard Generalized Markup Language (SGML - ISO 8879) and Extensible Markup Language (XML - W3C) families of standards.
The second module builds on the knowledge of DSSSL and XSL concepts with hands-on exercises to practice the techniques. The exercises include the use of both standardized style semantics and custom SGML transformation semantics.
This seminar will review the steps in planning and executing a sound information architecture for both public Internet sites and private corporate Intranets.
Document analysis systems automatically extract information from scanned images of paper-based documents. Such systems recognize characters, establish spatial and semantic relationships, and determine the layout and logical structure. Their output facilitates efficient storage, retrieval, and subsequent processing of the documents, e.g. in a workflow.
The tutorial gives an overview of the general procedures used in document analysis and recognition, with close attention to different applications and their specific problems, such as address reading, form reading, free-form understanding, etc.
The tutorial starts by introducing different applications, discussing in particular the objectives of document analysis for each application and the essential technological issues. The course covers all processing steps for converting a scanned document image into data structures that convey its meaning: it starts with operations on the scanned image (pixel format), turns to segmentation issues, focuses on classification, and discusses the logical structure of documents (information extraction). The emphasis lies on techniques for recognizing objects (e.g. character classification, cut classifier, ...) as a basic and widely applicable technique, and on information extraction for identifying the logical structure and the meaning of certain entities of documents. For all intermediate steps, different approaches are described and their performance is assessed. In each step, results of these techniques are evaluated on examples from relevant applications, and open problems are discussed.
Ph. Louarn - Jan. 1998 - © Irisa 1997-1998