A research project coordinated through the Tokyo Cyber Assist Research Center has developed an XML vocabulary and DTD for linguistic annotation of web documents. The Global Document Annotation Initiative research team has proposed this XML-based tag set to help computing machines "automatically infer the underlying semantic/pragmatic structure of documents. The tag set is being developed so as to be easy to embed into TEI, EAGLES, and HTML vocabularies. The GDA tag set is designed so that the GDA-annotation reduces the ambiguity in mapping a document to a sort of entity-relation graph (or semantic network) representing the underlying semantic structure. The tag set does not directly encode such graphs, though it should be straightforward to encode them with RDF or related tag sets such as DAML. A chief goal of the GDA initiative is to support AI applications such as machine translation, information retrieval, information filtering, data mining, consultation, expert systems, and so on."
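Because the tag set leaves the graph itself to tools such as RDF, an application has to build it from the annotation. The Python sketch below shows one such step: grouping coreferent mentions into chains that could later become single entity nodes. The markup is an assumption for illustration: the element names `su` and `np`, and an `eq` attribute that points at another element's `id` to mark coreference, should be checked against the GDA DTD before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical GDA-style fragment. The element names (<su>, <np>) and the
# "eq" coreference attribute are illustrative assumptions, not quoted from the DTD.
SAMPLE = """<doc>
  <su id="s1"><np id="p1">The tag set</np> is XML-based.</su>
  <su id="s2"><np eq="p1">It</np> is easy to embed into HTML.</su>
</doc>"""


def element_text(el):
    """Concatenate all text inside an element, including nested children."""
    return "".join(el.itertext()).strip()


def coreference_chains(xml_text):
    """Map each antecedent id to [antecedent text, referring mention, ...]."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id so references can be resolved.
    text_by_id = {el.get("id"): element_text(el)
                  for el in root.iter() if el.get("id") is not None}
    chains = {}
    for el in root.iter():  # iter() walks the tree in document order
        target = el.get("eq")
        if target is None:
            continue
        # Start the chain with the antecedent's text ("?" if the id is dangling),
        # then append this referring mention.
        chain = chains.setdefault(target, [text_by_id.get(target, "?")])
        chain.append(element_text(el))
    return chains


print(coreference_chains(SAMPLE))  # → {'p1': ['The tag set', 'It']}
```

Each chain collapses several surface mentions into one candidate entity, which is the kind of ambiguity reduction the annotation is meant to provide when mapping text to an entity-relation graph.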
Features of the GDA-based automatic text summarization system: "(1) A domain/style-free algorithm using spreading activation on an intra-document network of text segments connected via syntactic, semantic, and rhetorical relations and coreference relations. (2) Linguistic manipulation such as coreferential substitution and parse-tree pruning. (3) A flexible summary generator which can dynamically generate summaries of various sizes. (4) A personalization mechanism which can reflect readers' interests and preferences."
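Features (1), (3), and (4) can be illustrated together. The following is a minimal Python sketch of generic spreading activation over a weighted segment network, followed by selection of the most activated segments for a summary of any requested size; seed activation stands in for a reader's interests. It is not the published GDA summarizer: the propagation rule (each segment splits its outgoing activation in proportion to edge weight, as in personalized PageRank), the decay of 0.5, the iteration count, and the function names `spread_activation` and `summarize` are all assumptions.

```python
from collections import defaultdict


def spread_activation(nodes, edges, seeds, decay=0.5, iterations=30):
    """Propagate activation over a weighted, undirected segment network.

    nodes: segment ids in document order.
    edges: (a, b, weight) links between segments, weight > 0 (assumed to stand
           for syntactic, semantic, rhetorical, or coreference relations).
    seeds: initial activation per segment, e.g. reflecting a reader's interests.
    """
    neighbours = defaultdict(list)
    out_weight = defaultdict(float)
    for a, b, w in edges:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))
        out_weight[a] += w
        out_weight[b] += w

    act = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # Each segment receives `decay` times its neighbours' activation, split
        # by edge weight; seed activation is re-injected every round.
        act = {
            n: seeds.get(n, 0.0)
            + decay * sum(act[m] * w / out_weight[m] for m, w in neighbours[n])
            for n in nodes
        }
    return act


def summarize(nodes, activation, size):
    """Keep the `size` most activated segments, restored to document order."""
    chosen = set(sorted(nodes, key=lambda n: activation[n], reverse=True)[:size])
    return [n for n in nodes if n in chosen]


segments = ["s1", "s2", "s3", "s4"]
links = [("s1", "s2", 1.0), ("s2", "s3", 0.5), ("s3", "s4", 1.0), ("s1", "s3", 0.5)]
act = spread_activation(segments, links, seeds={"s1": 1.0})
print(summarize(segments, act, 2))
```

Changing `size` yields a longer or shorter summary from the same activation values, and changing `seeds` shifts which segments win. Deriving the links from real syntactic, rhetorical, and coreference annotation is outside this sketch.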
In this connection, a call for participation and preliminary agenda have been published for the 'First NLP and XML Workshop', to be held in Tokyo on November 30, 2001, in conjunction with the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS 2001). The workshop keynote address on "NLP and XML" is to be given by Dr. Koiti Hasida, Deputy Director of the Tokyo Cyber Assist Research Center and a principal in the Global Document Annotation (GDA) Initiative. For details, contact the workshop organizers: Naoyuki Nomura (Justsystem and Hosei University) or Chieko Nakabasami (Toyo University).
Principal references:
- GDA web site, English
- GDA XML DTD
- The GDA Tag Set, Draft Version 0.65, June 12, 2001, or later.
- Revisions of the GDA Tag Set
- "Automatic Text Summarization Based on the Global Document Annotation." Slide set from the Coling '98 presentation by Katashi Nagao (Sony Computer Science Laboratories Inc.) and Koiti Hasida (Electrotechnical Laboratory).
- "Global Document Annotation Initiative (GDA)" - Main reference page.