A Semantic Model for Information Retrieval in Documents: An Experiment with Patient Medical Records

Line Poullet, Sylvie Calabretto, and Jean-Marie Pinon, Laboratoire d'Ingénierie des Systèmes d'Information, INSA de Lyon.

Patient medical records contain a great deal of information distributed across different kinds of documents: diagnoses, prescriptions, observations of symptoms, radiology analyses, etc. The heterogeneity of these documents makes it difficult for medical staff to retrieve specific information. This paper shows how a semantic model of documents makes it possible to handle the information stored in them. The model enables the definition of a generic semantic structure for a medical record: this structure expresses the implicit content of each document element by specifying what kind of information is required. Moreover, it makes it possible to display the information that is relevant to a given reader.

This semantic model relies on a meaning representation of information units (i.e. the logical units). This meaning representation is distributed across the overall architecture model, which binds together a semantic structure, a logical structure of documents (i.e. an SGML DTD) and a domain model. The semantic structure contains two levels of description: (1) the meaning representation of information units, for which the Conceptual Graphs formalism is used to represent the semantics of document elements, and (2) the rhetorical organization of the document. The domain model contains a medical ontology of concepts and of relations between concepts, which guides the meaning representation of information units.

The generic semantic structure defines the generic organization of document content for a specific class of patient medical record. Each element of this structure defines the a priori semantic content of the associated logical element(s). For example, a record for a cardiologist may contain a general description of the patient (civil status, age, ...), a few ECGs with an analysis, and some treatments. The generic semantic element Description of patient (corresponding to the logical elements Civil Status and General Information) specifies the concepts (patient, address, age, sex, ...) and the relations between them (the patient is a person who has an address, some medical antecedents, etc.). When the generic semantic structure is instantiated (to create a document), its semantic elements are instantiated as well: Mr. B. is the patient, he had a coronary thrombosis, etc.

Based on this semantic representation of a document, a semantic dissimilarity measure is defined to compare the semantics of information units. This measure is used to evaluate the understanding of document elements, i.e. to evaluate whether a semantic element is relevant to a query. With such a document base, users can query for a piece of information (what is Mr. B.'s blood group?) or for a part of a document (Mr. B.'s most recent ECG).

A generic semantic structure is defined as a DTD (Document Type Definition) in SGML (Standard Generalized Markup Language) syntax. Links between semantic and logical elements are defined using the ID/IDREF mechanism. This semantic definition of a document enables the information stored in documents to be retrieved by querying semantic units. Unlike statistical techniques, our approach extends the definition of information retrieval by taking into account a formal semantic representation of texts. It relies on the integration of two paradigms for representing the same information: structured documents and a knowledge base of Conceptual Graphs.
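
To make the Conceptual Graphs representation of a semantic element more concrete, the following Python sketch encodes a simplified version of the generic semantic element Description of patient and instantiates it for Mr. B. It is an illustration under stated assumptions, not the authors' implementation: the type hierarchy, the relation labels (`has_address`, `has_antecedent`) and the helpers `Concept`, `ConceptualGraph`, `is_subtype` and `instantiate` are all hypothetical. Instantiation is modelled here as a restriction of the generic graph: each node may receive an individual referent in place of the generic marker `*` and a more specific concept type, provided that type conforms to the one declared in the generic structure.

```python
from dataclasses import dataclass, field

# Hypothetical fragment of a medical ontology (type -> direct supertype).
# Invented for illustration; not taken from the paper's domain model.
TYPE_HIERARCHY = {
    "Entity": None,
    "Person": "Entity",
    "Patient": "Person",
    "Address": "Entity",
    "Antecedent": "Entity",
    "CoronaryThrombosis": "Antecedent",
}

GENERIC = "*"  # CG marker for an unspecified (generic) referent


def is_subtype(t: str, super_t: str) -> bool:
    """True if t equals super_t or lies below it in TYPE_HIERARCHY."""
    while t is not None:
        if t == super_t:
            return True
        t = TYPE_HIERARCHY[t]
    return False


@dataclass(frozen=True)
class Concept:
    ctype: str                 # concept type from the ontology
    referent: str = GENERIC    # individual referent, or "*" if generic


@dataclass
class ConceptualGraph:
    concepts: dict[str, Concept] = field(default_factory=dict)            # node id -> concept
    relations: list[tuple[str, str, str]] = field(default_factory=list)   # (label, src, dst)


def instantiate(generic: ConceptualGraph, bindings: dict[str, Concept]) -> ConceptualGraph:
    """Restrict a generic graph by replacing bound nodes with individual concepts.

    Each binding must conform to the generic node's type (same type or a subtype).
    Relations are copied unchanged.
    """
    concepts = {}
    for node_id, concept in generic.concepts.items():
        new = bindings.get(node_id, concept)
        if not is_subtype(new.ctype, concept.ctype):
            raise ValueError(f"{new.ctype} does not conform to {concept.ctype}")
        concepts[node_id] = new
    return ConceptualGraph(concepts, list(generic.relations))


# Simplified generic semantic element "Description of patient".
description_of_patient = ConceptualGraph(
    concepts={
        "p": Concept("Patient"),
        "a": Concept("Address"),
        "h": Concept("Antecedent"),
    },
    relations=[("has_address", "p", "a"), ("has_antecedent", "p", "h")],
)

# Instance for Mr. B.: the address stays generic (not yet filled in).
record_b = instantiate(description_of_patient, {
    "p": Concept("Patient", "Mr. B."),
    "h": Concept("CoronaryThrombosis", "#1"),
})
```

Checking type conformance against the ontology is what lets the generic structure constrain what a record may contain; a full Conceptual Graphs engine would also check relation signatures, which this sketch omits.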

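The abstract does not give the definition of the semantic dissimilarity measure, so the sketch below uses a hypothetical stand-in purely to illustrate how such a measure could rank semantic elements against a query. Concept types are compared by their path length through the lowest common supertype in a small, invented ontology fragment, and relation labels are compared with the Jaccard distance; the two parts are averaged with equal weights. Graphs are flattened to sets of concept types and relation labels, so this crude version ignores how concepts are connected. All names (`graph_dissimilarity`, `has_blood_group`, etc.) are illustrative.

```python
# Hypothetical ontology fragment (type -> direct supertype); not from the paper.
TYPE_HIERARCHY = {
    "Entity": None,
    "Person": "Entity",
    "Patient": "Person",
    "Attribute": "Entity",
    "BloodGroup": "Attribute",
    "Age": "Attribute",
    "Examination": "Entity",
    "ECG": "Examination",
}


def ancestors(t):
    """Path from t up to the root, t included."""
    path = []
    while t is not None:
        path.append(t)
        t = TYPE_HIERARCHY[t]
    return path


def type_distance(t1, t2):
    """Number of edges between t1 and t2 via their lowest common supertype."""
    up1, up2 = ancestors(t1), ancestors(t2)
    common = next(t for t in up1 if t in up2)
    return up1.index(common) + up2.index(common)


# Deepest type's distance from the root, used to normalise distances.
MAX_DEPTH = max(len(ancestors(t)) - 1 for t in TYPE_HIERARCHY)


def concept_dissimilarity(t1, t2):
    """Type distance normalised to [0, 1]."""
    return type_distance(t1, t2) / (2 * MAX_DEPTH)


def graph_dissimilarity(query, element):
    """Hypothetical measure; query and element are (concept types, relation labels).

    Each query concept is matched to its closest element concept; those
    distances are averaged and blended with the Jaccard distance of relations.
    """
    q_types, q_rels = query
    e_types, e_rels = element
    concept_part = sum(min(concept_dissimilarity(q, e) for e in e_types)
                       for q in q_types) / len(q_types)
    union = q_rels | e_rels
    rel_part = 1 - len(q_rels & e_rels) / len(union) if union else 0.0
    return 0.5 * concept_part + 0.5 * rel_part


# Illustrative semantic elements of Mr. B.'s record.
elements = {
    "Description of patient": ({"Patient", "Age", "BloodGroup"}, {"has_age", "has_blood_group"}),
    "ECG analysis": ({"Patient", "ECG"}, {"has_exam"}),
}

# Query standing for "what is Mr. B.'s blood group?"
query = ({"Patient", "BloodGroup"}, {"has_blood_group"})

ranking = sorted(elements, key=lambda name: graph_dissimilarity(query, elements[name]))
print(ranking)
```

With these invented inputs, Description of patient scores 0.25 and ECG analysis scores 0.75, so the former is ranked as the more relevant element for the blood-group query.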

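The ID/IDREF linking between semantic and logical elements can be illustrated with Python's standard `xml.etree.ElementTree` module. Because the standard library has no SGML parser, this sketch writes the record in XML syntax as a stand-in; the element names, the attribute names `id` and `refs`, and the helpers `index_ids` and `resolve` are hypothetical. ElementTree does not read attribute types from a DTD, so references are resolved by attribute name rather than by declared ID/IDREFS types.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in XML syntax (a stand-in for SGML). In a DTD the
# attributes would be declared along these lines (illustrative only):
#   <!ATTLIST civilstatus        id   ID     #REQUIRED>
#   <!ATTLIST patientdescription refs IDREFS #REQUIRED>
RECORD = """<record>
  <civilstatus id="cs1">Civil status of Mr. B.</civilstatus>
  <generalinfo id="gi1">General information on Mr. B.</generalinfo>
  <ecg id="ecg1">First ECG of Mr. B.</ecg>
  <ecg id="ecg2">Second ECG of Mr. B.</ecg>
  <semantic>
    <patientdescription refs="cs1 gi1"/>
    <ecganalysis refs="ecg2"/>
  </semantic>
</record>"""


def index_ids(root):
    """Map every ID value to the logical element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(root, semantic_tag):
    """Return the logical elements referenced by a semantic element's IDREFS."""
    ids = index_ids(root)
    sem = root.find(f"semantic/{semantic_tag}")
    if sem is None:
        return []
    return [ids[ref] for ref in sem.get("refs", "").split()]


root = ET.fromstring(RECORD)
# Query a semantic unit and retrieve the logical content it points to.
for el in resolve(root, "patientdescription"):
    print(el.tag, "->", el.text)
```

Keeping the semantic elements separate from the logical ones and joining them only through references means the same logical element can be reached from several semantic units without duplicating its content.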