Information extraction system for tagging Italian medical texts with UMLS concepts
PIVETTI, SUSANNA; GIACOMINI, MAURO
2012-01-01
Abstract
Extracting key concepts from clinical texts for indexing is an important task in implementing a medical digital library. Improving access to quality health information for both patients and health professionals contributes to reducing medical errors and to increasing safety and efficiency, while creating a semantically informed interpretation of texts facilitates patient mobility and borderless access to health care. This paper presents the results of developing a high-throughput, real-time, modular text analysis and information retrieval system that identifies clinically relevant entities in radiological reports and maps them to several standardized nomenclatures, thus making them available for subsequent information retrieval and data mining. The main goal of this prototype was to improve access to radiological clinical reports and, consequently, to enable faster and more accurate creation and analysis of statistical data. Popular Information Extraction (IE) programs can resolve some of these problems by offering a semantically informed interpretation and abstraction of the texts. Unfortunately, the great majority of these tools can be applied only to English texts, as their syntactic algorithms are tailored to that language. Moreover, the rare multilingual applications described in the literature are not suitable for real-time use. In an effort to provide non-English languages with the tools necessary for modern medical information processing, this paper presents a real-time IE system that aims to tag medical corpora available in some non-English languages with UMLS concepts.