ECAI-2000 Conference Paper

Essence: A Portable Methodology for Acquiring Information Extraction Patterns

Neus Català, Núria Castell, Mario Martin

One important issue when constructing Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. In most approaches to this issue, human expert intervention is necessary at many steps of the acquisition process. In this paper we describe Essence, a new methodology that significantly reduces the need for human intervention. It is based on ELA, a new algorithm for acquiring information extraction patterns. The distinctive features of Essence and ELA are that they 1) automatically acquire IE patterns from an unrestricted text corpus representative of the domain, thanks to 2) their ability to identify regularities in the surrounding context of concept-words that are semantically relevant to the IE task, using non-domain-specific lexical knowledge tools and semantic relations from WordNet, and 3) restrict human intervention to the definition of the task and the validation and typification of the set of IE patterns obtained. The use of a general-purpose ontology and general-purpose syntactic tools makes the methodology easy to port and reduces the expert effort. Results of applying this methodology to acquire extraction patterns in a MUC-like task are also shown.

Keywords: Information Extraction, Natural Language Processing, Machine Learning, Knowledge Acquisition

Citation: Neus Català, Núria Castell, Mario Martin: Essence: A Portable Methodology for Acquiring Information Extraction Patterns. In W. Horn (ed.): ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2000, pp. 411-415.

ECAI-2000 is organised by the European Coordinating Committee for Artificial Intelligence (ECCAI) and hosted by the Humboldt University on behalf of Gesellschaft für Informatik.