ECAI 2004 Conference Paper


Adaptive, Multilingual Named Entity Recognition in Web Pages

Georgios Petasis, Vangelis Karkaletsis, Claire Grover, Benjamin Hachey, Maria-Teresa Pazienza, Michele Vindigni, Jose Coch

Identifying interesting web sites and web pages and extracting information from them is a worthwhile but complex task. Most of the information on the web today is in the form of HTML documents, which are designed for presentation rather than for machine understanding and reasoning. The extraction task becomes even harder in a multilingual context, where web pages in different languages need to be analysed. Most existing systems need to be manually configured for each new domain, a process that requires substantial effort and time. This paper presents an adaptive, multilingual named entity recognition and classification (NERC) technology that can be easily customised to new domains and new languages. Our evaluation results demonstrate the viability of our approach.

Keywords: information extraction, named entity recognition, machine learning, multilinguality

Citation: Georgios Petasis, Vangelis Karkaletsis, Claire Grover, Benjamin Hachey, Maria-Teresa Pazienza, Michele Vindigni, Jose Coch: Adaptive, Multilingual Named Entity Recognition in Web Pages. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 1073-1074.




ECAI-2004 is organised by the European Coordinating Committee for Artificial Intelligence (ECCAI) and hosted by the Universitat Politècnica de València on behalf of Asociación Española de Inteligencia Artificial (AEPIA) and Associació Catalana d'Intel·ligència Artificial (ACIA).