Web Information Extraction: a domain, user adaptive and multilingual approach

Vangelis Karkaletsis, Constantine D. Spyropoulos

For PAIS 2004

This paper describes an advanced prototype system for web information retrieval and extraction that is adaptable to different domains, languages and users’ interests. The system was developed in the context of an R&D project involving both academic and industrial organisations. Two different applications were released at the project’s site in four different languages. The system’s architecture is open, modular and multi-agent. It integrates components for collecting domain-specific web pages using crawling and spidering technologies, for extracting information from the collected web pages using natural language processing and machine learning techniques, and for presenting the extracted information according to users’ interests using user modelling techniques. A customisation infrastructure is also provided, including an ontology management system and various customisation tools.

Keywords: information retrieval, information extraction, user modelling, machine learning, multilinguality

