Giorgio Maria Di Nunzio, Alessandro Micarelli
This paper presents an analysis of two heuristic supervised learning algorithms for text categorization in two dimensions. The graphical properties of the bidimensional representation allow one to tailor a geometrical heuristic approach that exploits the peculiar distribution of text documents. In particular, we investigate the theoretical linear cost of the algorithms and try to push their performance to the limit. Experiments on the standard Reuters-21578 benchmark confirm that this approach is an alternative to standard linear learning models, such as support vector machines, for text classification. Moreover, because training is fast, this approach may also serve as a support for text categorization systems in fast graphical investigations of large collections of documents.
Keywords: Text Categorization, Machine Learning, Information Models, Text Representation
Citation: Giorgio Maria Di Nunzio, Alessandro Micarelli: Pushing "Underfitting" to the Limit: Learning in Bidimensional Text Categorization. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 465-469.