Udo Heuser, Wolfgang Rosenstiel
This paper presents a way to cluster (local) HTML document sets hierarchically. The hierarchical clustering is performed using the Hierarchical Radius-based Competitive Learning (HRCL) neural network, which has been developed by the authors and is published here for the first time. After a detailed discussion of the algorithm, HRCL clustering and retrieval results will be presented. HRCL clustering yields a hierarchical multi-resolution view of the underlying (local) HTML data collection, consisting of clusters (with their cluster centroids), sub-clusters, sub-sub-clusters and so forth. Comparisons with the self-organizing map (SOM) and with the single-pass statistical clustering already used in the SMART retrieval system show that HRCL, in contrast to the latter two, is able to obtain both a true vector quantization and a good description of the distributions of the globular input clusters. Moreover, HRCL retrieval results can exceed those obtained with the SOM. The results of all generated HRCL hierarchies will be combined and rendered as an Internet catalogue resembling the well-known Yahoo directory (an example will be shown in the paper). Finally, Internet search can be accelerated using the automatically generated (sub-)cluster centroids.
Keywords: Search, Neural Networks, Information Retrieval and Presentation
Citation: Udo Heuser, Wolfgang Rosenstiel: Automatic Generation of Local Internet Catalogues Using Hierarchical Radius-based Competitive Learning. In W. Horn (ed.): ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2000, pp. 306-310.