ECAI 2004 Conference Paper


An Intrinsic Information Content Metric for Semantic Similarity in WordNet

Nuno Seco, Tony Veale, Jer Hayes

Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus (e.g., [15]). In this paper we present a wholly intrinsic measure of IC that relies on hierarchical structure alone. This measure is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis. We report a correlation of 0.84 between human and machine similarity judgments on the dataset of Miller and Charles [13], which is suggestively close to the upper bound of 0.88 postulated by Resnik [16].
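The abstract does not state the formula itself. As a minimal sketch of how IC can be derived from hierarchical structure alone, the Python below assumes the formulation usually attributed to the full paper, IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of concept c and N is the total number of concepts in the taxonomy. The toy taxonomy and all function names are hypothetical illustrations, not WordNet data or the authors' code, and the similarity function shown is a Resnik-style measure chosen for brevity.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}


def ancestors(concept):
    """Return the concept itself plus all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Count the proper descendants of a concept in the toy taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Assumed formula: 1 - log(hypo(c) + 1) / log(N); no corpus counts needed."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def resnik_similarity(a, b):
    """IC of the most informative concept that subsumes both a and b."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(intrinsic_ic(c) for c in shared)
```

Under this assumed formula, leaf concepts receive the maximum IC of 1 and the root receives 0, so in the toy taxonomy "dog" and "cat" (which share "animal") score higher than "dog" and "car" (which share only the root).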

Keywords: Semantic Similarity, Natural Language Processing, WordNet

Citation: Nuno Seco, Tony Veale, Jer Hayes: An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 1089-1090.


ECAI-2004 is organised by the European Coordinating Committee for Artificial Intelligence (ECCAI) and hosted by the Universitat Politècnica de València on behalf of Asociación Española de Inteligencia Artificial (AEPIA) and Associació Catalana d'Intel·ligència Artificial (ACIA).