Madalina M. Drugan, Linda C. van der Gaag
When a Bayesian network classifier is constructed from data, its accuracy can often be improved by selecting a subset of the available features. While the Minimum Description Length (MDL) function is generally accepted as a suitable function for comparing the qualities of alternative classifiers over a fixed set of features, we show that it is not suited to the task of feature subset selection. We introduce a new MDL-based function, called MDL-FS, and show that it is better tailored to the task of identifying and removing redundant features. We present the results of experiments in which we compare the performance of the two functions. These results demonstrate that the MDL-FS function yields classifiers whose accuracy is comparable to that of the classifiers found with the MDL function, yet which include fewer features.
Keywords: Bayesian network classifiers, Machine learning, Feature selection, Minimum description length
Citation: Madalina M. Drugan, Linda C. van der Gaag: A New MDL-based Function for Feature Selection for Bayesian Network Classifiers. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 999-1000.