Learning Neural Network Predictors from Very Large Datasets
Kang Peng, Slobodan Vucetic, Zoran Obradovic
Advances in data collection technologies allow the accumulation of large, high-dimensional datasets and provide an opportunity to learn high-quality classification and regression models. However, supervised learning from such data raises significant computational challenges, including the inability to keep the data in main memory and the need to optimize model parameters within given time constraints. For certain types of prediction models, techniques have been developed for learning from large datasets, but few of them address efficient learning of neural networks. Toward this objective, in this study we propose a procedure that automatically learns a series of neural networks of different complexities on smaller data chunks and then combines them into an ensemble predictor through averaging. Based on the idea of progressive sampling, the proposed approach starts with a very simple network trained on a very small data chunk and then progressively increases the model complexity and the chunk size until the learning performance no longer improves. Our empirical study on three real-life large datasets suggests that the proposed method successfully learns complex concepts from large datasets with low computational effort.
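The procedure described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact algorithm: the tiny one-hidden-layer network, the schedule of doubling both the chunk size and the hidden-layer width, the improvement threshold, and the patience-based stopping rule are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_net(X, y, hidden, epochs=500, lr=0.05):
    """Train a tiny one-hidden-layer tanh regression network by batch gradient descent.

    Stands in for whatever network trainer the paper's procedure would call.
    """
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = (H @ w2 + b2) - y             # dL/dpred for 0.5 * MSE
        # Backpropagate the mean-squared-error gradient.
        gw2 = H.T @ err / n
        gb2 = err.mean()
        dH = np.outer(err, w2) * (1.0 - H ** 2)
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        w2 -= lr * gw2; b2 -= lr * gb2
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ w2 + b2

def progressive_ensemble(X, y, X_val, y_val, chunk0=64, hidden0=2, patience=1):
    """Progressive sampling: train ever-larger nets on ever-larger chunks,
    average their predictions, and stop when validation MSE no longer improves.
    """
    members, best_mse, stalls = [], np.inf, 0
    chunk, hidden, start = chunk0, hidden0, 0
    while start + chunk <= len(X):
        net = train_net(X[start:start + chunk], y[start:start + chunk], hidden)
        candidate = members + [net]
        preds = np.mean([m(X_val) for m in candidate], axis=0)  # ensemble average
        mse = np.mean((preds - y_val) ** 2)
        if mse < best_mse - 1e-4:           # new member helped: keep it
            members, best_mse, stalls = candidate, mse, 0
        else:                               # no improvement: discard, maybe stop
            stalls += 1
            if stalls > patience:
                break
        start += chunk
        chunk *= 2                          # assumed schedule: double chunk size
        hidden *= 2                         # ... and network complexity
    return members, best_mse

# Toy usage: learn a noisy sine from a stream of 1000 training points.
X = rng.uniform(-1, 1, size=(1200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=1200)
members, mse = progressive_ensemble(X[:1000], y[:1000], X[1000:], y[1000:])
```

Because each member sees only one chunk, no chunk ever needs to be held in memory alongside the others, which is the point of the chunked design for datasets that exceed main memory.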
Keywords: neural networks, large datasets, ensemble predictor, progressive sampling, data mining
Citation: Kang Peng, Slobodan Vucetic, Zoran Obradovic: Learning Neural Network Predictors from Very Large Datasets. In R. López de Mántaras and L. Saitta (eds.): ECAI 2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp. 623-627.