ECAI-2000 Conference Paper

Efficient Asymptotic Approximation in Temporal Difference Learning

Frédérick Garcia, Florent Serre

TD(lambda) is an algorithm that learns the value function associated with a policy in a Markov Decision Process (MDP). In this paper we propose an asymptotic approximation of online TD(lambda) with accumulating eligibility trace, called ATD(lambda). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(lambda) and to optimize the choice of the lambda parameter and the learning step size, and we introduce ATD, a new efficient temporal difference learning algorithm.
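
For readers unfamiliar with the baseline, the following is a minimal sketch of standard tabular online TD(lambda) with an accumulating eligibility trace, the algorithm that the abstract says ATD(lambda) approximates. It is not the ATD(lambda) or ATD algorithm of the paper, whose details the abstract does not give. The function name `td_lambda`, the episode format (lists of `(state, reward, next_state)` transitions with `None` marking termination), and the default parameter values are all assumptions made for illustration.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Standard tabular online TD(lambda), accumulating traces (illustrative sketch).

    episodes: iterable of episodes; each episode is a list of
              (state, reward, next_state) transitions generated by a fixed
              policy, with next_state=None on termination (assumed format).
    Returns the learned state-value estimates as a list.
    """
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states  # eligibility trace, reset at the start of each episode
        for s, r, s_next in episode:
            v_next = 0.0 if s_next is None else V[s_next]
            delta = r + gamma * v_next - V[s]  # temporal difference error
            e[s] += 1.0                        # accumulating trace: add 1 on each visit
            for i in range(n_states):
                V[i] += alpha * delta * e[i]   # online update of every state
                e[i] *= gamma * lam            # geometric decay of the trace
    return V
```

As a usage example under the same assumptions, `td_lambda([[(0, 0.0, 1), (1, 1.0, None)]], 2, alpha=0.5, lam=1.0)` propagates the terminal reward back to both states, whereas `lam=0.0` updates only the last state visited. This shows how the lambda parameter controls how far credit is spread along the trajectory.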

Keywords: TD(lambda), Reinforcement Learning, Machine Learning, Uncertainty in AI

Citation: Frédérick Garcia, Florent Serre: Efficient Asymptotic Approximation in Temporal Difference Learning. In W. Horn (ed.): ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2000, pp. 296-300.

ECAI-2000 is organised by the European Coordinating Committee for Artificial Intelligence (ECCAI) and hosted by the Humboldt University on behalf of Gesellschaft für Informatik.