Frédérick Garcia, Florent Serre
TD(lambda) is an algorithm that learns the value function associated with a policy in a Markov Decision Process (MDP). In this paper we propose an asymptotic approximation of online TD(lambda) with accumulating eligibility traces, called ATD(lambda). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(lambda) and to optimize the choice of the lambda parameter and the learning step size, and we introduce ATD, a new efficient temporal difference learning algorithm.
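For reference, the standard online tabular TD(lambda) update with accumulating eligibility traces, which ATD(lambda) approximates, can be sketched as follows. This is a minimal sketch of the textbook algorithm under stated assumptions, not the authors' ATD(lambda) or ATD. The function name `td_lambda`, the representation of the policy-induced Markov chain as lists of `(probability, next_state)` pairs, the per-state reward list `R`, and the constant step size `alpha` are all illustrative assumptions.

```python
import random


def td_lambda(P, R, gamma, lam, alpha, n_steps, s0=0, seed=0):
    """Online tabular TD(lambda) with accumulating eligibility traces.

    Illustrative sketch (hypothetical interface):
    P[s] is a list of (probability, next_state) pairs for the Markov chain
    induced by the fixed policy; R[s] is the reward received on leaving s.
    The step size alpha is held constant here for simplicity.
    """
    rng = random.Random(seed)
    n = len(P)
    V = [0.0] * n  # value estimates
    e = [0.0] * n  # eligibility traces
    s = s0
    for _ in range(n_steps):
        # Sample the next state from the transition distribution P[s].
        u = rng.random()
        cum = 0.0
        s_next = P[s][-1][1]  # fallback guards against rounding in the probabilities
        for p, t in P[s]:
            cum += p
            if u < cum:
                s_next = t
                break
        # Temporal difference error for the observed transition.
        delta = R[s] + gamma * V[s_next] - V[s]
        # Accumulating trace: add 1 to the trace of the state just left.
        e[s] += 1.0
        # Update every state in proportion to its trace, then decay the traces.
        for i in range(n):
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam
        s = s_next
    return V


if __name__ == "__main__":
    # Deterministic two-state cycle 0 -> 1 -> 0 with reward 1 on leaving state 0.
    V = td_lambda([[(1.0, 1)], [(1.0, 0)]], [1.0, 0.0],
                  gamma=0.5, lam=0.8, alpha=0.1, n_steps=5000)
    print(V)
```

On this deterministic cycle the exact values satisfy V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), i.e. V(0) = 4/3 and V(1) = 2/3, and the estimates should settle close to them. Setting `lam=0.0` recovers TD(0). The lambda parameter and the step size `alpha` are exactly the two quantities whose choice the paper sets out to optimize.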
Keywords: TD(lambda), Reinforcement Learning, Machine Learning, Uncertainty in AI
Citation: Frédérick Garcia, Florent Serre: Efficient Asymptotic Approximation in Temporal Difference Learning. In W. Horn (ed.): ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2000, pp. 296-300.