ECAI 2004 Conference Paper

[PDF] [full paper] [prev] [tofc] [next]

Controlling on-line search for solving Markov Decision Processes via sampling

Laurent Peret, Frederick Garcia

Markov Decision Processes (MDPs) have become a standard these last ten years for solving problems of sequential decision under uncertainty. The usual request in this framework is the computation of an optimal policy that defines the optimal action for each state of the system. For complex MDPs, exact computation of optimal policies is often infeasible. Several approaches have been developed by the Reinforcement Learning community to compute near optimal policies for complex MDPs by means of function approximation and simulation. In this paper, we investigate the problem of refining near optimal policies via online search techniques, tackling the local problem of finding an optimal action for a single current state of the system. Kearns, Mansour & Ng (Machine Learning 2002) consider such an on-line approach based on sampling: at each step, a randomly sampled look-ahead tree is developed to compute the optimal action for the current state. We propose here some control search strategies for constructing such trees. Their purpose is to provide good "anytime" profiles : they quickly select a good action with a high probability and then smoothly increase the probability of selecting the optimal action.

Keywords: Markov decision processes, reinforcement learning, sampling, heuristic search

Citation: Laurent Peret, Frederick Garcia: Controlling on-line search for solving Markov Decision Processes via sampling. In R.López de Mántaras and L.Saitta (eds.): ECAI2004, Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2004, pp.530-534.

[prev] [tofc] [next]

ECAI-2004 is organised by the European Coordinating Committee for Artificial Intelligence (ECCAI) and hosted by the Universitat Politècnica de València on behalf of Asociación Española de Inteligencia Artificial (AEPIA) and Associació Catalana d'Intel-ligència Artificial (ACIA).