Reinforcement Learning (RL) is a powerful machine learning methodology that has proven effective in a variety of domains. Research into RL addresses a very general problem: how to learn good policies in an arbitrary unknown environment. Its promise is beguiling: one only needs to tell the agent its goal, and the agent will try to reach that goal by exploiting the reward and punishment signals from the environment through trial-and-error interaction. As a result, RL has attracted increasing attention in recent years. We first introduce the basic theory and classic algorithms of RL and discuss the differences and relations among the algorithms. A problem shared by all of these classic algorithms is that they usually assume discrete state and action spaces, whereas most practical problems have continuous state spaces, which greatly limits the practical use of RL. To handle this problem, a function approximator is commonly used to represent the value function instead of a lookup table. Unfortunately, this approach typically weakens the convergence guarantees, may cause the algorithm to diverge, and sometimes even fails at the beginning of learning. Kernel-based Reinforcement Learning, which is characterized by instance-based value function estimation and kernel-based generalization, is a new method to cope with this problem. It converges to the optimal policy with theoretical guarantees as the number of instances increases, which means the method is statistically consistent. However, there are few kernel-based RL algorithms at present. In this paper, we propose three different versions of a kernel-based Monte Carlo algorithm (KBMC), which combines random search, the traditional Monte Carlo method, and kernel-based Reinforcement Learning; these algorithms can efficiently handle RL problems with continuous state spaces and discrete action spaces.
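To illustrate the idea of instance-based value estimation with kernel-based generalization, here is a minimal sketch (not the paper's actual KBMC algorithm): the value of a continuous query state is estimated as a kernel-weighted average of Monte Carlo returns stored at previously visited instance states. The Gaussian kernel, the bandwidth, and all variable names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, xi, bandwidth=0.5):
    # Similarity between query state x and a stored instance state xi
    # (illustrative choice of kernel; bandwidth controls generalization width).
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * bandwidth ** 2))

def kernel_value_estimate(x, instances, returns, bandwidth=0.5):
    # Nadaraya-Watson style estimate: kernel-weighted average of the
    # Monte Carlo returns observed at the stored instance states.
    weights = np.array([gaussian_kernel(x, xi, bandwidth) for xi in instances])
    return float(np.dot(weights, returns) / np.sum(weights))

# Stored instances (1-D continuous states) and their sampled returns.
instances = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
returns = np.array([10.0, 0.0, -10.0])

# A query state near the first instance receives a value close to that
# instance's return, because its kernel weight dominates the average.
v = kernel_value_estimate(np.array([0.1]), instances, returns)
```

Because the estimate generalizes smoothly across nearby states, no discretization of the continuous state space is required, and consistency results follow as the instance set grows dense.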
We verify the algorithms on the mountain car problem, which has a continuous state space and a discrete action space. Compared with the classic algorithms Sarsa(λ), Q(λ), and Actor-Critic(λ), the KBMC algorithms converge to better policies.