One of the important goals of Artificial Intelligence is to build agents that learn behaviors automatically through interaction with the environment. Because it requires labeled examples of correct behavior, supervised learning, the kind of learning studied in most current research, is not adequate for this goal, whereas reinforcement learning is well suited to building such agents. For this reason, reinforcement learning now attracts increasing attention. Reinforcement learning characterizes a learning problem faced by agents that learn behavior through trial-and-error interactions with a dynamic environment. In this article, the basic theory and algorithms of reinforcement learning are introduced. These algorithms are based on the value function and are typically developed for lookup tables. As the number of states grows, however, these algorithms suffer from the curse of dimensionality. Therefore, lookup tables are replaced by function approximators, and new families of algorithms are derived that use stochastic gradient descent to adjust the parameters of the function approximators. In addition, because of a fundamental limitation shared by all value-function-based methods, direct gradient-based reinforcement learning is introduced. Simulations are then presented that demonstrate the application of reinforcement learning methods to the path planning problem. In the simulations, various methods and techniques are combined. The results show that, with reinforcement learning methods, agents not only find the optimal path even in a dynamic environment, but also coordinate or compete with one another.
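To make the tabular, value-function-based setting concrete, the following is a minimal sketch of Q-learning on a hypothetical grid-world path planning task (the grid size, reward scheme, and all function names are illustrative assumptions, not the paper's actual experimental setup). The Q-values are stored in a lookup table with one entry per state-action pair, which is exactly the representation that breaks down as the state space grows.

```python
import random

# Hypothetical 4x4 grid world: the agent starts at (0, 0) and must reach
# the goal at (3, 3). Each step costs -1; reaching the goal yields 0.
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GOAL = (N - 1, N - 1)

def step(state, action):
    """Apply an action, clipping moves at the grid boundary."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (max(0, min(N - 1, r + dr)), max(0, min(N - 1, c + dc)))
    return nxt, (0.0 if nxt == GOAL else -1.0)

def q_learning(episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.1):
    # Lookup-table value function: one list of Q-values per state.
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(N) for c in range(N)}
    for _ in range(episodes):
        state = (0, 0)
        while state != GOAL:
            # Epsilon-greedy exploration over the current table.
            if random.random() < epsilon:
                action = random.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: Q[state][a])
            nxt, reward = step(state, action)
            # Q-learning update: bootstrap from the best next-state value.
            target = reward + gamma * max(Q[nxt])
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
    return Q
```

After training, acting greedily with respect to the table recovers a shortest path to the goal. The table has N² x 4 entries here, but it grows exponentially with the number of state variables, which is the curse of dimensionality that motivates replacing the table with a parameterized function approximator.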