With the development of the national economy and accelerating urbanization, the number of motor vehicles and the traffic volume in China have grown rapidly, making urban traffic increasingly congested. Research shows that traffic signal control at intersections plays an important role in urban transportation systems, and it is the focus of this study. This paper proposes reinforcement learning based traffic signal control approaches for a single intersection and for an arterial road. First, for traffic light control at a single intersection, we propose a normalized reward function and apply reinforcement learning to design the controller, which outperforms pre-timed control. Second, we conduct extensive and systematic experiments on different aspects of reinforcement learning based traffic signal control. We compare the performance of the proposed method with traditional pre-timed control, and we analyze the convergence of the algorithm and how it is affected by the reward function and the state representation. Third, to solve the problem of coordinating multiple agents, we propose clique-based sparse reinforcement learning using factor graphs. The clique-based decomposition is proposed as a method for assigning reward among agents, with the aim of promoting coordination. We then derive the general max-plus algorithm from the sum-product algorithm and integrate it with sparse reinforcement learning to solve the coordination problem in a parallel and distributed way. Fourth, the proposed multiagent reinforcement learning algorithm is validated on a benchmark problem, the sensor network. It is compared with six other multiagent reinforcement learning algorithms and with a single-agent reinforcement learning algorithm. The results show that the proposed method achieves the best performance and the fastest learning.
Fifth, the proposed method is applied to the coordination problem of traffic signal control on an arterial road. To alleviate the curse of dimensionality, different kinds of agents are designed for specific learning tasks. Moreover, we propose a reward function that evaluates how well the agents coordinate. Experimental results show that the proposed method has a clear advantage over pre-timed control. Finally, the results are summarized and future work is discussed.
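To illustrate the kind of distributed action selection that max-plus performs, the following is a minimal sketch on a chain of three intersections with pairwise payoffs. It is not the implementation used in this work: the graph, the two-phase action set, the payoff values, and the names `max_plus` and `brute_force` are all hypothetical, and in the proposed method the local payoffs would be learned by sparse reinforcement learning rather than fixed by hand.

```python
import itertools


def max_plus(n_agents, n_actions, edges, payoff, n_iters=10):
    """Max-plus message passing on a pairwise coordination graph.

    edges: list of (i, j) pairs; payoff[(i, j)][a_i][a_j] is the local payoff.
    Returns one action index per agent.
    """
    neighbors = {i: [] for i in range(n_agents)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def f(i, j, a_i, a_j):
        # Look up the payoff regardless of the direction the edge was stored in.
        if (i, j) in payoff:
            return payoff[(i, j)][a_i][a_j]
        return payoff[(j, i)][a_j][a_i]

    # msg[(i, j)][a_j]: message from agent i to neighbor j about j's action a_j.
    msg = {(i, j): [0.0] * n_actions for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msg = {}
        for (i, j) in msg:
            values = []
            for a_j in range(n_actions):
                values.append(max(
                    f(i, j, a_i, a_j)
                    + sum(msg[(k, i)][a_i] for k in neighbors[i] if k != j)
                    for a_i in range(n_actions)
                ))
            # Subtract the mean so messages stay bounded on graphs with cycles.
            mean = sum(values) / n_actions
            new_msg[(i, j)] = [v - mean for v in values]
        msg = new_msg

    # Each agent picks the action that maximizes the sum of incoming messages.
    joint = []
    for i in range(n_agents):
        totals = [sum(msg[(k, i)][a] for k in neighbors[i])
                  for a in range(n_actions)]
        joint.append(max(range(n_actions), key=totals.__getitem__))
    return joint


def brute_force(n_agents, n_actions, payoff):
    """Exhaustive search over joint actions, used only as a reference."""
    def total(joint):
        return sum(table[joint[i]][joint[j]] for (i, j), table in payoff.items())
    candidates = itertools.product(range(n_actions), repeat=n_agents)
    return list(max(candidates, key=total))


# Hypothetical artery: intersections 0 - 1 - 2, two signal phases per agent.
edges = [(0, 1), (1, 2)]
payoff = {
    (0, 1): [[3.0, 0.0], [0.0, 1.0]],
    (1, 2): [[2.0, 0.5], [0.0, 1.5]],
}
joint = max_plus(3, 2, edges, payoff)
print(joint, brute_force(3, 2, payoff))
```

On a tree-structured graph such as this chain, max-plus recovers the same joint action as exhaustive search when the optimum is unique, while each agent only exchanges messages with its neighbors. On graphs with cycles the result is in general only approximate.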