Reasonable traffic signal control (TSC) guides the safe and orderly movement of traffic flow and is an important means of keeping complex traffic systems running efficiently. With growing computing power and advances in Artificial Intelligence (AI), Deep Reinforcement Learning (DRL) offers a new approach to urban TSC: by interacting with the environment in real time, a signal agent can promptly adjust its control strategy to follow changes in dynamic traffic flow. However, city-scale TSC still faces serious challenges. On the one hand, a large-scale road network contains many controlled intersections and a highly complex traffic environment. On the other hand, the computing power of a single server is limited, and isolated data islands may arise during training. Building on Multi-Agent Reinforcement Learning (MARL), and combining multi-agent Markov decision modeling, Recurrent Neural Networks (RNN), and Federated Learning (FL), this thesis analyzes the control requirements of the data-sharing and isolated-data-island scenarios. It then designs a reasonable and effective large-scale networked TSC scheme for each scenario, with the aim of providing a reliable basis for managing and controlling smart-city traffic systems. The main research contents are as follows:
(1) Modeling the large-scale networked TSC scenario
Modeling TSC in a large-scale road network raises three difficulties:

- Global and local information differ.
- Experiments in real scenes are costly and cannot guarantee safety.
- Traffic flow settings in simulation experiments are often unrealistic.

This thesis therefore models large-scale networked TSC as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to abstract the optimization problem. It then designs an effective random generation strategy for origin-destination (OD) pairs. This strategy supplies reasonable ODs so that the simulation reflects actual traffic flow. Finally, it designs the interaction process between the algorithm and the simulation. The first two steps together give a mathematical formulation of the practical problem. They also provide the theoretical basis for the algorithms and experiments behind the signal control schemes in the different scenarios. The third step connects the algorithm to the simulation platform and provides an overall experimental framework for the subsequent research.
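The abstract does not describe the OD generation strategy in detail. As a rough illustration only, the sketch below makes three assumptions:

- Trips start and end at boundary nodes of the network.
- The origin and destination of a flow must differ.
- Each flow departs during a random time window at a uniformly sampled rate.

The names `ODFlow` and `generate_random_ods` and all parameters are hypothetical and are not taken from the thesis. Converting the flows into a particular simulator's route file is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ODFlow:
    origin: str        # boundary node where vehicles enter the network
    destination: str   # boundary node where vehicles leave the network
    begin: int         # start of the departure window (s)
    end: int           # end of the departure window (s)
    veh_per_hour: int  # demand rate of this flow


def generate_random_ods(boundary_nodes, n_flows, horizon, rate_range, seed=None):
    """Sample n_flows random OD flows (hypothetical helper, not the thesis's exact strategy).

    - origin and destination are two distinct boundary nodes (no U-turn trips);
    - each flow departs within a random window lasting 1/4 to 1/2 of the horizon;
    - each flow's rate is drawn uniformly from rate_range (veh/h, inclusive).
    """
    if len(boundary_nodes) < 2:
        raise ValueError("need at least two boundary nodes")
    rng = random.Random(seed)  # local generator so results are reproducible
    flows = []
    for _ in range(n_flows):
        origin, destination = rng.sample(boundary_nodes, 2)  # sampling without replacement
        length = rng.randint(horizon // 4, horizon // 2)
        begin = rng.randint(0, horizon - length)  # window must end within the horizon
        flows.append(ODFlow(origin, destination, begin, begin + length,
                            rng.randint(*rate_range)))
    return flows


# Usage: eight boundary nodes of a small grid, one-hour episode.
nodes = ["N0", "N1", "E0", "E1", "S0", "S1", "W0", "W1"]
for flow in generate_random_ods(nodes, 3, 3600, (200, 600), seed=42):
    print(flow)
```

Fixing `seed` makes the generated demand reproducible across training runs. This matters when different controllers are compared under the same traffic.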
(2) Research on MARL-based TSC in the data-sharing scenario
For the data-sharing scenario, a structure-independent cooperative MARL algorithm is proposed to meet three key requirements: scale, collaboration, and universality.

- The algorithm adopts the “centralized training and decentralized execution” mechanism to balance large-scale deployment against global optimization performance.
- It includes a communication module based on Long Short-Term Memory (LSTM). This module passes historical information to adjacent intersections, enabling coordinated control of the local network.
- Heterogeneous intersections differ in structure, so their input and output dimensions are inconsistent. A structural feature coding module is therefore designed so that all intersections can share global parameters.

Experimental results on both a simple homogeneous artificial road network and the real heterogeneous Zhongguancun road network show that the proposed model effectively improves traffic conditions across the whole network and shortens travel time. The largest gains appear on the heterogeneous real road network. There, the proposed model is compared with MA2C, a “decentralized training and decentralized execution” algorithm. It reduces the average vehicle waiting time by 13.99 s and increases the driving speed by 3.71 m/s. It also accelerates convergence and improves training stability.
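The abstract says that heterogeneous intersections have inconsistent input and output dimensions. It also says that a structural feature coding module makes global parameter sharing possible, but it does not give the module's design. One common workaround has three parts:

- Zero-pad each local observation to a fixed size.
- Append a code describing the intersection's structure, here a one-hot of its phase count.
- Mask the policy outputs for phases the intersection does not have.

This sketch assumes that workaround, which is not necessarily the thesis's method. The function names and the use of per-lane queue lengths as features are hypothetical. The LSTM communication module is not shown.

```python
from typing import List, Sequence


def encode_observation(lane_queues: Sequence[float], n_phases: int,
                       max_lanes: int, max_phases: int) -> List[float]:
    """Map a heterogeneous intersection's local state to a fixed-size vector.

    Hypothetical encoding: queue lengths zero-padded to max_lanes, followed by a
    one-hot code of the intersection's phase count, so one shared network can
    take input from 3-way and 4-way intersections alike.
    """
    if len(lane_queues) > max_lanes or not 1 <= n_phases <= max_phases:
        raise ValueError("intersection exceeds the configured maximum size")
    padded = list(lane_queues) + [0.0] * (max_lanes - len(lane_queues))
    structure = [0.0] * max_phases
    structure[n_phases - 1] = 1.0  # structural code: which phase count this node has
    return padded + structure


def action_mask(n_phases: int, max_phases: int) -> List[bool]:
    """Only the first n_phases outputs of the shared policy head are valid phases."""
    return [i < n_phases for i in range(max_phases)]


def masked_argmax(logits: Sequence[float], mask: Sequence[bool]) -> int:
    """Greedy phase choice restricted to phases that exist at this intersection."""
    return max((i for i, ok in enumerate(mask) if ok), key=lambda i: logits[i])


# Usage: a 3-way and a 4-way intersection yield inputs of the same length.
three_way = encode_observation([2, 0, 4, 1, 3, 0], n_phases=3, max_lanes=12, max_phases=4)
four_way = encode_observation([1] * 12, n_phases=4, max_lanes=12, max_phases=4)
print(len(three_way), len(four_way))
```

With this encoding, one set of network weights can be trained on experience from both intersection types. The mask prevents a 3-phase intersection from selecting a nonexistent fourth phase.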
(3) Research on Federated Reinforcement Learning-based TSC in the isolated-data-island scenario
For the isolated-data-island scenario, a Federated Reinforcement Learning algorithm is proposed to address three problems: distributed training of the global model, the balance between local differences and generalization, and secure data transmission. It is a distributed extension of the MARL method proposed for the data-sharing scenario.

- First, drawing on the ideas of horizontal and vertical Federated Learning, a centralized-distributed collaborative training framework for the global model is designed. Each intersection trains a local copy of the globally shared model. Raw training data never leaves the intersection; instead, each intersection obtains intermediate information from the others through a central server to complete training of the globally shared model.
- Second, local differences are preserved: besides its copy of the shared model, each intersection keeps a private value function network that is excluded from aggregation. To promote generalization, an adaptive parameter aggregation method integrates the experience of all intersections.
- Third, a differential privacy strategy protects intermediate results while they are uploaded and distributed.

Experimental results in a simulation of the real road network show that the proposed model makes reasonable use of the computing power of edge intersections and achieves effective control in the isolated-data-island scenario. Its control performance is slightly inferior to that of the model proposed for the data-sharing scenario. In addition, a direct transfer experiment is carried out on an artificial road network with 400 signal lights. Compared with traditional fixed-time control, the directly transferred model built on the proposed method increases the average driving speed in the network by 3.84 m/s. This demonstrates a degree of usability and scalability for super-large-scale road networks.
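The abstract does not specify the adaptive aggregation rule or the privacy mechanism. This sketch assumes one plausible combination:

- Each intersection uploads only the update to its shared parameters. Its private value network never leaves the client.
- Each update is clipped in L2 norm and perturbed with Gaussian noise, in the style of the Gaussian mechanism.
- The server weights clients by a softmax over a performance score, such as negative average waiting time.

All function names, the scoring rule and the noise parameters are hypothetical. Choosing `sigma` to meet a specific (ε, δ) privacy budget requires a privacy accountant and is not shown.

```python
import math
import random
from typing import List, Sequence


def clip_and_noise(update: Sequence[float], clip_norm: float, sigma: float,
                   rng: random.Random) -> List[float]:
    """Clip an update to L2 norm <= clip_norm, then add N(0, (sigma * clip_norm)^2) noise."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale + rng.gauss(0.0, sigma * clip_norm) for u in update]


def adaptive_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over client scores (higher is better); hypothetical weighting rule."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def federated_round(global_shared: Sequence[float],
                    client_updates: Sequence[Sequence[float]],
                    scores: Sequence[float],
                    clip_norm: float, sigma: float,
                    rng: random.Random, temperature: float = 1.0) -> List[float]:
    """One server round: privatize each upload, then apply the weighted mean update.

    Only shared-model updates are passed in; each client's private value
    network stays local and is never aggregated.
    """
    noisy = [clip_and_noise(u, clip_norm, sigma, rng) for u in client_updates]
    weights = adaptive_weights(scores, temperature)
    return [g + sum(w * u[i] for w, u in zip(weights, noisy))
            for i, g in enumerate(global_shared)]


# Usage: three intersections scored by negative average waiting time (s).
new_global = federated_round([0.0, 0.0],
                             [[0.4, -0.2], [0.1, 0.3], [0.5, 0.5]],
                             scores=[-30.0, -25.0, -40.0],
                             clip_norm=1.0, sigma=0.1,
                             rng=random.Random(0), temperature=10.0)
print(new_global)
```

A temperature close to zero lets the best-performing intersection dominate the aggregate. A large temperature approaches uniform averaging. Tuning it is one way to trade local specialization against generalization.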
In summary, the large-scale networked TSC algorithms proposed in this thesis for both the data-sharing and isolated-data-island scenarios effectively improve traffic conditions and increase traffic capacity. The models follow the idea of successive layering and retraining after transfer. This lets them scale to super-large road networks with hundreds or even thousands of signal lights. It also offers a feasible approach to the practical need for zone-based (sliced) traffic administration.
Keywords: traffic signal control; Markov decision process; multi-agent reinforcement learning; federated reinforcement learning
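Chen Xiaoyu (陈筱语). Traffic Signal Control for Large-Scale Road Networks Based on Multi-Agent Reinforcement Learning [D]. Institute of Automation, Chinese Academy of Sciences, 2022.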
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.