Large-Scale Road Network Traffic Signal Control Based on Multi-Agent Reinforcement Learning
陈筱语
2022-05-17
Pages: 100
Subtype: Master's
Abstract

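Well-designed traffic signal control guides traffic flow to move safely and in an orderly manner, and is an important means of keeping a complex system running efficiently. With growing computing power and advances in artificial intelligence, deep reinforcement learning offers a new approach to urban traffic signal control: by interacting with the environment in real time, signal agents can promptly adjust their control policies to adapt to changing traffic flows. However, city-scale traffic signal control involves a large number of controlled intersections and a highly complex traffic environment; in addition, a single server has a ceiling on its computing performance, and isolated data islands may arise during training. Signal control for large-scale road networks therefore still faces serious challenges. Building on Multi-Agent Reinforcement Learning (MARL) and combining multi-agent Markov modeling, recurrent neural networks, and federated learning, this thesis analyzes the control requirements of the data-sharing and isolated data island scenarios and designs sound and effective signal control schemes for large-scale road networks, providing a reliable basis for managing smart-city traffic systems. The specific research contents are as follows: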

(1) Scenario modeling for large-scale road network traffic signal control

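Modeling traffic signal control for large-scale road networks raises several problems: global network information and local intersection information differ, real-world experiments are costly and cannot guarantee safety, and traffic flows in simulation experiments are often configured unrealistically. To address these, this thesis first abstracts the control optimization problem using decentralized partially observable Markov modeling, then proposes a reasonable random OD (origin-destination) demand generation strategy based on real traffic flow characteristics to reproduce real traffic in simulation, and finally designs the interaction process between algorithm and simulation. The first two together turn the practical problem into a mathematical one and provide the theoretical basis, for both algorithms and experiments, for signal control schemes in different scenarios; the last establishes the interaction between algorithm and simulation and provides an overall experimental framework for the subsequent research.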
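
For reference, a decentralized partially observable Markov decision process (Dec-POMDP) is usually written as the tuple below. The symbols follow the common textbook convention and are not taken from the thesis; reading the agents as signalized intersections and the actions as signal phases is likewise an assumption made for illustration.

```latex
% Standard Dec-POMDP tuple (textbook notation, not the thesis's own).
\[
\mathcal{M} = \big\langle \mathcal{I},\ \mathcal{S},\ \{\mathcal{A}_i\}_{i \in \mathcal{I}},\ P,\ R,\ \{\Omega_i\}_{i \in \mathcal{I}},\ O,\ \gamma \big\rangle
\]
% I        : set of agents (assumed here: one per signalized intersection)
% S        : set of global states of the road network
% A_i      : action set of agent i (assumed here: its signal phases)
% P        : transition function P(s' | s, a), with a the joint action
% R        : shared reward function R(s, a)
% Omega_i  : observation set of agent i (its local view)
% O        : observation function O(o | s', a), with o the joint observation
% gamma    : discount factor in [0, 1)

% Each agent acts on its own action-observation history h_{i,t};
% all agents jointly maximize the expected discounted return.
\[
\max_{\pi_1, \dots, \pi_n}\ \mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, \mathbf{a}_t) \Big],
\qquad a_{i,t} \sim \pi_i(\cdot \mid h_{i,t})
\]
```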

(2) Signal control based on multi-agent reinforcement learning in the data-sharing scenario

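To meet the requirements of the data-sharing scenario, namely a large number of controlled intersections, coordination, and generality, this thesis proposes a structure-independent cooperative multi-agent reinforcement learning algorithm. The algorithm uses multi-agent reinforcement learning with a "centralized training, decentralized execution" mechanism as its framework, balancing the large number of controlled intersections against the global optimization effect. It includes a communication module based on long short-term memory networks that passes on historical information to achieve coordinated control of the local road network. Because differences among heterogeneous intersections make the input and output dimensions of some networks inconsistent, it also includes a structural feature encoding module that lets heterogeneous intersections share the parameters of a global model. Experiments on a simple homogeneous artificial road network and the real heterogeneous Zhongguancun road network show that the proposed model effectively improves traffic conditions across the whole network and shortens travel time. On the heterogeneous network in particular, compared with the "decentralized training, decentralized execution" algorithm MA2C, the proposed model reduces the average waiting time of vehicles in the network by 13.99 seconds and increases driving speed by 3.71 m/s, while also speeding up convergence and improving training stability.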

(3) Signal control based on federated reinforcement learning in the isolated data island scenario

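The isolated data island scenario imposes three requirements: distributed training of the global model, a balance between difference and generalization, and secure data transmission. To meet them, this thesis proposes a federated reinforcement learning algorithm, a distributed improvement of the multi-agent reinforcement learning algorithm proposed for the data-sharing scenario. Combining ideas from horizontal and vertical federated learning, the algorithm introduces a centralized-distributed collaborative training framework in which each intersection trains a local copy of the global model and obtains intermediate information from other intersections through a central server to complete global model training, so that raw training data never leaves the intersection. In addition to the local global model, each intersection has a private local value function network that is excluded from aggregation to preserve local model differences, and an adaptive parameter aggregation method is designed to fuse the experience of all intersections effectively and strengthen generalization. A differential privacy protection strategy further secures intermediate results during upload and distribution. Experiments on an abstract graph of a real road network show that, although its control performance is slightly below that of the model proposed for the data-sharing scenario, the algorithm achieves effective control in the isolated data island scenario and relieves the pressure of centralized training on the central server. In a transfer test on an artificial road network with 400 signal lights, the directly transferred federated reinforcement learning model raises the average driving speed of vehicles in the network by 3.84 m/s compared with the traditional fixed-time method, indicating that the proposed algorithm has a degree of usability and scalability for large-scale road networks.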

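In summary, the large-scale road network traffic signal control algorithms proposed in this thesis for the data-sharing and isolated data island scenarios effectively improve network traffic conditions and raise the actual traffic capacity of roads. Based on the ideas of continued layering and retraining after transfer, both algorithms can also meet scalability requirements on large-scale road networks with hundreds or thousands of signal lights, offering a feasible approach for practical partitioned administrative management.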

Other Abstract

Well-designed traffic signal control (TSC) guides traffic flow safely and in an orderly manner and is a crucial means of keeping complex traffic systems running efficiently. With improvements in computing power and Artificial Intelligence (AI) technology, Deep Reinforcement Learning (DRL) offers a new approach to TSC. Through real-time interaction with the environment, a signal agent can promptly adjust its control strategy to adapt to changes in dynamic traffic flow. However, urban large-scale networked TSC is difficult, on the one hand, because of the large number of controlled intersections and the high complexity of the traffic environment. On the other hand, the maximum computing power of a single server is limited, and isolated data islands may exist during training. Together these make the optimization of TSC a challenging problem. This thesis analyzes the concrete requirements of the data sharing and isolated data island scenarios and designs a reasonable and effective scheme for large-scale networked TSC in each, based on Multi-Agent Reinforcement Learning (MARL) combined with the Multi-Agent Markov Decision Process, Recurrent Neural Networks, and Federated Learning (FL). The main research contents of this thesis are as follows:

(1) Scenario modeling for large-scale networked TSC

Modeling TSC for a large-scale road network raises three problems: global and local information differ, real-world experiments are costly and cannot guarantee safety, and traffic flow settings in simulation experiments are often unrealistic. To address them, this thesis first models large-scale networked TSC as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), which abstracts the optimization problem. It then designs an effective random generation strategy for origin-destination (OD) demand so that the simulation reproduces actual traffic flow. Finally, it designs the interaction process between algorithm and simulation. The first two together turn the practical problem into a mathematical one and provide the theoretical basis, for both algorithms and experiments, for signal control schemes in different scenarios; the last establishes the interaction between the algorithm and the simulation platform and provides an overall experimental framework for subsequent research.
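
The abstract does not describe the OD generation strategy itself. Purely as an illustration of what random OD demand generation can look like, the sketch below samples departure times from a Poisson process with a higher rate inside a peak window (using the thinning method) and draws a random origin and a different destination for each trip. The function name `generate_od_trips`, its parameters, and the node labels are all hypothetical.

```python
import random


def generate_od_trips(entries, exits, horizon_s, base_rate, peak_rate,
                      peak_window, seed=0):
    """Return (depart_time_s, origin, destination) trips sorted by time.

    Rates are in vehicles per second and must not both be zero. Every
    origin needs at least one exit that differs from it.
    Hypothetical helper: not the strategy used in the thesis.
    """
    rng = random.Random(seed)
    max_rate = max(base_rate, peak_rate)
    trips = []
    t = 0.0
    while True:
        # Candidate arrivals come from a Poisson process at the maximum rate.
        t += rng.expovariate(max_rate)
        if t >= horizon_s:
            break
        in_peak = peak_window[0] <= t < peak_window[1]
        rate = peak_rate if in_peak else base_rate
        # Thinning: keep the candidate with probability rate / max_rate.
        if rng.random() >= rate / max_rate:
            continue
        origin = rng.choice(entries)
        destination = rng.choice([e for e in exits if e != origin])
        trips.append((t, origin, destination))
    return trips


if __name__ == "__main__":
    nodes = ["N", "S", "E", "W"]
    demo = generate_od_trips(nodes, nodes, 3600.0, 0.05, 0.5, (1800.0, 3600.0))
    print(len(demo), "trips; first:", demo[0])
```

A list like this could then be written out in whatever trip format the chosen traffic simulator expects; that conversion is simulator-specific and is left out here.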

(2) TSC based on MARL in the data sharing scenario

For the data sharing scenario, a structure-independent cooperative Multi-Agent Reinforcement Learning algorithm is proposed to meet three key requirements: scale, collaboration, and universality. The algorithm uses the "centralized training, decentralized execution" mechanism to balance the large number of controlled intersections against the global optimization effect. It includes a communication module based on Long Short-Term Memory (LSTM) that transmits historical information to adjacent intersections and achieves coordinated control of the local network. Moreover, because heterogeneous intersections differ from one another, the input and output dimensions of some networks are inconsistent; a structural feature coding module is therefore designed so that heterogeneous intersections can share the parameters of a global model. Experimental results on both a simple homogeneous artificial road network and the real heterogeneous Zhongguancun road network show that the proposed model effectively improves traffic conditions across the whole road network and shortens travel time. On the heterogeneous real road network in particular, compared with the "decentralized training, decentralized execution" algorithm MA2C, the proposed model reduces the average waiting time of vehicles in the network by 13.99 s and increases driving speed by 3.71 m/s. It also accelerates convergence and improves training stability.
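
The abstract does not say how the structural feature coding module works. A simpler, commonly used way to let intersections with different observation and action sizes share one network is zero-padding with masks, sketched below. This is a stand-in technique, not the thesis's module, and the names `encode_observation` and `select_phase` are hypothetical.

```python
def encode_observation(obs, max_dim):
    """Zero-pad a local observation to max_dim.

    Returns (padded, mask), where mask marks real entries with 1 and
    padding with 0. Hypothetical helper for illustration only.
    """
    if len(obs) > max_dim:
        raise ValueError("observation is longer than max_dim")
    pad = max_dim - len(obs)
    padded = list(obs) + [0.0] * pad
    mask = [1] * len(obs) + [0] * pad
    return padded, mask


def select_phase(q_values, n_valid_phases):
    """Greedy choice restricted to the phases this intersection really has.

    q_values is the fixed-size output of a shared network; outputs beyond
    n_valid_phases belong to phases that do not exist here and are ignored.
    """
    if not 1 <= n_valid_phases <= len(q_values):
        raise ValueError("n_valid_phases out of range")
    return max(range(n_valid_phases), key=lambda a: q_values[a])


if __name__ == "__main__":
    # A three-approach intersection padded to the size of a four-approach one.
    print(encode_observation([4.0, 0.0, 7.0], 4))
    # Only the first two phases exist at this intersection.
    print(select_phase([0.1, 0.9, 5.0, 7.0], 2))
```

Padding keeps every input and output the same size so parameters can be shared, at the cost of the network having to learn to ignore padded entries; the mask makes that information explicit.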

(3) TSC based on Federated Reinforcement Learning in the isolated data island scenario

For the isolated data island scenario, a Federated Reinforcement Learning algorithm is proposed to address three problems: distributed training of the global model, the balance between difference and generalization, and secure data transmission. It is a distributed improvement of the MARL method proposed for the data sharing scenario. First, drawing on both horizontal and vertical Federated Learning, a centralized-distributed collaborative training framework for the global model is designed. In this framework, each intersection trains a local copy of the globally shared model and obtains intermediate information from other intersections through the central server to complete training of the shared model, under the constraint that the original training data never leaves the intersection. Second, in addition to the local copy of the shared model, each intersection has a private local value function network that is excluded from aggregation to preserve local differences, while an adaptive parameter aggregation method is designed to integrate the experience of all intersections effectively and improve generalization. Third, a differential privacy protection strategy secures intermediate results during upload and distribution. Experimental results in a simulation environment built from an actual road network show that the proposed model makes rational use of the computing power at edge intersections and achieves effective control in the isolated data island scenario, although its control performance is slightly inferior to that of the model proposed for the data sharing scenario. In addition, a direct transfer experiment is carried out on an artificial road network with 400 signal lights. Compared with traditional Fixed-time Control, the directly transferred model based on the proposed method increases the average driving speed in the network by 3.84 m/s, which demonstrates a degree of usability and scalability for very large road networks.
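
The three ingredients named above (aggregating only the shared part, keeping a private part local, and perturbing what is uploaded) can be sketched as a single round of weighted parameter averaging. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the aggregation weights are fixed inputs rather than adaptive, the Gaussian noise is uncalibrated and therefore carries no formal differential privacy guarantee, and the names `privatize`, `aggregate_shared`, and `federated_round` are hypothetical.

```python
import random


def privatize(params, noise_std, rng):
    """Add Gaussian noise to every parameter before upload (uncalibrated)."""
    return {name: [w + rng.gauss(0.0, noise_std) for w in values]
            for name, values in params.items()}


def aggregate_shared(uploads, weights):
    """Weighted average of the uploaded shared parameters."""
    total = sum(weights)
    aggregated = {}
    for name, first in uploads[0].items():
        aggregated[name] = [
            sum(w * up[name][i] for up, w in zip(uploads, weights)) / total
            for i in range(len(first))
        ]
    return aggregated


def federated_round(clients, weights, noise_std=0.0, seed=0):
    """Run one aggregation round over clients' "shared" parameters.

    Each client is a dict with "shared" and "private" parameter dicts.
    Only "shared" is uploaded and overwritten; "private" never leaves
    the client, mirroring the non-aggregated local network.
    """
    rng = random.Random(seed)
    uploads = [privatize(c["shared"], noise_std, rng) for c in clients]
    new_shared = aggregate_shared(uploads, weights)
    for c in clients:
        c["shared"] = {name: list(v) for name, v in new_shared.items()}
    return new_shared


if __name__ == "__main__":
    clients = [
        {"shared": {"w": [1.0, 3.0]}, "private": {"v": [10.0]}},
        {"shared": {"w": [3.0, 5.0]}, "private": {"v": [20.0]}},
    ]
    print(federated_round(clients, weights=[1.0, 3.0]))
    print([c["private"] for c in clients])
```

In a real system the weights would come from the adaptive aggregation rule, and the noise scale would be derived from a clipping bound and a privacy budget.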

To summarize, the large-scale networked TSC algorithms proposed in this thesis for the data sharing and isolated data island scenarios effectively improve traffic conditions and increase traffic capacity. Based on the ideas of continued layering and retraining after transfer, these models can also meet scalability requirements on very large road networks with hundreds or thousands of signal lights. This offers a feasible approach for practical partitioned administrative management.

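Keyword: traffic signal control; Markov decision process; multi-agent reinforcement learning; federated reinforcement learning
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48758
Collection: Graduates_Master's Theses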
Recommended Citation
GB/T 7714
陈筱语. Large-Scale Road Network Traffic Signal Control Based on Multi-Agent Reinforcement Learning [D]. Institute of Automation, Chinese Academy of Sciences, 2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
12_陈筱语_毕业论文_带签字.pdf (23947KB) | Thesis | | Restricted access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.