Urban Road Traffic Signal Control Based on Multi-Agent Reinforcement Learning


	Urban Road Traffic Signal Control Based on Multi-Agent Reinforcement Learning
	Liu Hao (刘皓)
	2021-05-26
Pages	80
Degree type	Master's
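Abstract (translated from the Chinese)	With continued socioeconomic development, vehicle ownership has risen rapidly and urban road traffic congestion has become increasingly serious. Congestion not only lengthens travel times but also brings problems such as energy consumption and environmental pollution. It has become a bottleneck for further urban development and a pressing problem for many countries. A well-designed traffic signal control scheme can shorten queues at intersections and help relieve congestion. However, because road traffic systems are nonlinear, time-varying, and uncertain, they cannot be modeled and optimally controlled with an exact mathematical model. In recent years, a growing number of researchers have applied reinforcement learning to traffic signal control, because it needs no mathematical model of the controlled system, needs little or no prior knowledge, and generalizes well. This thesis addresses multi-intersection traffic signal control in both the conventional road environment and the Internet of Vehicles environment. By taking communication and information sharing among agents into account, it designs traffic signal control methods based on multi-agent reinforcement learning and validates them in a purpose-built reinforcement learning traffic simulation environment. The main contributions are as follows:
1. Real-world experiments on traffic signal control are costly and can have harmful consequences. The thesis therefore first formulates traffic signal control as a reinforcement learning problem, then builds a simulation platform for reinforcement-learning-based traffic signal control and establishes a library of traffic scenarios. Experiments and analysis demonstrate that the formulation is effective and that applying reinforcement learning to traffic signal control through the simulation platform is feasible. This provides the basis for the subsequent studies in the conventional road environment and the Internet of Vehicles environment.
2. For large-scale urban road networks, the open questions are how multiple agents should share, fuse, and communicate information, and how they should coordinate control. The thesis proposes a multi-intersection cooperative PPO (Proximal Policy Optimization) control method that shares information among neighboring intersections. The method exploits the spatiotemporal relationships between intersections in the urban road environment: it fuses the state features and policy features of the intersections neighboring a target intersection into a message that neighboring intersections pass to and share with one another, which enables cooperative multi-intersection control. Numerical simulation results show that multi-agent reinforcement learning with a communication (information-sharing) mechanism helps improve signal control performance across the whole network.
3. In a fully connected Internet of Vehicles environment with complete observation, there are many candidate states and the state is hard to represent. The thesis proposes a multi-intersection cooperative control method based on COPPO (Completely Observable Proximal Policy Optimization). Exploiting complete observation, the method assembles the exact vehicle positions and speeds into a state matrix and uses a convolutional neural network (CNN) for feature extraction. The method was validated in simulation scenarios and achieves better control performance than control under partial observation.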
English abstract	With the continued development of society and the economy, the number of cars has risen rapidly, and urban road traffic congestion has become increasingly prominent. Traffic congestion not only increases travel time but also brings problems such as energy consumption and environmental pollution. It has become a bottleneck for the further development of cities and an urgent problem in many countries. A reasonable traffic signal control plan can reduce queue lengths at intersections and help alleviate congestion. However, because the road traffic system is nonlinear, time-varying, and uncertain, it cannot be modeled and optimized with an accurate mathematical model. In recent years, more and more researchers have used reinforcement learning for traffic signal control, because it does not require a mathematical model of the controlled object, requires little or no prior knowledge, and is highly versatile. This thesis addresses multi-intersection traffic signal control in the traditional road environment and the Internet of Vehicles environment. By considering communication and information sharing among agents, it designs a traffic signal control method based on multi-agent reinforcement learning and verifies it experimentally in the constructed traffic simulation environment. The main work includes the following aspects:
1. In view of the high cost and possible adverse consequences of real-world traffic signal control experiments, this thesis first formulates traffic signal control as a reinforcement learning problem, then builds a simulation platform for reinforcement-learning-based traffic signal control and establishes a library of traffic scenarios. Experiments and analyses demonstrate the effectiveness of the formulation and the feasibility of using the simulation platform to apply reinforcement learning to traffic signal control. This provides a foundation for the subsequent research on reinforcement-learning-based traffic signal control in the traditional road environment and the Internet of Vehicles environment.
2. In view of the problems of how agents should share, fuse, and communicate information in large-scale urban road networks, and how they should coordinate control, this thesis proposes a multi-intersection cooperative PPO (Proximal Policy Optimization) control method that shares information among adjacent intersections. The method uses the spatiotemporal relationships between intersections in the urban road environment and fuses the state features and policy features of the intersections neighboring the target intersection into messages that are transmitted and shared between neighboring intersections. This realizes multi-intersection cooperative control based on information sharing among neighboring intersections. Numerical simulation results show that multi-agent reinforcement learning with a communication (information-sharing) mechanism helps improve the overall traffic signal control performance of the road network.
3. In view of the problem that there are many candidate states and the state is difficult to represent under complete observation in a fully connected Internet of Vehicles environment, this thesis proposes a multi-intersection cooperative control method based on COPPO (Completely Observable Proximal Policy Optimization). Taking advantage of complete observation, the method builds a state matrix from the accurate vehicle position and speed information and uses a CNN (Convolutional Neural Network) for feature extraction. The method was verified in simulation scenarios and achieves better control performance than control under partial observation.
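As a rough illustration of the two representations the abstract describes, the sketch below builds a position-and-speed state matrix for one intersection and a message fused from neighboring intersections. It is not the thesis's implementation: the function names, the 5 m cell size, the 200 m lane length, the speed normalization, and the mean pooling of neighbor features are all assumptions made for this example.

```python
def build_state_matrix(vehicles, num_lanes=4, lane_length=200.0,
                       cell_length=5.0, max_speed=15.0):
    """Return [occupancy, speed], two num_lanes x num_cells grids.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_mps).
    All default parameter values are illustrative assumptions.
    """
    num_cells = int(lane_length // cell_length)
    occupancy = [[0.0] * num_cells for _ in range(num_lanes)]
    speed = [[0.0] * num_cells for _ in range(num_lanes)]
    for lane, distance, v in vehicles:
        # Ignore vehicles outside the observed lanes or observed range.
        if not (0 <= lane < num_lanes and 0.0 <= distance < lane_length):
            continue
        cell = min(int(distance // cell_length), num_cells - 1)
        occupancy[lane][cell] = 1.0
        # Normalize speed to [0, 1] so both channels share a scale.
        speed[lane][cell] = min(max(v, 0.0) / max_speed, 1.0)
    return [occupancy, speed]


def build_neighbor_message(own_features, neighbor_features):
    """Concatenate an intersection's features with the mean of its neighbors'.

    Mean pooling is an assumed fusion rule chosen for simplicity.
    """
    if not neighbor_features:
        pooled = [0.0] * len(own_features)
    else:
        count = len(neighbor_features)
        pooled = [sum(column) / count for column in zip(*neighbor_features)]
    return list(own_features) + pooled


if __name__ == "__main__":
    state = build_state_matrix([(0, 12.0, 7.5), (3, 199.0, 30.0)])
    print(len(state), len(state[0]), len(state[0][0]))
    print(build_neighbor_message([1.0, 2.0], [[0.0, 0.0], [2.0, 4.0]]))
```

The two grids can be stacked as input channels for a convolutional feature extractor, and the fused message can be appended to a policy network's input. Both choices are one plausible reading of the abstract, not a confirmed design.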
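Keywords	traffic signal control; reinforcement learning; multi-agent; Internet of Vehicles
Language	Chinese
Seven major directions: sub-direction classification	Artificial intelligence + transportation
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/45040
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Parallel Intelligence Technology and Systems Team
Recommended citation (GB/T 7714)	Liu Hao. Urban Road Traffic Signal Control Based on Multi-Agent Reinforcement Learning[D]. Institute of Automation, Chinese Academy of Sciences, 2021.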

Files in this item
File name/size	Document type	Version type	Access type	License
基于多智能体强化学习的城市道路交通信号控（4749KB）	Thesis		Open access	CC BY-NC-SA