CASIA OpenIR > Graduates > Master's Theses
基于深度强化学习的网约车调度算法研究 (Research on Ride-Hailing Vehicle Repositioning Algorithms Based on Deep Reinforcement Learning)
Xi Jinhao (习金浩)
2023-05
Pages: 84
Degree type: Master's
Abstract (Chinese)

The ride-hailing system is an important component of the urban transportation system: an efficient one can significantly improve people's travel quality and the efficiency of urban traffic. However, because travel demand changes dynamically, serious imbalances between vehicle supply and demand often arise within cities, which inconveniences travelers. Targeting this problem, this thesis studies ride-hailing vehicle repositioning algorithms on the theoretical foundations of deep reinforcement learning, hierarchical reinforcement learning, and graph neural networks. The main research content is as follows:

(1) A universal ride-hailing repositioning simulation environment is designed. Using real ride-hailing order data, road-network structure data, and traffic congestion data from Chengdu, a fine-grained, universal simulation environment is built with two modes, a grid map and a road-network map, so it can adapt to different ways of abstracting geographic information. The environment provides separate datasets for training and testing and a unified repositioning interface for different algorithms, supporting the subsequent research on repositioning algorithms.

(2) In the grid-map environment, a ride-hailing repositioning algorithm based on hierarchical reinforcement learning is designed. First, a hierarchical reinforcement learning framework splits the day-long, city-wide repositioning problem into tasks at different levels, and a dedicated reinforcement learning algorithm makes the decisions at each level. Second, a parallel coordination mechanism is designed: each coordinator contains several actuator sub-policies with different structures that reposition vehicles in a decentralized way, and each actuator samples actions with a Q-value-based probabilistic policy, improving multi-vehicle coordination. Finally, a mixed agent state containing rich spatio-temporal supply-demand information improves the ability to distinguish nearby vehicles. Comparative experiments show clear improvements over other repositioning methods in order response rate, gross merchandise volume, and the comprehensive indicator, and seven ablation experiments confirm the effectiveness of each innovation.

(3) In the road-network environment, a ride-hailing repositioning algorithm based on adversarial hierarchical graph reinforcement learning is designed. This environment preserves the complex urban road-graph structure and models dynamic traffic congestion, which makes the repositioning problem harder. To address this, first, a hierarchical graph reinforcement learning framework dynamically clusters road nodes using the static road-network structure and dynamic congestion information, and decomposes the day-long multi-vehicle coordination task in this complex system into decision tasks at different levels. Second, an adversarial graph reinforcement learning algorithm is designed for the actuator sub-policies: a prediction branch and a repositioning branch are trained jointly in an adversarial fashion, yielding accurate agent states and rewards in the graph-structured problem and thus good supply-demand forecasting and vehicle repositioning. Finally, the repositioning branch uses a discrete Soft Actor-Critic algorithm to learn multiple optimal actions for the same state, enabling multi-vehicle coordination. Comparative and ablation experiments verify the overall effectiveness of the method and of each innovation.

In summary, this thesis designs a universal ride-hailing repositioning simulation environment and two efficient repositioning algorithms. Compared with existing methods, the proposed models adapt better to complex dynamic systems, express agent states more accurately, and coordinate multiple vehicles better. The algorithms are theoretically novel and improve the operational efficiency of ride-hailing systems, giving the work both theoretical significance and practical value.

Abstract (English)

The Mobility-on-Demand (MOD) system plays a crucial role in urban transportation, as it can significantly enhance people's travel experience and improve the efficiency of urban traffic. However, due to dynamic changes in people's mobility demands, imbalance between vehicle supply and demand within the MOD system is a serious issue that inconveniences travelers. To address this problem, this thesis conducts research on vehicle repositioning based on deep reinforcement learning, hierarchical reinforcement learning, and graph neural networks. The main contributions of this thesis are as follows:

(1) A realistic, fine-grained, and universal MOD simulation environment is designed. It is built from real order data, road-network structure data, and traffic congestion data from Chengdu. The environment provides two modes, the Grid-Map and the Graph-Map, which can be tailored to different geographic information abstraction methods. It supplies separate datasets for training and testing and offers a unified repositioning interface to different vehicle repositioning algorithms. As a result, the environment serves as a foundation for further research on vehicle repositioning.
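The thesis does not publish its simulator code, so the following is only a toy sketch of what such a unified repositioning interface might look like; every name, parameter, and the reward rule here are hypothetical illustrations, not the thesis's actual API.

```python
import random
from dataclasses import dataclass

@dataclass
class RepositionEnv:
    """Hypothetical sketch of a unified repositioning interface: a grid map and
    a road-network graph would both sit behind the same reset/step API, so any
    repositioning algorithm can plug in unchanged."""
    map_mode: str = "grid"   # "grid" or "graph" (assumed mode switch)
    n_zones: int = 9         # zones = grid cells or clustered road nodes
    n_vehicles: int = 4

    def reset(self, dataset: str = "train", seed: int = 0):
        # separate datasets for training/testing would be selected here
        self.rng = random.Random(seed)
        self.positions = [self.rng.randrange(self.n_zones)
                          for _ in range(self.n_vehicles)]
        self.demand = [self.rng.randint(0, 3) for _ in range(self.n_zones)]
        return {"positions": list(self.positions), "demand": list(self.demand)}

    def step(self, actions):
        # actions[i] is the target zone for idle vehicle i
        assert len(actions) == self.n_vehicles
        self.positions = list(actions)
        # toy reward: zone-level matches between repositioned supply and demand
        served = sum(min(self.positions.count(z), self.demand[z])
                     for z in range(self.n_zones))
        self.demand = [self.rng.randint(0, 3) for _ in range(self.n_zones)]
        obs = {"positions": list(self.positions), "demand": list(self.demand)}
        return obs, served
```

The design point this sketch mirrors is the abstract's claim: algorithms interact only with `reset`/`step`, so swapping the geographic abstraction does not change the algorithm-facing interface.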

(2) In the Grid-Map environment, a vehicle repositioning algorithm based on hierarchical reinforcement learning is proposed. First, a hierarchical reinforcement learning framework is introduced, in which separate reinforcement learning algorithms handle the repositioning tasks at different levels. Second, a parallel coordination mechanism is designed: each coordinator contains multiple actuators with different structures, and vehicles are repositioned in a decentralized way. Each actuator samples actions with a probabilistic policy based on Q values, thereby improving multi-vehicle coordination. Finally, a mixed state containing rich spatial and temporal supply-demand information is introduced to better distinguish adjacent vehicles. Experimental results show that the proposed algorithm outperforms other repositioning methods on evaluation indicators such as Order Response Rate, Gross Merchandise Volume, and the Comprehensive Indicator, and seven ablation experiments validate the effectiveness of each innovation.
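One plausible reading of the "probabilistic policy based on Q values" is Boltzmann (softmax) sampling: nearby idle vehicles draw different actions instead of all greedily taking the argmax, which spreads supply. The sketch below illustrates that generic technique only; the function name, temperature parameter, and its value are assumptions, not the thesis's actual formulation.

```python
import math
import random

def boltzmann_sample(q_values, temperature=1.0, rng=random):
    """Sample action index a with probability proportional to exp(Q[a]/T).
    High T -> near-uniform exploration; low T -> near-greedy behavior."""
    m = max(q_values)  # subtract the max before exp for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    r = rng.random() * sum(weights)
    for action, w in enumerate(weights):
        r -= w
        if r <= 0:
            return action
    return len(q_values) - 1  # guard against floating-point round-off
```

With many vehicles sharing similar Q estimates, sampling rather than arg-maxing means they fan out across several high-value zones instead of piling into one.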

(3) In the Graph-Map environment, an adversarial-network-enhanced hierarchical graph reinforcement learning algorithm is proposed. Compared with the Grid-Map environment, the Graph-Map environment poses additional challenges, such as dynamic traffic congestion and a complex road-network structure. To address them, first, a hierarchical graph reinforcement learning framework is designed, which dynamically clusters road nodes and divides the complex multi-vehicle repositioning task into decision problems at different levels. Second, an adversarial graph reinforcement learning algorithm is designed for the actuator: the prediction branch and the repositioning branch are trained cooperatively in an adversarial way, yielding accurate states and rewards for agents in the graph-structured problem and thus strong forecasting and repositioning performance. Finally, to enable efficient multi-vehicle coordination, a discrete Soft Actor-Critic algorithm is used in the repositioning branch; it learns multiple optimal actions for vehicles in the same state. Multiple comparative experiments and ablation experiments demonstrate the effectiveness of the method.
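In standard discrete Soft Actor-Critic (not the thesis's specific implementation, which is not published), the entropy term is what lets several vehicles in the same state take different near-optimal actions: the target policy is a softmax of Q scaled by the entropy weight alpha, so equally valued actions keep equal probability mass. A minimal sketch of those two textbook quantities, with all names assumed:

```python
import math

def soft_policy(q_values, alpha=0.5):
    """Return (pi, V) for discrete SAC at one state:
    pi(a|s) proportional to exp(Q(s,a)/alpha), and the soft state value
    V(s) = alpha * log(sum_a exp(Q(s,a)/alpha)) used in the critic target."""
    m = max(q_values)  # shift by the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    v = alpha * math.log(z) + m  # equals alpha*logsumexp(Q/alpha)
    return probs, v
```

Note that two actions with identical Q values get identical probability for any alpha > 0, which is exactly the multi-vehicle coordination effect the abstract describes; as alpha approaches 0 the policy collapses to greedy argmax.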

In conclusion, this thesis proposes a universal MOD simulation environment and two efficient vehicle repositioning algorithms. Compared with existing methods, the proposed models adapt better to complex dynamic systems, express agent states more accurately, and coordinate multiple vehicles more effectively. The research exhibits theoretical innovation and significantly enhances the operational efficiency of the MOD system, and therefore holds substantial theoretical significance and practical value.

Keywords: Vehicle Repositioning; Deep Reinforcement Learning; Hierarchical Reinforcement Learning; Graph Neural Network
Discipline: Engineering :: Control Science and Engineering
Language: Chinese
Sub-direction classification: Artificial Intelligence + Transportation
State Key Laboratory planning direction: Multi-Agent Decision-Making
Document type: Degree thesis
Identifier: http://ir.ia.ac.cn/handle/173211/51933
Collection: Graduates / Master's Theses
Recommended citation (GB/T 7714):
习金浩. 基于深度强化学习的网约车调度算法研究[D]. 2023.
Files in this item:
习金浩_学位论文 基于深度强化学习的网约 (15643 KB) | Document type: degree thesis | Access: restricted | License: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.