Research on High-Speed Train Operation Adjustment Methods Based on Deep Reinforcement Learning (基于深度强化学习的高铁列车运行调整方法研究)
王银
2022-05-17
Pages: 97
Degree Type: Master
Abstract (Chinese)

High-speed railways occupy an important position in China's modern transportation system. As a new, sustainable mode of transportation, they save residents' travel time, promote the coordinated and balanced development of cities and regions, and constitute a key component of building China into a transportation power. With the dense construction of the high-speed railway network and the increasingly frequent movement of people, higher requirements have been placed on the stable and punctual operation of high-speed trains. In daily operation, trains normally run on time according to the planned timetable, but emergencies caused by natural or human factors can disturb train operation and cause delays, degrading the transportation capacity of the high-speed railway and the operating efficiency of trains. When delays occur, dispatchers must adjust train operation manually; the outcome depends on the dispatcher's personal experience and professional competence, the workload is heavy, the degree of automation is low, and certain safety risks remain. Studying the train operation adjustment problem of high-speed railways is therefore of great significance for reducing dispatchers' workload and improving the intelligent dispatching capability of high-speed railways. Addressing train delays caused by emergencies, this thesis studies high-speed train operation adjustment methods based on deep reinforcement learning. The main contents are as follows:

1. To address the current lack of a high-speed train operation simulation platform oriented toward reinforcement learning, a discrete-event simulation environment for high-speed train operation is built. By analysing the delay scenarios caused by emergencies, this thesis first establishes a train operation adjustment model under emergencies and analyses the temporal and spatial constraints on high-speed train operation. A discrete-event simulation approach is then used to build a high-speed train operation simulation environment that reproduces the safe running process of trains, and experiments verify the validity of the model and the feasibility of using the simulation environment for train operation adjustment research. Finally, an interaction framework between the simulation environment and reinforcement learning algorithms is given, and a simulation platform for high-speed train operation adjustment experiments is built, providing the basis for the subsequent research.

2. For departure-delay and arrival-delay scenarios caused by emergencies, a train operation adjustment method based on a policy gradient reinforcement learning algorithm is proposed. Different stopping plans in the operation diagram lead to different delay impacts. Adjusting the diagram station by station at the macroscopic level, this thesis treats train operation adjustment as a multi-stage sequential decision process with multiple temporal and spatial constraints, and proposes a two-stage Markov decision model to overcome the explosion of the reinforcement learning action space. Effective state, action, and reward functions are constructed for this model, and a train operation adjustment method based on the policy gradient algorithm is given. Experimental results show that by adjusting the departure order at each station in the diagram, the total arrival and departure delay at all stations can be effectively reduced, improving both solution accuracy and efficiency.

3. Because macroscopic adjustment of the operation diagram ignores the relative position information between trains and between trains and the railway network, and because some constraints cannot be expressed by a mathematical model, a high-speed train operation adjustment method based on the PPO (Proximal Policy Optimization) algorithm is proposed. This thesis first establishes a spatio-temporal resource model of the high-speed railway from a microscopic perspective and treats the tasks performed by trains at stations and in running sections as a multi-stage sequential decision process of resource occupation and reallocation. A reinforcement learning model for high-speed train operation adjustment is then constructed, reasonable reinforcement learning elements are designed for it, and a PPO-based optimization algorithm for adjusting high-speed trains under emergencies is given. Experimental results show that the method enables trains to learn to exploit the redundant buffer time in the operation diagram to cope with delays, reduces the average total delay of all trains, and achieves automatic adjustment of high-speed train operation.

Abstract (English)

The high-speed railway plays a vital role in China's modern transportation system. As a new, sustainable mode of transportation, it saves residents' travel time, promotes the coordinated and balanced development of cities and regions, and is an important component of building China into a transportation power. With the intensive construction of the high-speed railway network and the increasingly frequent flow of people, higher requirements have been placed on the stable and punctual operation of high-speed trains. During daily operation, trains usually run on time according to the planned schedule. However, emergencies caused by natural or human factors disturb train operation and lead to delays, affecting the transportation capacity of the high-speed railway and the operating efficiency of trains. When a delay occurs, the dispatcher needs to adjust train operation manually. The result depends on the dispatcher's personal experience and professional competence; the work intensity is high, the degree of automation is low, and certain safety risks remain. Therefore, studying the operation adjustment of high-speed trains is of great significance for reducing the workload of dispatchers and improving the intelligent dispatching capability of high-speed railways. Aiming at the problem of train delays caused by emergencies, this thesis conducts research on high-speed train operation adjustment methods based on deep reinforcement learning. The main contents are as follows:

1. Given the lack of a high-speed train operation simulation platform for reinforcement learning, a high-speed train operation simulation environment based on discrete events is built. By analysing the delay scenarios caused by emergencies, this thesis first establishes a train operation adjustment model under emergencies and analyses the temporal and spatial constraints in the running process of high-speed trains. Then, the discrete-event simulation method is used to build the high-speed train operation environment and simulate the safe operation process of trains. Experiments and analysis verify the validity of the model and the feasibility of using the simulation environment for the study of train operation adjustment. Finally, an interaction framework between the simulation environment and the reinforcement learning algorithm is given, and a simulation platform for high-speed train operation adjustment experiments is built, which provides the basis for subsequent research.
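As a rough illustration of the discrete-event idea described above, the sketch below advances a toy timetable event by event, enforcing a minimum departure headway and fixed running and dwell times, and accumulates each train's arrival delay. The station list, timing constants, and delay metric are illustrative assumptions, not the thesis's actual simulation model.

```python
# A minimal discrete-event sketch of train operation, assuming fixed section
# running times, fixed dwell times, and a single departure-headway constraint.
# Stations, timing constants, and the delay metric are illustrative only.
import heapq

HEADWAY = 300      # minimum headway between departures at a station (s), assumed
RUN_TIME = 1200    # fixed running time between adjacent stations (s), assumed
DWELL_TIME = 120   # fixed dwell time at intermediate stations (s), assumed
STATIONS = ["A", "B", "C", "D"]   # hypothetical line

def simulate(planned_departures, initial_delays):
    """Process departure/arrival events in time order and accumulate arrival delays."""
    events = []          # priority queue of (time, train_id, station_index, kind)
    last_departure = {}  # latest departure time observed at each station
    arrival_delay = {}   # accumulated positive arrival delay per train

    for tid, (t0, d0) in enumerate(zip(planned_departures, initial_delays)):
        heapq.heappush(events, (t0 + d0, tid, 0, "depart"))

    while events:
        t, tid, s, kind = heapq.heappop(events)
        if kind == "depart":
            # postpone the departure until the headway behind the previous train is met
            t = max(t, last_departure.get(s, float("-inf")) + HEADWAY)
            last_departure[s] = t
            heapq.heappush(events, (t + RUN_TIME, tid, s + 1, "arrive"))
        else:
            # compare the actual arrival with the planned arrival at station s
            planned = planned_departures[tid] + s * RUN_TIME + (s - 1) * DWELL_TIME
            arrival_delay[tid] = arrival_delay.get(tid, 0) + max(0, t - planned)
            if s < len(STATIONS) - 1:
                heapq.heappush(events, (t + DWELL_TIME, tid, s, "depart"))
    return arrival_delay

if __name__ == "__main__":
    # two trains planned 5 minutes apart from station A; the first is disturbed by 10 minutes
    print(simulate(planned_departures=[0, 300], initial_delays=[600, 0]))
```

Running the toy case shows how an initial disturbance on one train turns into accumulated arrival delays along the line while an undisturbed train stays on plan.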

2. Aiming at the scenarios of train departure and arrival delays caused by emergencies, a train operation adjustment method based on a policy gradient reinforcement learning algorithm is proposed. Different train stopping schemes in the operation diagram lead to different delay effects. From the perspective of adjusting the train operation diagram station by station at the macroscopic level, this thesis regards train operation adjustment as a multi-stage sequential decision-making process with multiple temporal and spatial constraints. A two-stage Markov decision model is proposed to solve the action-space explosion problem in reinforcement learning. Based on this model, effective state, action, and reward functions are constructed, and a train operation adjustment method based on the policy gradient algorithm is given. The experimental results show that by adjusting the departure order at each station in the diagram, the total arrival and departure delay of trains at all stations can be effectively reduced, and the solution accuracy and efficiency are improved.
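Under simplified assumptions, the following sketch shows the policy-gradient mechanics behind such a departure-ordering decision: a softmax policy with a single parameter picks which of the remaining trains departs next, and a REINFORCE update weights the trajectory's log-probability gradients by the negated total waiting time. The environment, feature encoding, and hyperparameters are toy assumptions and do not reproduce the thesis's two-stage Markov decision model.

```python
# A toy REINFORCE sketch for choosing the departure order of delayed trains.
# The single-parameter softmax policy, the feature (normalized readiness time),
# and the waiting-time reward are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
ready_time = np.array([600.0, 0.0, 120.0])  # time each train becomes ready to depart (s), assumed
HEADWAY = 300.0                              # minimum spacing between departures (s), assumed

def episode(theta):
    """Roll out one departure ordering; return summed log-prob gradients and total waiting."""
    remaining = list(range(len(ready_time)))
    t, total_wait, grads = 0.0, 0.0, []
    while remaining:
        feats = np.array([ready_time[i] / 600.0 for i in remaining])
        logits = theta * feats                       # linear score per candidate train
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                         # softmax over remaining trains
        k = rng.choice(len(remaining), p=probs)      # sample which train departs next
        grads.append(feats[k] - probs @ feats)       # d/d(theta) of log pi(k | state)
        train = remaining.pop(k)
        depart = max(t + HEADWAY, ready_time[train])
        total_wait += depart - ready_time[train]     # time spent waiting after being ready
        t = depart
    return float(np.sum(grads)), total_wait

theta, lr = 0.0, 1e-3
for _ in range(500):
    grad_sum, wait = episode(theta)
    theta += lr * grad_sum * (-wait / 1000.0)        # REINFORCE step with return = -wait
print("learned preference parameter:", theta)
```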

3. Macroscopic adjustment of the train operation diagram ignores the relative position information between trains and between trains and the railway network, and some constraints cannot be expressed by mathematical models. To address this, a high-speed train operation adjustment method based on the PPO (Proximal Policy Optimization) algorithm is proposed. This thesis first establishes a high-speed railway spatio-temporal resource model from a microscopic perspective and regards the tasks of trains at stations and in running sections as a multi-stage sequential decision-making process of resource occupation and reallocation. Then, a reinforcement learning model for high-speed train operation adjustment is established, reasonable reinforcement learning elements are designed for it, and a PPO-based optimization algorithm for high-speed train adjustment under emergencies is given. The experimental results show that this method enables trains to learn to exploit the redundant buffer time in the operation diagram to cope with delays, reduces the average total delay of all trains, and realizes automatic adjustment of high-speed train operation.
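For reference, the clipped surrogate objective at the core of PPO can be written as in the sketch below; it assumes that old and new action log-probabilities and advantage estimates have already been collected from a dispatching environment, and the tensor values and clip range are illustrative.

```python
# The PPO clipped surrogate loss, shown on made-up numbers; in the dispatching
# setting, log-probabilities would come from the policy over adjustment actions
# and advantages from a delay-based reward. Values and clip range are illustrative.
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (to be minimized by gradient descent)."""
    ratio = torch.exp(new_logp - old_logp)                         # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))

# toy usage: three sampled actions with their old/new log-probs and advantage estimates
new_logp = torch.tensor([-0.9, -1.2, -0.4], requires_grad=True)
old_logp = torch.tensor([-1.0, -1.0, -0.5])
advantages = torch.tensor([1.5, -0.3, 0.8])
loss = ppo_clip_loss(new_logp, old_logp, advantages)
loss.backward()
print(float(loss), new_logp.grad)
```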

Keywords: High-speed railway; Intelligent dispatching; Train operation adjustment; Train operation diagram; Reinforcement learning
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48742
Collection: Graduates_Master Theses
Recommended Citation
GB/T 7714
王银. 基于深度强化学习的高铁列车运行调整方法研究[D]. 中国科学院自动化研究所, 2022.
Files in This Item
File Name/Size: 王银_基于深度强化学习的高铁列车运行调整 (3006 KB) · Document Type: Thesis · Access: Restricted · License: CC BY-NC-SA