Research on Hybrid Intelligent Regulation Methods and Applications Based on Parallel Learning (基于平行学习的混合智能调控方法与应用研究)


Author	Li Xiaoshuang (李小双)
Date	2022-05-19
Pages	164
Degree type	Doctoral
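Chinese abstract (translated)	As society and the economy develop, real-world systems such as transportation networks and power grids grow steadily more complex; they become harder to regulate, and the need for regulation becomes more pressing. Because their operating mechanisms cannot be modeled accurately, their state-action spaces are complex, and their regulation objectives are diverse, traditional regulation methods face mounting challenges when applied to real complex systems. Rapid advances in artificial intelligence theory and methods offer new approaches to regulating complex systems. Building hybrid intelligent regulation methods that fuse machine intelligence with prior knowledge and experience has important theoretical and practical value for advancing research on complex system regulation and for raising the level of management and control of complex systems. Parallel learning is a theoretical framework proposed in recent years for the management and control of complex systems. It comprises three processes (descriptive learning, predictive learning, and prescriptive learning) and uses them to fuse data, knowledge, and action policies into a complete closed-loop optimization system, addressing problems such as insufficient data and difficult policy optimization in real complex scenarios. Building on the parallel learning framework, this dissertation uses imitation learning, deep reinforcement learning, and related methods to study hybrid intelligent regulation methods that fuse machine intelligence with prior knowledge and experience. The main work is summarized as follows:
1. For the problem of modeling policies from demonstration data, and following the idea of descriptive learning, a demonstration data mining method based on imitation learning is proposed, which learns and models the prior knowledge and experience in the demonstration data. First, for real demonstration data, a mask-based encoding mechanism for missing data is proposed, and an imitation learning model that extracts the temporal and spatial features of the demonstration data at the same time is designed and built, achieving effective modeling of real demonstration data. Further, a heuristic gradient-free parameter optimization method and a virtual demonstration data generation mechanism are proposed, and an imitation learning model based on virtual demonstration data is built, which effectively integrates the prior knowledge and experience in offline optimization methods. Finally, the method is applied to traffic signal regulation; the results show that it models the prior knowledge and experience in the demonstration data well, forms a reusable virtual expert model, and improves model performance on traffic signal regulation tasks.
2. For the limits that the scale and diversity of demonstration data impose on the imitation learning model above, a general method for enhancing prior knowledge and experience is designed from the perspective of predictive learning, expanding and augmenting a small amount of real demonstration data. First, a data encoding and decoding mechanism that fuses spatio-temporal features with prior knowledge is proposed, and a general demonstration data augmentation model based on adversarial learning and self-attention is designed and built, so that discrete time-series data are effectively represented and augmented. Second, building on this augmentation model and the encoding-decoding mechanism, a method for constructing a hybrid demonstration dataset is proposed, which significantly increases the scale and diversity of the original demonstration data. Finally, an imitation learning model is trained on the hybrid demonstration data, which significantly improves its learning performance. Experiments in a traffic signal regulation scenario show that the method effectively augments prior knowledge and demonstration data and improves the ability of the deep imitation learning model to model and exploit prior knowledge and experience.
3. For the utilization and optimization of the expert policy model, a tutorial deep reinforcement learning framework that combines supervised learning with deep reinforcement learning is proposed from the perspective of prescriptive learning, so that deep reinforcement learning makes full use of prior knowledge and experience and continuously optimizes the expert policy model. Building on the policy modeling process above, two supervised expert loss functions are designed, one based on a soft margin over Q-values and one based on advantage function estimation, to build deep reinforcement learning models that effectively exploit prior knowledge and experience. In addition, a dynamic update mechanism for the demonstration data is proposed, which deeply integrates the supervised learning model with the deep reinforcement learning model by continuously fine-tuning the imitation learning model. Finally, experiments on a general-purpose platform and in a power grid emergency voltage control scenario show that the proposed methods make full use of the demonstration data, effectively optimize existing policies, and improve the learning ability and regulation performance of the model.
In summary, this dissertation studies hybrid intelligent regulation methods and their applications for complex system regulation. Guided by parallel learning theory, hybrid intelligent regulation models and methods are designed from three aspects: mining and modeling of demonstration data, expansion and enhancement of prior knowledge and experience, and utilization and optimization of expert policies. The feasibility and effectiveness of the proposed methods are verified experimentally in two typical complex real-world scenarios, traffic signal regulation and power grid voltage regulation. Through this study, the dissertation aims to promote the adoption of computer-aided decision-making in complex real-world scenarios and to advance the theory and methods of complex system regulation.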
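The abstract names a mask-based encoding mechanism for missing data but gives no details. As a rough illustration only, one common way to encode missing values is to fill them with a constant and append a binary "observed" mask as an extra channel. The function name `encode_with_mask`, the fill value, and the `(T, N)` array layout below are assumptions made for this sketch, not the dissertation's actual design.

```python
import numpy as np


def encode_with_mask(x, fill_value=0.0):
    """Fill missing entries (NaN) and append a 0/1 'observed' mask channel.

    x: array of shape (T, N), e.g. T time steps for N detectors (assumed layout).
    Returns an array of shape (T, N, 2):
      channel 0 holds the values with NaN replaced by fill_value,
      channel 1 holds 1.0 where the value was observed and 0.0 where it was missing.
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    filled = np.where(observed, x, fill_value)
    return np.stack([filled, observed.astype(float)], axis=-1)


# Hypothetical demonstration record with two missing readings.
demo = np.array([[3.0, np.nan],
                 [np.nan, 5.0]])
encoded = encode_with_mask(demo)
```

Keeping the mask as a separate channel lets a downstream model distinguish a true zero reading from a filled-in gap.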
English abstract	With social and economic development, real-world complex systems such as urban transportation systems and power grids are becoming more complex, harder to regulate, and more urgently in need of regulation. Operating mechanisms that cannot be modeled accurately, complex state-action spaces, and diverse regulation objectives pose growing challenges to traditional regulation methods. Rapid progress in artificial intelligence theory and methods offers new ideas for regulating complex systems. Building hybrid intelligent regulation methods that integrate machine intelligence with prior knowledge and experience is of both theoretical and practical significance for advancing research on complex system regulation and improving the control of such systems. Parallel learning is a theoretical framework proposed in recent years for the management and control of complex systems. It consists of three processes: descriptive learning, predictive learning, and prescriptive learning. Through these processes, data, knowledge, and action policies are integrated into a complete closed-loop optimization system that addresses the insufficient data and difficult policy optimization encountered in complex practical scenarios. Building on the parallel learning framework, this dissertation studies hybrid intelligent regulation methods that integrate machine intelligence with prior knowledge and experience, using imitation learning, deep reinforcement learning, and related methods. The main contributions are summarized as follows.
1. To address policy modeling from demonstration data, and following the idea of descriptive learning, a demonstration data mining method based on imitation learning is proposed to learn and model the prior knowledge and experience contained in the demonstration data. First, for real demonstration data, a mask-based missing-data encoding mechanism is proposed, and an imitation learning model that simultaneously extracts the spatio-temporal features of the demonstration data is designed and built, enabling effective modeling of real demonstration data. Further, a heuristic gradient-free parameter optimization method and a virtual demonstration data generation mechanism are proposed, and an imitation learning model based on virtual demonstration data is constructed, which integrates the prior knowledge and experience embodied in offline optimization methods. Finally, the method is applied to traffic signal regulation. The experimental results show that it models the prior knowledge and experience in the demonstration data well, yields a reusable virtual expert model, and improves model performance on traffic signal regulation tasks.
2. To address the limits that the scale and diversity of demonstration data impose on the imitation learning model above, a general method for enhancing prior knowledge and experience is designed from the perspective of predictive learning, expanding and augmenting a small amount of real demonstration data. First, a data encoding and decoding mechanism that fuses spatio-temporal features with prior knowledge is proposed, and a general demonstration data augmentation model based on adversarial learning and self-attention is designed and built, so that discrete time-series data can be effectively represented and augmented. Second, building on this augmentation model and the encoding-decoding mechanism, a method for constructing a hybrid demonstration dataset is proposed, which significantly increases the scale and diversity of the original demonstration data. Finally, the imitation learning model is trained on the hybrid demonstration data, which significantly improves its learning performance. Experiments in a traffic signal regulation scenario show that the method effectively augments prior knowledge and demonstration data and improves the ability of the deep imitation learning model to model and exploit prior knowledge and experience.
3. To address the utilization and optimization of the expert policy model, a tutorial deep reinforcement learning framework that combines supervised learning with deep reinforcement learning is proposed from the perspective of prescriptive learning, so that deep reinforcement learning makes full use of prior knowledge and experience and continuously optimizes the expert policy model. Building on the policy modeling process above, two supervised expert loss functions are designed, one based on a soft margin over Q-values and one based on advantage function estimation, to construct tutorial deep reinforcement learning models that effectively exploit prior knowledge and experience. In addition, a dynamic update mechanism for the demonstration data is proposed, which deeply integrates the supervised learning model with the deep reinforcement learning model by continuously fine-tuning the imitation learning model. Finally, experiments on a general-purpose platform and in a power grid emergency voltage control scenario show that the proposed methods make full use of the demonstration data, effectively optimize existing policies, and improve the learning ability and regulation performance of the model.
In summary, this dissertation studies hybrid intelligent regulation methods and their applications for complex system regulation. Guided by parallel learning theory, hybrid intelligent regulation models and methods are designed from three aspects: mining and modeling of demonstration data, expansion and enhancement of prior knowledge and experience, and utilization and optimization of expert policies. The feasibility and effectiveness of the proposed methods are verified experimentally in two typical complex real-world scenarios, traffic signal regulation and power grid voltage regulation. Through this study of hybrid intelligent regulation methods, the dissertation aims to promote the adoption of computer-aided decision-making in complex real-world scenarios and to advance the theory and methods of complex system regulation.
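The abstract does not give the form of the supervised expert loss based on Q-values. For orientation only, the sketch below shows one widely used margin-based expert loss, the large-margin classification loss from Deep Q-learning from Demonstrations: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is zero for the expert action a_E and a positive margin otherwise. This is a swapped-in stand-in, not the dissertation's loss; the function name `margin_expert_loss` and the margin value are assumptions.

```python
import numpy as np


def margin_expert_loss(q_values, expert_actions, margin=0.5):
    """Large-margin expert loss averaged over a batch (illustrative only).

    q_values: array of shape (B, A), Q-value estimates for A discrete actions.
    expert_actions: integer array of shape (B,), the demonstrated action per state.
    The loss is zero when the expert action's Q-value exceeds every other
    action's Q-value by at least `margin`, and positive otherwise.
    """
    q = np.asarray(q_values, dtype=float)
    a_e = np.asarray(expert_actions, dtype=int)
    rows = np.arange(q.shape[0])
    penalty = np.full_like(q, margin)   # l(a_E, a) = margin for a != a_E
    penalty[rows, a_e] = 0.0            # l(a_E, a_E) = 0
    return float(np.mean(np.max(q + penalty, axis=1) - q[rows, a_e]))


# Hypothetical Q-values for two states and two actions.
q = np.array([[1.0, 2.0],
              [3.0, 0.0]])
loss_agree = margin_expert_loss(q, [1, 0])     # expert picks the greedy action
loss_disagree = margin_expert_loss(q, [0, 1])  # expert picks the non-greedy action
```

In a combined objective, a term like this would typically be added to the usual temporal-difference loss with a weighting coefficient, so that demonstrations shape the Q-function without replacing reinforcement learning updates.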
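Keywords	parallel learning; hybrid intelligent regulation; demonstration data; imitation learning; deep reinforcement learning
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48846
Collection	Graduates_Doctoral Dissertations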
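Recommended citation (GB/T 7714)	Li Xiaoshuang. Research on Hybrid Intelligent Regulation Methods and Applications Based on Parallel Learning [D]. Institute of Automation, Chinese Academy of Sciences, 2022.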

Files in this item
File name/size	Document type	Version type	Access type	License
李小双_博士论文终稿.pdf (10128 KB)	Thesis		Restricted access	CC BY-NC-SA