|Thesis Advisor||Wang Feiyue (王飞跃); Wang Xiao (王晓)|
|Place of Conferral||Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)|
|Keyword||parallel learning; hybrid intelligent regulation; demonstration data; imitation learning; deep reinforcement learning|
With social and economic development, real-world complex systems such as urban transportation systems and power grids are becoming more complex, harder to regulate, and more urgently in need of effective regulation. Because their operating mechanisms cannot be modeled accurately, their state-action spaces are complex, and their regulation targets are diverse, traditional regulation methods face growing challenges when applied to them. The rapid development of artificial intelligence theories and methods offers new ideas for regulating complex systems. Constructing hybrid intelligent regulation methods that integrate machine intelligence with prior knowledge and experience is therefore of theoretical and practical significance, both for advancing research on complex system regulation methods and for improving the control of complex systems.
Parallel learning is a theoretical framework proposed in recent years for the management and control of complex systems. It consists of three processes: descriptive learning, predictive learning, and prescriptive learning. Together, these processes integrate data, knowledge, and action policies into a complete closed-loop optimization system that addresses the insufficient data and difficult policy optimization encountered in practical complex scenarios. Based on the theoretical framework of parallel learning, this dissertation studies hybrid intelligent regulation methods that integrate machine intelligence with prior knowledge and experience through imitation learning, deep reinforcement learning, and related methods. The main work of this dissertation is summarized as follows.
1. To address the problem of modeling the policy contained in demonstration data, a demonstration data mining method based on imitation learning is proposed, following the idea of descriptive learning. The method learns and models the prior knowledge and experience contained in the demonstration data. First, for real demonstration data, a mask-based missing data encoding mechanism is proposed, and an imitation learning model that simultaneously extracts the spatial and temporal characteristics of the demonstration data is designed and constructed, so that real demonstration data can be modeled effectively. Next, a heuristic gradient-free parameter optimization method and a virtual demonstration data generation mechanism are proposed, and an imitation learning model based on the virtual demonstration data is constructed, which integrates the prior knowledge and experience embodied in offline optimization methods. Finally, the method is applied to a traffic signal regulation scenario. The experimental results show that the proposed method models the prior knowledge and experience in the demonstration data well, forms a reusable virtual expert model, and improves model performance on traffic signal regulation tasks.
2. Because the scale and diversity of the demonstration data limit the imitation learning model described above, a generic method for enhancing prior knowledge and experience is designed from the perspective of predictive learning, so that a small amount of real demonstration data can be expanded and enhanced. First, a data encoding and decoding mechanism that fuses spatio-temporal features with prior knowledge is proposed, and a generic demonstration data enhancement model based on adversarial learning and self-attention mechanisms is designed and constructed, so that discrete time-series data can be effectively represented and enhanced. Second, using the proposed data enhancement model and the encoding-decoding mechanism, a method for constructing a hybrid demonstration data set is proposed, which significantly improves the scale and diversity of the original demonstration data. Finally, the imitation learning model is trained on the hybrid demonstration data, and its learning performance improves significantly. Experiments in a traffic signal regulation scenario show that the proposed method effectively enhances the prior knowledge and demonstration data, and improves the ability of the deep imitation learning model to model and use prior knowledge and experience.
3. To use and optimize the expert policy model, a tutorial deep reinforcement learning framework that integrates supervised learning with deep reinforcement learning is proposed from the perspective of prescriptive learning. The framework allows deep reinforcement learning methods to make full use of prior knowledge and experience data and to continuously optimize the expert policy model. Building on the policy modeling process described above, two supervised expert loss functions are designed, one based on a soft Q-value interval and one based on advantage function estimation, and each is used to construct a tutorial deep reinforcement learning model that effectively uses prior knowledge and experience. In addition, a dynamic update mechanism for the demonstration data is presented, which deeply integrates the supervised learning model and the deep reinforcement learning model by continuously fine-tuning the imitation learning model. Finally, experiments are conducted on a general-purpose platform and in a grid voltage emergency control scenario. The results show that the proposed methods make full use of the demonstration data, effectively optimize existing policies, and improve the learning ability and regulation performance of the model.
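The idea of a supervised expert loss in item 3 can be illustrated with a minimal sketch. The abstract does not give the exact form of the soft Q-value interval-based loss, so the example below substitutes the large-margin expert loss known from Deep Q-learning from Demonstrations. The function name `large_margin_expert_loss`, the `margin` default, and the plain list of Q-values are assumptions made for illustration and are not taken from the dissertation.

```python
from typing import Sequence


def large_margin_expert_loss(
    q_values: Sequence[float],
    expert_action: int,
    margin: float = 0.8,  # assumed value, chosen only for illustration
) -> float:
    """Large-margin supervised loss for one demonstrated state.

    Computes max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is `margin`
    for every action other than the expert action a_E and 0 for a_E itself.
    This is a stand-in for the dissertation's loss, not a reproduction of it.
    """
    if not 0 <= expert_action < len(q_values):
        raise ValueError("expert_action must index into q_values")
    # Add the margin to every non-expert action before taking the maximum.
    augmented = [
        q + (0.0 if action == expert_action else margin)
        for action, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]
```

The loss is zero only when the expert action's Q-value exceeds every other action's Q-value by at least the margin. Otherwise it is positive and pushes the value of the demonstrated action upward. In Deep Q-learning from Demonstrations this term is added, with a weight, to the usual temporal-difference loss.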
This dissertation addresses complex system regulation by focusing on hybrid intelligent regulation methods and their applications. Guided by parallel learning theory, hybrid intelligent regulation models and methods are designed from three aspects: mining and modeling of demonstration data, expansion and enhancement of prior knowledge and experience, and utilization and optimization of expert policies. The feasibility and effectiveness of the proposed methods are verified experimentally in two typical complex scenarios, traffic signal regulation and power grid voltage regulation. Through this study of hybrid intelligent regulation methods, the dissertation aims to promote the application of computer-aided decision-making in real-world complex scenarios and to advance the theory and methods of complex system regulation.
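|Li Xiaoshuang (李小双). Research on Hybrid Intelligent Regulation Methods and Applications Based on Parallel Learning (基于平行学习的混合智能调控方法与应用研究) [D]. Institute of Automation, Chinese Academy of Sciences, 2022.|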