Research on Reinforcement Learning Algorithms Based on Brain-Inspired Spiking Neural Networks

	Research on Reinforcement Learning Algorithms Based on Brain-Inspired Spiking Neural Networks
	张笃振
	2024-05-15
Pages	122
Degree type	Doctoral
Abstract	Using deep neural networks as function approximators, deep reinforcement learning has achieved great success in complex decision-making tasks such as video games and robot control. However, in cognitive ability and computational efficiency it still falls far short of the efficient reward-based learning mechanisms of the brain. Compared with deep neural networks, which have only partially brain-inspired structures and functions, spiking neural networks have a deeper physiological basis, stronger temporal information processing, extremely low energy consumption, and excellent robustness. They have attracted wide attention and, in fields such as brain-inspired computing and neuromorphic engineering, are regarded as the third generation of artificial neural networks. By combining spiking neural networks with reinforcement learning, spiking reinforcement learning algorithms can effectively explain findings about the biological brain and are considered a feasible route toward artificial brains. As an emerging field at the intersection of brain science and artificial intelligence, research on spiking reinforcement learning is still in its infancy, and many problems remain both in explaining the brain's learning mechanisms and in solving practical applications. To advance this direction, this thesis focuses on three sub-problems: the limitations of the coding used in spiking actor networks, the oversimplified structure of spiking actor networks, and the biological implausibility of existing optimization methods. The main research content and contributions are as follows:
1. Spiking reinforcement learning based on biological multi-scale dynamic coding. This chapter proposes a spiking actor network enhanced with multi-scale dynamic coding, aimed at efficient decision-making in reinforcement learning tasks. The network combines population coding at the network scale with dynamic neuron coding at the neuron scale (including second-order neuron dynamics) to form a powerful spatiotemporal state representation. This coding scheme substantially alleviates the coding limitations of existing spiking actor networks and greatly improves their expressive power. On several complex robotic continuous control tasks in OpenAI Gym, the network achieves excellent performance and, for the first time, surpasses the corresponding deep actor networks.
2. Spiking reinforcement learning based on biological network connectivity patterns. This chapter proposes a spiking actor network enhanced with a biologically plausible topology, aimed at efficient decision-making in reinforcement learning. The network integrates spiking neurons that have complex spatiotemporal dynamics with a topology that follows biological connectivity patterns. For inter-layer connections, it models the local nonlinearities of dendritic trees; for intra-layer connections, it introduces lateral interactions between neighboring neurons. This design addresses the oversimplified structure of existing spiking actor networks and greatly strengthens the network's information processing capability. Results on several robotic continuous control tasks in OpenAI Gym show that the network achieves excellent control performance and surpasses both its corresponding deep actor network and conventional spiking actor networks.
3. Spiking reinforcement learning based on the biological evolution of synaptic connections. This chapter proposes a method that uses a genetic evolution algorithm to search directly for spiking policy networks, thereby avoiding the existing optimization framework based on gradient backpropagation, which has several biologically implausible aspects. The method is further inspired by brain science research showing that the brain forms memories by creating new synaptic connections and reshaping them in response to new experience. Accordingly, it uses the genetic evolution algorithm to adjust the synaptic connections of the spiking policy network, rather than the synaptic weights, to solve a given task, which mitigates the biological implausibility of existing optimization methods. Experiments on several robot control tasks show that the method reaches performance comparable to mainstream deep reinforcement learning methods, without a large storage burden and with significantly higher energy efficiency.
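The idea in item 3, searching over which synaptic connections exist while leaving the weights fixed, can be illustrated with a small toy sketch. This is not the thesis's implementation (the full text is not openly accessible); it assumes a single layer of leaky integrate-and-fire neurons, a made-up spike-count objective in place of a reinforcement learning return, and a simple elitist genetic algorithm with mutation only. All function names and parameters below are hypothetical.

```python
import random


def lif_spike_counts(inputs, weights, mask, steps=20, decay=0.8, threshold=1.0):
    """Simulate one layer of leaky integrate-and-fire neurons; return spike counts."""
    n_out = len(weights)
    v = [0.0] * n_out          # membrane potentials
    counts = [0] * n_out
    for _ in range(steps):
        for j in range(n_out):
            # Only connections whose mask bit is 1 carry input current.
            current = sum(w * m * x for w, m, x in zip(weights[j], mask[j], inputs))
            v[j] = decay * v[j] + current
            if v[j] >= threshold:
                counts[j] += 1
                v[j] = 0.0     # hard reset after a spike
    return counts


def fitness(mask, weights, samples):
    """Toy objective (assumption): negative squared error to target spike counts."""
    err = 0
    for inputs, target in samples:
        counts = lif_spike_counts(inputs, weights, mask)
        err += sum((c - t) ** 2 for c, t in zip(counts, target))
    return -err


def mutate(mask, rate, rng):
    """Return a copy of the mask with each connection bit flipped with probability `rate`."""
    return [[bit ^ 1 if rng.random() < rate else bit for bit in row] for row in mask]


def evolve_mask(weights, samples, pop_size=20, generations=30, elite=5, rate=0.1, seed=0):
    """Elitist genetic search over binary connection masks; the weights never change."""
    rng = random.Random(seed)
    n_out, n_in = len(weights), len(weights[0])
    population = [[[rng.randint(0, 1) for _ in range(n_in)] for _ in range(n_out)]
                  for _ in range(pop_size)]

    def score(m):
        return fitness(m, weights, samples)

    best_history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        best_history.append(score(population[0]))
        parents = population[:elite]  # elites survive unchanged
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - elite)]
        population = parents + children
    population.sort(key=score, reverse=True)
    return population[0], best_history


if __name__ == "__main__":
    rng = random.Random(1)
    weights = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(2)]
    samples = [([1.0, 0.0, 1.0, 0.0], [3, 0]), ([0.0, 1.0, 0.0, 1.0], [0, 3])]
    best_mask, history = evolve_mask(weights, samples)
    print(best_mask, history[-1])
```

Because only a binary mask is evolved, a candidate network is described by one bit per connection plus the shared fixed weights, which is one plausible reading of the low storage burden mentioned in the abstract. In an actual reinforcement learning setting, `fitness` would be replaced by the episode return obtained by running the masked spiking policy in the environment.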
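Keywords	brain-inspired intelligence; spiking neural networks; reinforcement learning
Subject area	Artificial intelligence
Discipline category	Engineering::Control Science and Engineering
Indexing category	Other
Language	Chinese
Representative thesis	Yes
Seven major directions: sub-direction classification	Brain-inspired models and computing
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/57327
Collection	Graduates; Graduates_Doctoral Theses
Recommended citation (GB/T 7714)	张笃振. 基于类脑脉冲神经网络的强化学习算法研究[D],2024.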

Files in this item
File name/size	Document type	Version type	Access type	License
张笃振-博士毕业论文_Final_Ver (23167 KB)	Thesis		Restricted access	CC BY-NC-SA