Research on Reinforcement Learning Algorithms Based on Brain-Inspired Spiking Neural Networks (基于类脑脉冲神经网络的强化学习算法研究)
张笃振
2024-05-15
Pages: 122
Subtype: Doctoral
Abstract

Using deep neural networks as function approximators, deep reinforcement learning has achieved considerable success in complex decision-making tasks such as video games and robot control. However, a substantial gap remains between deep reinforcement learning and the brain's efficient reward-based learning mechanisms, in both cognitive ability and computational efficiency. Compared with deep neural networks, which have only partially brain-inspired structures and functions, spiking neural networks have a deeper physiological basis, stronger temporal information processing capabilities, extremely low energy consumption, and strong robustness. They have attracted widespread attention and, in fields such as brain-inspired computing and neuromorphic engineering, are regarded as the third generation of artificial neural networks. By combining spiking neural networks with reinforcement learning, spiking reinforcement learning algorithms can help explain findings about the biological brain and are considered a viable path toward developing artificial brains. As an emerging interdisciplinary field between brain science and artificial intelligence, research on spiking reinforcement learning is still in its early stages, and many problems remain both in explaining the brain's learning mechanisms and in solving practical applications. To advance this direction, this thesis focuses on three sub-problems: the limitations of the coding used in spiking actor networks, the oversimplified structure of spiking actor networks, and the biological implausibility of existing optimization methods.

The main research content and contributions of this thesis are as follows:

1. Spiking Reinforcement Learning Algorithm Based on Bio-inspired Multi-Scale Dynamic Coding

This chapter proposes a spiking actor network improved by multi-scale dynamic coding, aimed at efficient decision-making in reinforcement learning tasks. The network combines population coding at the network scale with dynamic neuron coding at the neuron scale (including second-order neuronal dynamics) to form a powerful spatiotemporal state representation. This multi-scale dynamic coding scheme mitigates the coding limitations of existing spiking actor networks and greatly enhances their expressive power. The network achieves strong performance on several complex robotic continuous-control tasks in OpenAI Gym and, for the first time, surpasses the corresponding deep actor networks.
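As a rough illustration of the two coding scales described above, the following sketch encodes one scalar state value with a population of Gaussian receptive fields and feeds the result into neurons that carry two coupled state variables. It is a minimal sketch under stated assumptions, not the thesis's formulation: the function and class names, the Gaussian receptive fields, the adaptation variable `u`, and every constant are hypothetical choices made for illustration.

```python
import math


def population_encode(x, n_neurons=5, lo=-1.0, hi=1.0, sigma=0.3):
    """Network scale: spread a scalar over a population of Gaussian receptive fields.

    The evenly spaced centers and the shared width sigma are assumptions.
    """
    step = (hi - lo) / (n_neurons - 1)
    centers = [lo + i * step for i in range(n_neurons)]
    return [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]


class SecondOrderNeuron:
    """Neuron scale: two coupled state variables (membrane potential v, adaptation u).

    This is a generic second-order model chosen for illustration only.
    """

    def __init__(self, tau_v=2.0, tau_u=10.0, coupling=0.5, threshold=1.0):
        self.tau_v = tau_v
        self.tau_u = tau_u
        self.coupling = coupling
        self.threshold = threshold
        self.v = 0.0
        self.u = 0.0

    def step(self, current):
        # Euler update with a unit time step; u feeds back into v.
        self.v += (-self.v - self.u + current) / self.tau_v
        self.u += (self.coupling * self.v - self.u) / self.tau_u
        if self.v >= self.threshold:
            self.v = 0.0      # reset the membrane potential after a spike
            self.u += 0.2     # spike-triggered adaptation
            return 1
        return 0


def encode_state(x, steps=20, gain=2.0):
    """Return the spike count of each population neuron over `steps` time steps."""
    activations = population_encode(x)
    neurons = [SecondOrderNeuron() for _ in activations]
    counts = [0] * len(activations)
    for _ in range(steps):
        for i, (neuron, act) in enumerate(zip(neurons, activations)):
            counts[i] += neuron.step(gain * act)
    return counts
```

Calling `encode_state(0.0)` returns one spike count per population neuron; the neuron whose receptive field is centered on the input fires most, so the state value is represented jointly by which neurons fire and by how their spikes are distributed over time.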

2. Spiking Reinforcement Learning Algorithm Based on Bio-inspired Network Connectivity Patterns

This chapter proposes a spiking actor network improved by a biologically plausible topology, aimed at efficient decision-making in reinforcement learning. The network integrates spiking neurons that have complex spatiotemporal dynamics with a topology that follows biological network connectivity patterns. For inter-layer connections, it models the local nonlinearities of dendritic trees; for intra-layer connections, it introduces lateral interactions between neighboring neurons. This design addresses the oversimplified structure of existing spiking actor networks and greatly enhances the network's information processing capability. Results on multiple robotic continuous-control tasks in OpenAI Gym show that the network achieves strong control performance, surpassing both its corresponding deep actor network and conventional spiking actor networks.
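The two connectivity patterns described above can be sketched as follows: inter-layer input passes through per-branch dendritic nonlinearities before reaching the soma, and intra-layer input comes from the previous-step spikes of adjacent neurons. This is a minimal sketch under stated assumptions and not the thesis's architecture: the `tanh` branch nonlinearity, the nearest-neighbor lateral rule, the leaky integrate-and-fire update, and all names and constants are hypothetical.

```python
import math


def dendritic_input(inputs, branch_weights):
    """Inter-layer input: each dendritic branch sums its weighted inputs and applies
    a local nonlinearity (tanh is an assumption) before the soma adds the branches."""
    total = 0.0
    for weights in branch_weights:  # one weight list per dendritic branch
        branch_sum = sum(w * x for w, x in zip(weights, inputs))
        total += math.tanh(branch_sum)
    return total


def lateral_input(prev_spikes, i, w_lat):
    """Intra-layer input: previous-step spikes of the two adjacent neurons."""
    left = prev_spikes[i - 1] if i > 0 else 0
    right = prev_spikes[i + 1] if i < len(prev_spikes) - 1 else 0
    return w_lat * (left + right)


def layer_step(inputs, weights, v, prev_spikes, w_lat=0.2, decay=0.5, threshold=1.0):
    """Advance one layer of leaky integrate-and-fire neurons by one time step.

    `weights[i]` holds the branch weight lists of neuron i.
    Returns the new membrane potentials and the emitted spikes.
    """
    new_v, spikes = [], []
    for i, branch_weights in enumerate(weights):
        current = dendritic_input(inputs, branch_weights)
        current += lateral_input(prev_spikes, i, w_lat)
        vi = decay * v[i] + current
        if vi >= threshold:
            spikes.append(1)
            new_v.append(0.0)  # reset after a spike
        else:
            spikes.append(0)
            new_v.append(vi)
    return new_v, spikes
```

Because each branch saturates independently, two strongly driven branches contribute at most about 2.0 to the soma rather than their unbounded linear sum, and a neuron that is just below threshold can be pushed over it by spikes from its neighbors.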

3. Spiking Reinforcement Learning Algorithm Based on Bio-inspired Evolution of Synaptic Connections

This chapter proposes a method that uses a genetic evolutionary algorithm to search directly for spiking policy networks, thereby avoiding the existing optimization framework based on gradient backpropagation, which has several biologically implausible aspects. The method is further inspired by neuroscience findings that the brain forms memories by creating new synaptic connections and reshaping them in response to new experience. Accordingly, it solves a given task by using the genetic evolutionary algorithm to adjust the synaptic connections of the spiking policy network rather than its synaptic weights, which mitigates the biological implausibility of existing optimization methods. Experiments on multiple robot control tasks show that the method reaches performance comparable to mainstream deep reinforcement learning methods without a large storage burden, and with significantly higher energy efficiency.
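The idea of evolving connections instead of weights can be sketched with a small genetic algorithm that keeps the weights fixed and searches over binary connection masks. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the single layer of threshold units, the toy supervised fitness (the thesis evaluates on robot control tasks), truncation selection with elitism, the mutation rate, and all names are hypothetical.

```python
import random


def spike_output(inputs, weights, mask, threshold=1.0):
    """Each output unit fires (1) if its masked weighted input reaches the threshold."""
    return [
        1 if sum(w * m * x for w, m, x in zip(w_row, m_row, inputs)) >= threshold else 0
        for w_row, m_row in zip(weights, mask)
    ]


def fitness(mask, weights, samples):
    """Toy fitness: number of samples whose output pattern matches the target."""
    return sum(1 for x, target in samples if spike_output(x, weights, mask) == target)


def mutate(mask, rng, rate=0.1):
    """Flip each connection bit with probability `rate`; weights are never touched."""
    return [[1 - m if rng.random() < rate else m for m in row] for row in mask]


def evolve(weights, samples, generations=30, pop_size=20, seed=0):
    """Evolve binary connection masks over fixed weights.

    Returns the best mask found and the best fitness recorded in each generation.
    """
    rng = random.Random(seed)
    n_out, n_in = len(weights), len(weights[0])
    population = [
        [[rng.randint(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, weights, samples), reverse=True)
        history.append(fitness(population[0], weights, samples))
        elite = population[: pop_size // 4]  # elitism: the best masks survive unchanged
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(population, key=lambda m: fitness(m, weights, samples))
    return best, history
```

For example, with fixed weights `[[0.6, 0.6, -1.0], [1.2, -0.4, 0.3]]` and targets "first output is the AND of inputs 0 and 1, second output copies input 0", the mask `[[1, 1, 0], [1, 0, 0]]` solves the task without changing any weight. Since a genome is one bit per connection, storing a candidate is cheap, which is in the spirit of the low storage burden noted above.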

 

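Keyword: Brain-inspired intelligence; spiking neural networks; reinforcement learning
Subject Area: Artificial intelligence
MOST Discipline Catalogue: Engineering::Control Science and Engineering
Indexed By: Other
Language: Chinese
IS Representative Paper
Sub direction classification: Brain-inspired models and computing
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57327
Collection: Graduates
Graduates_Doctoral Theses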
Recommended Citation
GB/T 7714
张笃振. 基于类脑脉冲神经网络的强化学习算法研究[D],2024.
Files in This Item:
File Name/Size: 张笃振-博士毕业论文_Final_Ver (23167KB)
DocType: Thesis
Version: (none listed)
Access: Restricted
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.