Research on Visual Neural Information Encoding and Decoding Methods Based on Multimodal Learning | |
周琼怡 | |
2023-05-20 | |
Pages | 150 |
Degree type | Doctoral |
Abstract | Research on visual neural information encoding and decoding models the relationship between visual stimuli and neural activity, and is important to both computer science and cognitive neuroscience. Encoding research helps to explore the visual processing mechanisms of the human brain, to evaluate how brain-like artificial neural networks are, and to improve brain-inspired vision models; decoding research supports the design of brain-computer interface systems and builds information pathways between the human brain and the external world. The rapid development of artificial intelligence has opened new directions for this field. Drawing on the representational and computational power of deep neural networks (DNNs), visual neural information encoding and decoding has made substantial progress in areas such as predicting neural responses and decoding the semantics of stimulus images. However, the following challenges remain: (1) visual stimuli and neural responses are strongly heterogeneous in data format and distribution; (2) strong individual differences cause encoding and decoding models to generalize poorly across subjects; (3) encoding and decoding models based on DNN representations have weak interpretability. To address these challenges, this thesis treats visual stimuli and neural activity as different modalities and studies visual neural information encoding and decoding methods based on multimodal learning. The main research content and contributions are as follows: 1. To address the strong heterogeneity between the visual stimulus and neural response modalities, this thesis proposes a cross-modal generation method based on invertible normalizing flows. First, the method maps visual stimuli and neural responses into a latent space shared by both modalities, and imposes local and global constraints on that latent space to align the representations of the two modalities. Second, the method builds a cross-modal generative model for visual neural information encoding and decoding on normalizing flows, using their invertibility to keep information transfer between modalities intact during cross-modal generation, so that cross-modally reconstructed images retain richer image detail. In addition, a single training run covers both encoding and decoding, which are dual tasks, and this greatly reduces training cost. Comparisons on two types of neural data, retinal ganglion cell electrophysiology and functional magnetic resonance imaging (fMRI) of the visual cortex, show that the method outperforms previous baseline methods in overall performance on both the encoding and decoding tasks, while its reconstructed images retain richer detail. |
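Keywords | visual neural information encoding and decoding; multimodal learning; normalizing flow; multi-subject semantic decoding; unsupervised disentangled representation learning |
Language | Chinese |
Seven major directions: sub-direction classification | Brain-computer interface |
State Key Laboratory planned research direction | Cognitive mechanisms and brain-inspired learning |
Associated dataset to be deposited | No |
Document type | Thesis |
Item identifier | http://ir.ia.ac.cn/handle/173211/52097 |
Collection | Graduates_Doctoral theses |
Recommended citation (GB/T 7714) | 周琼怡. Research on Visual Neural Information Encoding and Decoding Methods Based on Multimodal Learning[D], 2023. |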
Files in this item |
File name/size | Document type | Version type | Access type | License |
thesis_明版_答辩后修改完整版_I(21688KB) | Thesis | | Restricted access | CC BY-NC-SA |
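
The abstract relies on the invertibility of normalizing flows so that one trained model can run in both the encoding and the decoding direction. The sketch below illustrates only that general property with a generic affine coupling flow in NumPy; it is not the thesis's architecture. The class names `AffineCoupling` and `Flow`, the latent dimension, the fixed random weights standing in for learned parameters, and the reading of "forward" as encoding and "inverse" as decoding are all assumptions made for illustration.

```python
import numpy as np


class AffineCoupling:
    """Generic affine coupling layer (illustrative, not the thesis model).

    The first half of the vector passes through unchanged and parameterizes
    a scale-and-shift applied to the second half, which makes the layer
    exactly invertible.
    """

    def __init__(self, dim, rng):
        self.half = dim // 2
        rest = dim - self.half
        # Small fixed random weights stand in for learned parameters.
        self.w_scale = 0.1 * rng.standard_normal((self.half, rest))
        self.w_shift = 0.1 * rng.standard_normal((self.half, rest))

    def _params(self, a):
        log_scale = np.tanh(a @ self.w_scale)  # bounded for numerical stability
        shift = a @ self.w_shift
        return log_scale, shift

    def forward(self, x):
        a, b = x[..., :self.half], x[..., self.half:]
        log_scale, shift = self._params(a)
        y = np.concatenate([a, b * np.exp(log_scale) + shift], axis=-1)
        return y, log_scale.sum(axis=-1)  # output and log|det Jacobian|

    def inverse(self, y):
        a, b = y[..., :self.half], y[..., self.half:]
        log_scale, shift = self._params(a)
        return np.concatenate([a, (b - shift) * np.exp(-log_scale)], axis=-1)


class Flow:
    """Stack of coupling layers with a feature reversal after each layer."""

    def __init__(self, dim, n_layers, rng):
        self.layers = [AffineCoupling(dim, rng) for _ in range(n_layers)]

    def forward(self, x):
        total_log_det = np.zeros(x.shape[:-1])
        for layer in self.layers:
            x, log_det = layer.forward(x)
            x = x[..., ::-1]  # reversal is a permutation, so it adds no log-det
            total_log_det += log_det
        return x, total_log_det

    def inverse(self, y):
        for layer in reversed(self.layers):
            y = y[..., ::-1]  # undo the reversal before undoing the layer
            y = layer.inverse(y)
        return y


rng = np.random.default_rng(0)
flow = Flow(dim=8, n_layers=4, rng=rng)

z_stim = rng.standard_normal((5, 8))    # stand-in for stimulus latents
z_resp, log_det = flow.forward(z_stim)  # hypothetical "encoding" direction
z_back = flow.inverse(z_resp)           # hypothetical "decoding" direction
max_error = np.abs(z_back - z_stim).max()
```

Because the inverse reuses the same parameters as the forward pass, `z_back` matches `z_stim` up to floating-point error, which is the sense in which a single set of weights can serve both directions. How the thesis actually couples the flow to its shared latent space and its local and global alignment constraints is not specified in this record.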