Research on Visual Neural Information Decoding Methods Based on Multimodal Deep Learning (基于多模态深度学习的视觉神经信息解码方法研究)
Li Dan (李丹)
2021-05-14
Pages: 132
Degree type: Doctoral
Abstract (Chinese)
Visual information is one of the most important sources through which humans perceive, understand, and recognize the external world. In recent years, the field of visual neural information decoding has attracted wide attention. Previous studies typically adopted system-identification approaches, decoding the brain's visual information by studying the relationship between brain activity and the visual stimuli received by the human brain, thereby inferring the brain's functional mechanisms. Thanks to advances in neural-signal acquisition technologies (functional magnetic resonance imaging (fMRI) and electroencephalography (EEG)) and the rapid development of artificial intelligence, decoding visual neural information by non-invasive means has become practical, and some progress has been made.
However, owing to the strong heterogeneity between visual stimuli and brain signals, missing brain-signal modalities, the limited scope of decoded information, large inter-subject differences, and small sample sizes, existing methods still struggle to decode image information of visual stimuli effectively from brain signals. Therefore, studying decoding methods that better overcome these problems not only promotes the understanding of visual processing mechanisms, but also offers a new perspective for developing brain-like machine perception methods. To this end, this dissertation studies visual information decoding methods based on multimodal deep learning, improving decoding accuracy while effectively overcoming the problems of strong heterogeneity, limited decoded information, missing modalities, and small sample sizes in this field. The main contents and innovations of this dissertation are as follows:
(1) To address the strong heterogeneity between the brain-signal and image modalities, the strong dependence on paired data, and the insufficient utilization of information, this study proposes a semi-supervised generative adversarial network method for the visual reconstruction task. The method unifies the semantic decoding and image reconstruction tasks of brain activity and uses semantic information as a bridge between the brain-signal and image modalities, thereby overcoming the strong heterogeneity between the two. Furthermore, it makes full use of unpaired image data to mine deep semantic information that assists the cross-modal image generation task, greatly reducing the dependence on paired data. The results show that the proposed semi-supervised adversarial network method can generate images with high accuracy and clear semantic information.
(2) To address the problems that the decoded image information is limited and far removed from natural scenes, this study proposes a multi-label semantic prediction method based on semi-supervised co-training for natural scenes. The method integrates a co-training network and symmetric semantic-feature translation modules, using the image modality to assist multi-label learning from brain signals evoked by natural scenes. It also effectively overcomes the problems of insufficient labeled samples and missing brain-signal modalities, improving decoding accuracy. Analyses on multiple independent datasets show that the proposed method markedly improves the accuracy of decoding multi-label image semantics from brain signals.
(3) To address the small sample size for a single subject, this study proposes a multi-subject data augmentation method based on multi-modal adversarial learning. Built on subspace learning and multiple generative adversarial networks, the method combines a small amount of data from the target subject with data from other subjects, effectively overcoming the problems of limited single-subject samples and large inter-subject differences, and improving single-subject decoding accuracy. Analyses on multiple independent datasets show that the proposed method improves the accuracy of decoding image semantics from a single subject's brain signals.
Abstract (English)
Visual information is one of the most important sources of information through which human beings perceive and understand the external world. In recent years, the field of visual neural information decoding has attracted extensive attention. Previous studies usually adopt system-identification methods, decoding the brain's visual information by studying the relationship between brain activity and the visual stimuli received by the human brain, so as to infer the brain's functional mechanisms. Thanks to progress in neural-signal acquisition technologies (functional magnetic resonance imaging (fMRI) and electroencephalography (EEG)) and the rapid development of artificial intelligence, decoding visual neural information by non-invasive means has become feasible, and some progress has been made. However, due to the strong heterogeneity between the brain-signal and stimulus-image modalities, missing brain-signal modalities, the limited scope of decoded information, large inter-subject differences, and the small sample size of neural data, it remains difficult for existing methods to decode image information effectively from brain signals. Therefore, studying more effective visual neural information decoding methods not only promotes the understanding of visual processing mechanisms, but also provides a new perspective for developing brain-like machine perception methods. To this end, this dissertation studies visual information decoding methods based on multimodal deep learning. While improving decoding accuracy, the proposed methods effectively overcome the problems of strong heterogeneity, limited decoded information, missing modalities, and small sample sizes in the visual neural information decoding field. The main contents and innovations of this dissertation are as follows:
(1) To address the strong heterogeneity between the brain-signal and image modalities, the strong dependence on paired data, and the insufficient utilization of information, a semi-supervised generative adversarial network method for the visual reconstruction task is proposed. This method unifies the semantic decoding and image reconstruction tasks of brain activity and uses semantic information as a bridge between the brain-signal and image modalities, thereby overcoming the strong heterogeneity between the two. In addition, the method makes full use of unpaired image data to mine deep semantic information that assists the cross-modal image generation task, greatly reducing the method's dependence on paired data. The results show that the proposed semi-supervised generative adversarial method can generate images with high accuracy and clear semantic information.
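The bridging idea in (1) can be illustrated with a minimal sketch: a decoder trained on the scarce paired data maps brain features into a semantic space, and the decoded semantic vectors then condition the image generator. The sketch below is an illustrative stand-in, not the thesis's architecture: it replaces the adversarial networks with a closed-form ridge regression, and all dimensions and data are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the thesis): 100 paired trials,
# 50-dim brain features, 10-dim semantic features.
n_paired, d_brain, d_sem = 100, 50, 10

# Ground-truth linear mapping, used only to simulate paired data.
W_true = rng.normal(size=(d_brain, d_sem))
X_brain = rng.normal(size=(n_paired, d_brain))            # paired brain signals
S_sem = X_brain @ W_true + 0.1 * rng.normal(size=(n_paired, d_sem))

# Learn the brain -> semantic "bridge" on the scarce paired data
# (ridge regression stands in for the semantic-decoding branch).
lam = 1.0
W = np.linalg.solve(X_brain.T @ X_brain + lam * np.eye(d_brain),
                    X_brain.T @ S_sem)

# Decoded semantic vectors; in the thesis these would condition the
# image generator, which is also trained on unpaired images.
S_pred = X_brain @ W

r = np.corrcoef(S_pred.ravel(), S_sem.ravel())[0, 1]
print(f"correlation between decoded and true semantics: {r:.2f}")
```

The point of the sketch is the data flow: only the bridge needs paired samples, while the generator side can consume unpaired images.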
(2) To address the problems that the decoded image information is limited and far removed from natural scenes, a multi-label semantic prediction method based on semi-supervised co-training for natural scenes is proposed. The method combines a co-training network with symmetric semantic-feature translators to decode brain signals evoked by natural scenes, and uses the image modality to assist the brain-signal modality in multi-label learning. At the same time, the method overcomes the problems of insufficient labeled samples and missing brain-signal modalities, improving the decoding accuracy of brain signals. Results on multiple independent datasets show that the proposed semi-supervised co-training multi-label semantic prediction method substantially improves the accuracy of decoding multi-label semantic information of images from brain signals.
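One round of the co-training idea in (2) can be sketched with toy data: the image view pseudo-labels the unlabeled pool, and the brain-view classifier retrains on the enlarged set. This is an illustrative stand-in, not the thesis's networks: nearest-centroid classifiers replace the deep models, the multi-label task is reduced to a single binary label, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

def centroid_fit(X, y):
    # Nearest-centroid classifier: one prototype per class.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def centroid_predict(X, C):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1)

# Simulated two-view data: a brain-signal view and an image view that
# share the same underlying labels (toy stand-in for fMRI/image pairs).
n, d = 200, 20
y = np.tile([0, 1], n // 2)                      # alternating labels
mu = np.where(y[:, None] == 1, 1.0, -1.0)        # class means +/-1 per dim
X_brain = mu + rng.normal(size=(n, d))           # noisier view
X_image = mu + 0.5 * rng.normal(size=(n, d))     # cleaner view

# Only the first 20 samples carry labels; the rest are unlabeled.
labeled = np.arange(20)
unlabeled = np.arange(20, n)

# Co-training round: the image view pseudo-labels the unlabeled pool,
# then the brain-view classifier retrains on labeled + pseudo-labeled.
C_img = centroid_fit(X_image[labeled], y[labeled])
pseudo = centroid_predict(X_image[unlabeled], C_img)

y_aug = np.concatenate([y[labeled], pseudo])
X_aug = np.vstack([X_brain[labeled], X_brain[unlabeled]])
C_brain = centroid_fit(X_aug, y_aug)

acc = (centroid_predict(X_brain, C_brain) == y).mean()
print(f"brain-view accuracy after co-training: {acc:.2f}")
```

In the full method each label of the multi-label vector would be handled this way, with the semantic-feature translators coupling the two views; here a single binary label suffices to show the labeling loop.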
(3) To address the small sample size of existing fMRI datasets for a single subject, this dissertation proposes a multi-subject fMRI data augmentation method with multi-modal adversarial learning. Based on subspace learning and multiple generative adversarial networks, the method combines a small amount of data from the target subject with data from other subjects, effectively overcoming the problems of limited single-subject samples and large inter-subject differences, and providing a new insight into brain decoding. Results on several independent datasets show that the proposed multi-subject data augmentation method with multi-modal adversarial learning improves the semantic decoding accuracy of a single subject.
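The subspace intuition behind (3) can be sketched with an orthogonal Procrustes alignment: a few trials shared with the target subject are used to map another subject's responses into the target's space, and the mapped trials then serve as augmented training data. This is a simplified stand-in for the thesis's subspace-plus-adversarial method; the purely linear alignment and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in: two subjects view the same stimuli, but the source
# subject's responses live in a rotated version of a shared latent space.
n_shared, n_extra, d = 30, 300, 15
Z = rng.normal(size=(n_shared + n_extra, d))       # shared latent responses
R = np.linalg.qr(rng.normal(size=(d, d)))[0]       # source-subject rotation

X_target = Z[:n_shared] + 0.05 * rng.normal(size=(n_shared, d))
X_source = Z @ R + 0.05 * rng.normal(size=(n_shared + n_extra, d))

# Orthogonal Procrustes: align the source subject to the target using
# only the few shared trials (the target subject's small sample).
U, _, Vt = np.linalg.svd(X_source[:n_shared].T @ X_target)
Q = U @ Vt

# Map ALL source trials into the target space -> augmented training set
# roughly 11x the target subject's own data in this toy setup.
X_aug = X_source @ Q

err = np.linalg.norm(X_aug[:n_shared] - X_target) / np.linalg.norm(X_target)
print(f"relative alignment error on shared trials: {err:.3f}")
```

The adversarial networks in the thesis go beyond this linear picture, but the payoff is the same: after alignment, other subjects' abundant trials become usable training data for the target subject.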

 
Keywords: visual neural information decoding; multimodal deep learning; semi-supervised learning; multi-label decoding; data augmentation
Language: Chinese
Sub-direction classification: Brain-computer interfaces
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/44918
Collection: Laboratory of Brain Atlas and Brain-Inspired Intelligence / Neural Computation and Brain-Computer Interaction
Recommended citation (GB/T 7714):
李丹. 基于多模态深度学习的视觉神经信息解码方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021.
Files in this item
File name/size: Thesis.pdf (7172 KB); Document type: dissertation; Open access type: open access; License: CC BY-NC-SA
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.