Research on Pattern Classification Methods for Multimodal Sequential Data


Author	Xie Longfei
Date	2020-05-29
Pages	88
Degree type	Master's
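Chinese abstract	With the continuous development of Internet technology, multimodal sequential data, dominated by video, has become pervasive in daily life, and pattern classification of multimodal sequential data has therefore attracted increasingly wide attention. The primary challenge for a multimodal sequence classification system is to bridge the semantic gap between modalities: it must build cross-modal information interaction across time steps and achieve effective fusion of heterogeneous data. In addition, the modalities contain different levels of noise and contribute unequally to classification, so an appropriate feature-fusion scheme is needed to balance them. Video data also usually spans long periods, so interactions over long time spans (long-range dependencies) must be modeled. Using emotion classification on multimodal video data as the task, this thesis studies these problems and proposes two effective solutions. First, it treats the redundant information shared by the modalities as the overlapping region between their data distributions, and the exclusive information present in only one modality as the non-overlapping region, and proposes a pre-training method based on generative adversarial networks (Bimodal-GAN) that disentangles and recombines these two kinds of information, fully exploiting the useful information of each modality and alleviating the semantic gap caused by distribution differences. Second, because Bimodal-GAN can only perform adversarial training between pairs of modalities and requires the modality inputs to be aligned frame by frame, the thesis extends it into a Transformer architecture combined with a gating mechanism (Gate-Transformer). Gate-Transformer directly models the interactions between different time steps of multimodal sequences of arbitrary length, handles long-range dependencies effectively, enables more flexible feature fusion, and makes full use of multimodal data to improve classification accuracy. In summary, the contributions of this thesis are: (1) a GAN-based pre-training model that effectively disentangles the shared and exclusive information in multimodal data; disentangling and recombining the feature representations of the modalities improves the accuracy of multimodal pattern classification; (2) an extension of conventional recurrent neural network (RNN) based multimodal sequence classification networks to a Transformer architecture, which improves parallel computation and mitigates the information loss caused by recurrent operations; (3) a gating-based multimodal fusion scheme that automatically learns the weight of each modality and dynamically adjusts their mixing proportions, effectively handling the differing levels of noise in each modality.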
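The gating-based fusion summarized above, which learns a weight for each modality and mixes the modalities dynamically, can be illustrated with a minimal Python sketch. It assumes each modality has already been encoded into a vector of the same length and that each gate is a hypothetical linear score normalized across modalities with a softmax; `gated_fusion`, `gate_params`, and the softmax gate are illustrative assumptions, not the Gate-Transformer implementation from the thesis.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    features: Dict[str, Vector],
    gate_params: Dict[str, Tuple[Vector, float]],
) -> Tuple[Vector, Dict[str, float]]:
    """Fuse per-modality feature vectors with input-dependent scalar gates.

    features:    modality name -> feature vector (all vectors share one length)
    gate_params: modality name -> (w, b) of a hypothetical linear gate
    Returns the fused vector and the mixing weight assigned to each modality.
    """
    if not features:
        raise ValueError("at least one modality is required")
    names = list(features)
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modality vectors must have the same length")

    # Gate score per modality: s_m = w_m . h_m + b_m, so it changes with the input.
    scores = []
    for name in names:
        w, b = gate_params[name]
        scores.append(sum(wi * hi for wi, hi in zip(w, features[name])) + b)

    # Normalize the scores into mixing weights that sum to 1.
    weights = softmax(scores)

    # Fused representation: sum over modalities of alpha_m * h_m.
    fused = [0.0] * dim
    for alpha, name in zip(weights, names):
        for i, hi in enumerate(features[name]):
            fused[i] += alpha * hi
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    features = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "vision": [0.5, 0.5]}
    # Hypothetical gate parameters: only the text gate responds to its input.
    gate_params = {
        "text": ([2.0, 0.0], 0.0),
        "audio": ([0.0, 0.0], 0.0),
        "vision": ([0.0, 0.0], 0.0),
    }
    fused, weights = gated_fusion(features, gate_params)
    print(weights)
    print(fused)
```

In the usage example the text modality, whose gate score is highest, receives the largest mixing weight, and the weights always sum to 1. Because the scores depend on the input features, a trained gate of this kind could down-weight a modality only for the samples in which it is noisy, rather than using one fixed weight per modality.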
English abstract	With the advance of the Internet, multimodal sequential data such as video have become increasingly abundant, and the demand for multimodal pattern classification is increasingly urgent. The first challenge for multimodal classification is the heterogeneity gap between modalities, which inhibits feature fusion; an effective strategy is needed to bridge these gaps and achieve effective fusion of heterogeneous data. Meanwhile, the noise in different modalities is complex, so the modalities contribute unequally to decision-making, and an appropriate feature-fusion scheme is required to balance the individual modalities. Furthermore, the long-range dependencies of sequences must be considered when extracting their feature representations. In this thesis, we conduct a series of studies on emotion recognition in video to address these challenges and propose two practical strategies. First, from a probabilistic perspective, the information contained in different modalities is categorized as shared information and exclusive information: the shared information corresponds to the overlapping regions of the distributions of different modalities, while the exclusive information corresponds to the non-overlapping regions. A pre-training strategy based on generative adversarial networks, named Bimodal-GAN, is then proposed to disentangle and refactor the features. With it, we can exploit the beneficial features of different modalities and reduce the semantic gaps. However, the multimodal inputs of Bimodal-GAN must be aligned frame by frame, and its adversarial training can only be performed between two modalities, so it does not extend to more modalities. Hence, we extend Bimodal-GAN with the Transformer, an attention-based network, together with a gated fusion unit for multimodal classification; the resulting model is named Gate-Transformer.
In this way, we can use multimodal data to improve performance and handle several challenges of multimodal time-series recognition, such as the misalignment between modalities, the difficulty of modeling long-range dependencies, and the difficulty of feature fusion. In conclusion, the contributions of this thesis are as follows: (1) A GAN-based pre-training strategy is proposed to disentangle and refactor the shared and exclusive information of different modalities, improving the accuracy of multimodal pattern recognition. (2) We reform conventional RNN-based multimodal networks with the attention-based Transformer, which enhances parallel computation and alleviates information vanishing. (3) We use a gate-based feature-fusion strategy, combined with feature representation learning, to dynamically adjust the weight of each modality during fusion, balancing the noise in different modalities.
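The attention mechanism that lets a Transformer cope with unaligned modalities and long-range dependencies can be sketched as cross-modal scaled dot-product attention. In this minimal Python sketch, queries come from one modality and keys and values from another, so the two sequences may have different lengths. The learned query/key/value projections, multiple heads, positional encodings, and the gate are omitted for brevity, and `cross_modal_attention` is a hypothetical name rather than code from the thesis.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention from a target modality to a source modality.

    queries:      T_a x d sequence from the target modality (e.g. text steps)
    keys, values: T_b x d and T_b x d_v sequences from the source modality (e.g. audio)
    T_a and T_b may differ, so the two streams need not be aligned frame by frame.
    """
    if not queries or not keys:
        raise ValueError("queries and keys must be non-empty")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same number of steps")
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])

    out: Matrix = []
    for q in queries:
        # Every query step is scored against every source step, however far apart in time.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output step = attention-weighted average of all source-modality values.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]               # 2 steps
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 steps, not aligned with text
    fused = cross_modal_attention(text, audio, audio)
    print(len(fused), len(fused[0]))
```

Because each output step is a weighted average over all source steps, the two-step text stream in the example draws on all three audio steps without any frame-level alignment, and any two time steps are connected by a single attention operation instead of a chain of recurrent updates whose length grows with their distance.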
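Keywords	Multimodal pattern recognition; Emotion recognition; Heterogeneous data fusion; Generative adversarial networks; Attention mechanism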
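Language	Chinese
Classification (seven major directions, sub-direction)	Fundamentals of pattern recognition
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39212
Collection	Graduates_Master's theses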
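Recommended citation (GB/T 7714)	Xie Longfei. Research on Pattern Classification Methods for Multimodal Sequential Data[D]. Institute of Automation, Chinese Academy of Sciences, 2020.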

Files in this item
File name/size	Document type	Version type	Access type	License
ucasthesis_master (3) (1957KB)	Thesis		Restricted access	CC BY-NC-SA