Research on Pattern Classification Methods for Multimodal Sequential Data
Author: Xie Longfei (谢龙飞)
Subtype: Master's thesis
Thesis Advisor: Zhang Xuyao (张煦尧)
2020-05-29
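Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Control Engineering
Keyword: multimodal pattern recognition; emotion recognition; heterogeneous data fusion; generative adversarial networks; attention mechanism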
Abstract

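With the continuing development of Internet technology, multimodal sequential data, led by video, has become pervasive in everyday life, and pattern classification of such data has attracted increasingly broad attention. The primary challenge for a multimodal sequence classification system is to bridge the semantic gap between modalities: it must model the information interactions between time steps across modalities in order to fuse heterogeneous data effectively. In addition, the modalities carry different levels of noise and contribute unequally to classification, so a suitable feature-fusion scheme is needed to balance them. Finally, video data usually spans long durations, so interactions across long time spans (long-range dependencies) must be modeled.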
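This thesis studies these problems on the task of emotion classification in multimodal video data and proposes two effective solutions. First, it treats the redundant information shared across modalities as the overlapping region between their data distributions, and the exclusive information present in only a single modality as the non-overlapping region. On this basis it proposes a pre-training method based on generative adversarial networks (Bimodal-GAN) that disentangles and recombines these two kinds of information, making full use of the useful information in each modality and alleviating the semantic gap caused by differences between the distributions.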
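The probabilistic view above (shared information as the overlap of two modality distributions, exclusive information as the non-overlapping remainder) can be made concrete with a toy calculation on discrete distributions. This is an illustrative sketch only and not Bimodal-GAN itself: the function name `split_shared_exclusive` and the example histograms are hypothetical, and the thesis learns the separation adversarially on continuous features rather than computing it in closed form.

```python
def split_shared_exclusive(p, q):
    """Split two discrete distributions over the same support into an
    overlapping (shared) part and non-overlapping (exclusive) parts.

    p, q: lists of probabilities of equal length, each summing to 1.
    Returns (shared, exclusive_p, exclusive_q) as lists of masses.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    # Shared mass: the pointwise minimum is the region both distributions cover.
    shared = [min(a, b) for a, b in zip(p, q)]
    # Exclusive mass: what remains of each distribution outside the overlap.
    exclusive_p = [a - s for a, s in zip(p, shared)]
    exclusive_q = [b - s for b, s in zip(q, shared)]
    return shared, exclusive_p, exclusive_q


# Hypothetical histograms of one quantized feature from two modalities.
audio = [0.5, 0.3, 0.2, 0.0]
text = [0.25, 0.25, 0.25, 0.25]
shared, ex_audio, ex_text = split_shared_exclusive(audio, text)
print(round(sum(shared), 2))    # 0.7 (overlap mass)
print(round(sum(ex_audio), 2))  # 0.3 (mass exclusive to audio)
```

The total overlap mass equals one minus the total variation distance between the two distributions, so identical distributions are entirely "shared" and disjoint ones entirely "exclusive".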
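Bimodal-GAN, however, can only perform adversarial training between pairs of modalities and requires the inputs of the modalities to be aligned frame by frame. To address this, the thesis extends Bimodal-GAN into a Transformer network combined with a gating mechanism (Gate-Transformer). Gate-Transformer directly models the interactions between different time steps of multimodal sequences of arbitrary length, handles long-range dependencies effectively, and supports more flexible feature fusion, exploiting the advantages of multimodal data to improve classification accuracy.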
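The claim that an attention-based model handles sequences of arbitrary length without frame alignment rests on each time step of one modality computing weights over every time step of another. A minimal sketch of cross-modal scaled dot-product attention follows, assuming features already projected to a shared dimension d; it omits the learned query/key/value projections, multi-head splitting and the gating used in Gate-Transformer, and the function names are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where one modality attends to another.

    queries: Ta vectors of dimension d (e.g. text time steps).
    keys, values: Tb vectors (e.g. audio time steps); keys have dimension d.
    Ta and Tb may differ, so no frame-level alignment is required.
    Returns Ta output vectors, each a weighted sum of `values`.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query step to every step of the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum over all Tb steps: distant steps contribute directly.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical features: 2 text steps attend over 3 audio steps (d = 2).
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(text, audio, audio)
print(len(fused), len(fused[0]))  # 2 2
```

Because every query step attends to all Tb steps in a single operation, the path between any two time steps has length one, unlike a recurrent network where information must pass through every intermediate step; the per-query computations are also independent, which is what makes them parallelizable.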
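In summary, the contributions of this thesis are as follows:

  1. A pre-training model based on generative adversarial networks that effectively disentangles the shared and exclusive information in multimodal data. Disentangling and recombining the feature representations of different modalities improves the accuracy of multimodal pattern classification.
  2. Conventional RNN-based multimodal sequence classification networks are extended to a Transformer architecture, which improves parallel computation and alleviates the information loss caused by recurrent operations.
  3. A gating-based multimodal fusion scheme that automatically learns the weight of each modality and dynamically adjusts their mixing proportions, effectively handling the differing levels of noise in each modality.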
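Contribution 3 describes a fusion scheme whose per-modality weights are learned and depend on the input. One common way to realize this, sketched here under the assumption of two modalities with equal feature dimension, is a sigmoid gate computed from the concatenated features that mixes them element-wise. It is a simplified gated unit (without the per-modality projections such units often include) and not necessarily the exact fusion unit in Gate-Transformer; the names and weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_a, h_b, w, b):
    """Fuse two modality feature vectors with a learned sigmoid gate.

    h_a, h_b: feature vectors of equal dimension d.
    w: d rows, each of length 2*d (gate weights over the concatenation [h_a; h_b]).
    b: d biases.
    Returns (fused, gate), where fused = z * h_a + (1 - z) * h_b element-wise.
    """
    concat = h_a + h_b  # list concatenation gives [h_a; h_b]
    # One gate value in (0, 1) per feature dimension.
    gate = [sigmoid(sum(wi * xi for wi, xi in zip(row, concat)) + bi)
            for row, bi in zip(w, b)]
    # Convex combination: z weights modality a, (1 - z) weights modality b.
    fused = [z * a + (1.0 - z) * c for z, a, c in zip(gate, h_a, h_b)]
    return fused, gate


# Hypothetical 2-d features; in training, w and b would be learned.
h_audio = [1.0, 1.0]
h_text = [0.0, 0.0]
w = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
b = [2.0, -2.0]
fused, gate = gated_fusion(h_audio, h_text, w, b)
print([round(z, 3) for z in gate])  # [0.881, 0.119]
```

Since the gate is computed from the features of each sample, the learned mixing ratio can differ across samples and across feature dimensions instead of being a fixed per-modality constant.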
Other Abstract

With the advance of the Internet, multimodal sequential data such as video has become increasingly abundant, and the demand for multimodal pattern classification is growing more urgent. The first challenge for multimodal classification is the heterogeneity gap between modalities, which hinders feature fusion; an effective strategy is needed to bridge this gap and fuse heterogeneous data. Meanwhile, the noise in different modalities is complex, so the modalities contribute unequally to decision-making, and an appropriate feature-fusion scheme is required to balance them. Furthermore, the long-range dependencies of sequences must be considered when extracting their feature representations.
In this thesis, we conduct a series of studies on emotion recognition in video to address these challenges and propose two practical strategies. First, from a probabilistic perspective, the information contained in different modalities is categorized as shared information and exclusive information: the shared information corresponds to the overlapping regions of the distributions of different modalities, while the exclusive information corresponds to the non-overlapping regions. A pre-training strategy based on generative adversarial networks, named Bimodal-GAN, is then proposed to disentangle and refactor these features. This lets us exploit the useful features of each modality and reduce the semantic gap.
However, Bimodal-GAN requires its multimodal inputs to be aligned frame by frame, and its adversarial training is performed between pairs of modalities, which does not extend to more modalities. We therefore extend Bimodal-GAN with the Transformer, an attention-based network, together with a gated fusion unit for multimodal classification, and call the result Gate-Transformer. This lets us exploit multimodal data to improve performance and handle several challenges of multimodal time-series recognition, such as misalignment between modalities, the difficulty of modeling long-range dependencies, and the problem of feature fusion.
In conclusion, the contributions of this thesis are as follows:

  1. A GAN-based pre-training strategy is proposed to disentangle and refactor the shared and exclusive information of different modalities, improving the accuracy of multimodal pattern recognition.
  2. We reform conventional RNN-based multimodal networks by applying the attention-based Transformer, which improves parallel computation and alleviates information vanishing.
  3. We combine a gate-based feature-fusion strategy with feature representation learning to adjust the weight of each modality dynamically during fusion, balancing the noise located in different modalities.
Pages: 88
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/39212
Collection: Graduates / Master's Theses (毕业生_硕士学位论文)
Recommended Citation
GB/T 7714
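Xie Longfei. Research on Pattern Classification Methods for Multimodal Sequential Data (面向多模态序列数据的模式分类方法研究)[D]. Institute of Automation, Chinese Academy of Sciences, 2020.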
Files in This Item:
File name/size: ucasthesis_master (3 (1957KB)
Document type: Thesis
Access: Not yet open
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.