Research on Deep Representation Methods for Sequence Image and Motion Activity Classification


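	Research on Deep Representation Methods for Sequence Image and Motion Activity Classification
	Qu Yuxun (曲宇勋)
	2022-05-20
Pages	72
Degree type	Master's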
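Abstract (translated from Chinese)	Since the turn of the 21st century, rapid socioeconomic development has led every sector to generate massive volumes of sequence data each day. Information mined from sequence data has become an important basis for scientific decision-making in fields such as meteorology and health: for example, analyzing meteorological radar image sequences can indicate precipitation, and analyzing motion sensor sequences can record motion activity. In the era of big data, sequence data takes diverse forms and follows complex patterns. Classical representation methods are often constrained by their sequence modeling capacity and by the scale of available labels, so they cannot accurately characterize such complex sequences. How to extract more refined and accurate feature representations for sequence data in specific scenarios, in support of intelligent analysis and decision support, has therefore become a key problem to be solved. Taking deep representation methods as its starting point, this thesis focuses on sequence image classification and motion activity classification. It addresses the modeling difficulty in meteorological radar image sequence classification and the label scarcity in motion sensor activity classification, studies deep representation methods for these specific scenarios in depth, and aims to improve classification performance and provide the public with better meteorological and health services. The contributions and innovations are as follows. First, an image sequence classification method based on a network that fuses static and motion streams (Static and Motion Streams Network, SMNet) is proposed. To begin with, because radar image sequences are continuous in semantic information but discontinuous in detail, a motion stream branch that takes the whole image sequence as input captures strongly continuous sequence features, while a static stream branch that takes the last frame as input preserves the detail features of the current frame. Fusing the two branches yields features that account for both semantics and detail, giving a two-stream fusion structure that captures both kinds of features at once. In addition, a dynamic weighted training mechanism is proposed to aid the training of SMNet: it increases the training weight of hard-to-classify regions such as echo edges to obtain more accurate classification results. Experiments on the Beijing radar dataset provided by the China Meteorological Administration show that SMNet improves on mainstream methods across multiple classification metrics, and the proposed method and code have been applied in the China Meteorological Administration's meteorological services. Second, a semi-supervised motion activity classification method based on mutual learning (Temporal Prior Guided Mutual Learning Framework, TPML) is proposed. To begin with, to address the lack of supervision for unlabeled data, a main network and an auxiliary network with different initial parameters are constructed. By learning from the pseudo labels each produces for the other, they mine generalizable supervision information from the data, which yields a semi-supervised mutual learning framework. In addition, temporal prior information is introduced on the auxiliary network side: after temporal features are extracted, they are aggregated with the features of other sequences in the temporal neighborhood of the input sequence, so that more robust and accurate supervision information is obtained and then distilled back into the main network. Experiments on three public human activity classification datasets show that TPML mines supervision information from unlabeled time-series data more effectively and achieves a higher average F1 score under various labeling rates.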
English abstract	Since the beginning of the 21st century, rapid socioeconomic development has caused large amounts of sequence data to be produced every day in all walks of life. The information mined from sequence data has become an important basis for intelligent decision-making in fields such as meteorology and health. For example, precipitation can be indicated by analyzing meteorological radar image sequences, and activity can be recorded by analyzing motion sensor sequence data. In the era of big data, sequence data takes diverse forms and the patterns hidden in it are complex. Classical representation methods are often limited by their expressive power and by the scale of available labels, so they cannot accurately describe the features of complex sequences. Therefore, how to extract more refined and accurate feature representations for sequence data in specific scenarios, to serve intelligent analysis and decision-making, has become a key problem. Starting from deep representation methods, this thesis focuses on image sequence classification and motion activity classification. To overcome the difficulty of modeling meteorological radar image sequences and the scarcity of annotations in motion activity recognition, deep representation methods for these specific scenarios are studied. The ultimate aim is to improve classification performance and provide higher-quality meteorological and healthcare services for the public. The contributions and innovations of this thesis are as follows. First, an image sequence classification method based on a static and motion streams network (SMNet) is proposed. Because radar image sequences carry continuous semantic information but discontinuous detail, a motion stream mines strongly continuous features from the whole image sequence while a static stream captures details from the last frame. Fusing the two streams combines semantic and detailed features, giving a two-stream fusion structure. In addition, a dynamic weighted training mechanism is proposed to aid the training of SMNet. It increases the training weight of hard-to-distinguish areas, such as the edges of echoes, to obtain more precise classification results. Experimental validation is carried out on the Beijing radar dataset provided by the China Meteorological Administration. The results confirm the potential of SMNet on four commonly used metrics. The proposed method and the corresponding code have been applied in the meteorological service provided by the China Meteorological Administration. Second, a semi-supervised activity classification method based on Temporal Prior Guided Mutual Learning (TPML) is proposed. To address the scarcity of supervision for unlabeled data, a main network and an auxiliary network with different initial parameters are constructed, and generalizable supervision information is mined by having each network learn from the soft pseudo labels generated by the other. This yields a semi-supervised mutual learning framework. In addition, temporal prior information is introduced into the auxiliary network. After extracting temporal features, TPML aggregates them with the features of other sequences in the temporal neighborhood of the input sequence and then distills the result back to the main network, so the distilled supervision information is more robust and accurate. Experiments on three public human activity classification datasets show that TPML is able to mine the supervision information of unlabeled sequence data and achieves a higher average F1 score under different annotation rates.
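Keywords	Deep representation; image sequence classification; model fusion; activity classification; semi-supervised learning
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48615
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Artificial Intelligence and Machine Learning (Yang Xuebing) - Technical Team
Recommended citation (GB/T 7714)	Qu Yuxun. Research on Deep Representation Methods for Sequence Image and Motion Activity Classification[D]. Institute of Automation, Chinese Academy of Sciences, 2022.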
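
The abstract describes a dynamic weighted training mechanism that raises the training weight of hard-to-classify regions such as echo edges, but this record does not give the weighting rule. The following is only a minimal sketch under an assumed rule: each pixel's weight grows as the probability assigned to its true class falls. The function name `dynamic_weighted_cross_entropy`, the parameter `alpha`, and the formula `1 + alpha * (1 - p_true)` are all hypothetical and are not taken from the thesis.

```python
import math


def dynamic_weighted_cross_entropy(probs, labels, alpha=2.0):
    """Weighted mean cross-entropy over pixels.

    probs:  list of per-pixel class-probability lists (each sums to 1).
    labels: list of per-pixel integer class indices.
    alpha:  hypothetical knob for how strongly hard pixels are up-weighted.
    """
    eps = 1e-12  # avoids log(0)
    weights = []
    losses = []
    for p, y in zip(probs, labels):
        p_true = p[y]
        # Assumed rule (not from the thesis): pixels the model currently
        # finds hard (low p_true) receive a larger training weight.
        weights.append(1.0 + alpha * (1.0 - p_true))
        losses.append(-math.log(p_true + eps))
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


# Usage: one easy pixel and one hard pixel, both with true class 0.
example_probs = [[0.9, 0.1], [0.4, 0.6]]
example_labels = [0, 0]
unweighted = dynamic_weighted_cross_entropy(example_probs, example_labels, alpha=0.0)
reweighted = dynamic_weighted_cross_entropy(example_probs, example_labels, alpha=2.0)
```

With `alpha=0.0` the function reduces to the ordinary mean cross-entropy; a positive `alpha` shifts the average toward the hard pixel, which is the general effect the abstract attributes to the mechanism.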

Files in this item
File name/size	Document type	Version type	Access type	License
final.pdf (4179KB)	Thesis		Open access	CC BY-NC-SA
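
The TPML idea summarized in the abstract (two networks that learn from each other's soft pseudo labels, with the auxiliary side aggregating over a temporal neighborhood) can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the thesis implementation: the neighborhood aggregation is assumed to be a plain average, it is applied directly to the auxiliary network's per-window outputs for brevity, and the names `aggregate_temporal_neighbors`, `mutual_learning_loss`, and `radius` are hypothetical.

```python
import math


def aggregate_temporal_neighbors(features, radius=1):
    """Average each window's vector with its neighbors within `radius` steps."""
    n = len(features)
    aggregated = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbors = features[lo:hi]
        dim = len(features[i])
        aggregated.append(
            [sum(f[d] for f in neighbors) / len(neighbors) for d in range(dim)]
        )
    return aggregated


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy of `pred` against a soft (probabilistic) target."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def mutual_learning_loss(main_logits, aux_logits):
    """Return (main_loss, aux_loss) on unlabeled windows.

    Each network is scored against the other's soft pseudo labels. In real
    training the targets would be treated as constants (no gradient).
    """
    main_probs = [softmax(z) for z in main_logits]
    aux_probs = [softmax(z) for z in aux_logits]
    n = len(main_probs)
    main_loss = sum(soft_cross_entropy(a, m) for a, m in zip(aux_probs, main_probs)) / n
    aux_loss = sum(soft_cross_entropy(m, a) for m, a in zip(main_probs, aux_probs)) / n
    return main_loss, aux_loss


# Usage: three consecutive unlabeled windows, two classes (made-up numbers).
main_out = [[2.0, 0.0], [0.5, 0.4], [1.8, 0.1]]
aux_out = aggregate_temporal_neighbors([[1.5, 0.2], [-0.3, 0.6], [2.1, 0.0]], radius=1)
main_loss, aux_loss = mutual_learning_loss(main_out, aux_out)
```

Averaging over neighboring windows smooths the auxiliary network's output for the middle window, which on its own disagrees with its neighbors; this mirrors the abstract's claim that the temporal prior makes the pseudo labels more robust before they are distilled back to the main network.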