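Thesis Advisor: 张文生
Degree Grantor: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: deep representation; image sequence classification; model fusion; activity classification; semi-supervised learning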



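First, an image sequence classification method based on a network that fuses static and motion streams (Static and Motion Streams Network, SMNet) is proposed. To begin with, because radar image sequences are continuous in semantic information but discontinuous in detail, a motion-stream branch that takes the whole image sequence as input captures strongly continuous sequence features, while a static-stream branch that takes the last frame as input preserves the detail features of the current frame. Fusing the two branches extracts features that balance semantics and detail, yielding a two-stream fusion structure that captures both kinds of features at once. In addition, a dynamic weighted training mechanism is proposed to aid the training of SMNet: it increases the training weight of hard-to-classify regions, such as echo edges, in order to obtain more accurate classification results. Experiments on the Beijing radar dataset provided by the China Meteorological Administration show that SMNet improves on mainstream methods across multiple classification metrics. The proposed method and code have been applied in the meteorological services of the China Meteorological Administration.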

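Second, a semi-supervised motion activity classification method based on mutual learning (Temporal Prior Guided Mutual Learning Framework, TPML) is proposed. To begin with, to address the scarcity of supervision for unlabeled data, a main network and an auxiliary network with different initial parameters are built. By learning from the pseudo labels that each produces for the other, they mine supervision information that generalizes, which yields a semi-supervised mutual learning framework. In addition, temporal prior information is introduced on the auxiliary network side: after temporal features are extracted, they are aggregated with the features of other sequences in the temporal neighborhood of the input sequence, and the resulting more robust and accurate supervision information is distilled back into the main network. Experiments on three public human activity classification datasets show that TPML mines the supervision information in unlabeled temporal data more effectively and achieves higher average F1 scores under a range of annotation rates.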

Other Abstract

Since the start of the 21st century, rapid social and economic development has led to large amounts of sequence data being produced every day in all walks of life. The information mined from this data has become an important basis for intelligent decision-making in fields such as meteorology and healthcare. For example, precipitation can be inferred by analyzing image sequences from meteorological radar, and physical activity can be recorded by analyzing sequence data from motion sensors. In the era of big data, sequence data takes increasingly diverse forms and the patterns hidden in it are complex. Classical representation methods are often limited by their expressive power and by the scale of available labels, so they cannot accurately describe the features of complex sequences. How to extract more refined and accurate feature representations for sequence data in specific scenarios, in support of intelligent analysis and decision-making, has therefore become a key problem.

Starting from deep representation methods, this thesis focuses on two tasks: image sequence classification and motion activity classification. It studies deep representation methods for these specific scenarios in order to overcome the difficulty of modeling meteorological radar image sequences and the scarcity of annotations in motion activity recognition. The ultimate aim is to improve classification performance and to provide higher-quality meteorological and healthcare services for the public. The contributions and innovations of this thesis are as follows:

First, an image sequence classification method based on a static and motion streams network (SMNet) is proposed. Firstly, radar image sequences carry continuous semantic information but discontinuous detail information. To capture both, a motion stream mines strongly continuous features from the whole image sequence, while a static stream captures details from the last frame. Fusing the two streams combines semantic and detail features, which gives a two-stream fusion structure. Secondly, a dynamic weighted training mechanism is proposed to aid the training of SMNet. It increases the training weight of areas that are hard to distinguish, such as the edges of echoes, to obtain more precise classification results. Experimental validation is carried out on the Beijing radar data set provided by the China Meteorological Administration. The results, measured with four commonly used metrics, confirm the potential of SMNet. The proposed method and the corresponding code have been applied in the meteorological service provided by the China Meteorological Administration.
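The idea of giving hard regions such as echo edges a larger training weight can be illustrated with an edge-weighted cross-entropy. The abstract does not specify how the dynamic weights are computed, so the following is only a minimal static sketch under stated assumptions: binary labels, "edge" defined as a pixel with a differently labeled 4-neighbor, and a hypothetical `edge_boost` hyperparameter. The function names are illustrative and are not taken from the thesis code.

```python
import math


def edge_mask(labels):
    """Mark pixels whose 4-neighbourhood contains a different label (assumed edge definition)."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != labels[i][j]:
                    mask[i][j] = True
    return mask


def weighted_cross_entropy(probs, labels, edge_boost=2.0):
    """Weighted mean of per-pixel binary cross-entropy.

    probs[i][j] is the predicted probability of class 1 at pixel (i, j);
    labels[i][j] is 0 or 1. Edge pixels get weight `edge_boost` (hypothetical),
    all other pixels get weight 1.
    """
    mask = edge_mask(labels)
    total, weight_sum = 0.0, 0.0
    for i, row in enumerate(labels):
        for j, y in enumerate(row):
            # Probability the model assigns to the true class of this pixel.
            p_true = probs[i][j] if y == 1 else 1.0 - probs[i][j]
            weight = edge_boost if mask[i][j] else 1.0
            total += -weight * math.log(max(p_true, 1e-12))
            weight_sum += weight
    return total / weight_sum


labels = [[0, 0, 1, 1]]
probs = [[0.1, 0.4, 0.6, 0.9]]
plain = weighted_cross_entropy(probs, labels, edge_boost=1.0)
boosted = weighted_cross_entropy(probs, labels, edge_boost=2.0)
```

In this toy row the two middle pixels sit on the label boundary and are also the least confident predictions, so `boosted` is larger than `plain`: the loss concentrates on the boundary. A dynamic variant would recompute the weights during training, which this sketch does not attempt.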

Second, a semi-supervised activity classification method based on Temporal Prior Guided Mutual Learning (TPML) is proposed. Firstly, to address the scarcity of supervision for unlabeled data, a main network and an auxiliary network with different initialization parameters are constructed. Each network learns from the soft pseudo labels generated by the other, which mines supervision information that generalizes and yields a semi-supervised mutual learning framework. Secondly, temporal prior information is introduced into the auxiliary network. After extracting temporal features, TPML aggregates them with the features of other sequences in the temporal neighborhood of the input sequence, and then distills the result back into the main network, so the distilled supervision information is more robust and accurate. Experiments on three public human activity classification data sets show that TPML is capable of mining the supervision information in unlabeled sequence data and achieves a higher average F1 score under different annotation rates.
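The two ingredients described above, mutual learning from soft pseudo labels and aggregation over a temporal neighborhood, can be sketched as follows. This is a hedged illustration rather than the thesis implementation: the KL divergence as the mutual loss, mean pooling as the aggregation, and the names `mutual_losses`, `aggregate_neighbours`, and `radius` are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) between two discrete distributions."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred))


def aggregate_neighbours(features, index, radius=1):
    """Average the feature vectors of the windows within `radius` of `index`.

    Mean pooling is an assumed stand-in for the aggregation in the abstract.
    """
    lo, hi = max(0, index - radius), min(len(features), index + radius + 1)
    window = features[lo:hi]
    return [sum(col) / len(window) for col in zip(*window)]


def mutual_losses(main_logits, aux_logits):
    """Each network is pulled towards the other's soft pseudo label."""
    p_main, p_aux = softmax(main_logits), softmax(aux_logits)
    loss_main = kl_divergence(p_aux, p_main)  # auxiliary network teaches main
    loss_aux = kl_divergence(p_main, p_aux)   # main network teaches auxiliary
    return loss_main, loss_aux


# Hypothetical auxiliary logits for three consecutive sensor windows.
aux_logits_per_window = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
smoothed_aux = aggregate_neighbours(aux_logits_per_window, index=1)
loss_main, loss_aux = mutual_losses([0.0, 1.0], smoothed_aux)
```

Here the auxiliary output for the middle window is smoothed with its two temporal neighbors before it serves as the soft target for the main network. In practice the pseudo label used as a target would be treated as a constant (no gradient flows through it), which a plain-Python sketch cannot show.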

Document Type: Thesis
Recommended Citation
GB/T 7714
曲宇勋. 深度表示的序列图像与运动行为分类方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
final.pdf (4179KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.