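Visual Data Analysis Based on Environmental Constraints
周振 (Zhou Zhen)
Degree type: Doctor of Engineering
Advisor: 谭铁牛 (Tan Tieniu)
2017-06
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: environmental constraints; image classification; gait recognition; person re-identification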
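Abstract: Social development, and in particular the spread of digital information and the mobile Internet, has provided computer vision with large amounts of data and many application scenarios. However, even the most advanced computer vision systems still fall far short of the human visual system on many tasks. At the same time, many existing computer vision methods are too narrowly focused: they do not consider the relationships among the data or between the data and its environment, and so they are prone to elementary errors.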

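This thesis examines in detail the environmental constraint information available in computer vision, focusing on depth information constraints, topological structure constraints, and spatio-temporal consistency constraints. Its research content and main contributions are as follows: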

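1. An image classification model based on depth information constraints is proposed. The method builds on the traditional bag-of-words model: it first uses a Markov random field to estimate the depth of each image pixel and then embeds this depth information into the image features. During pooling, each image feature is projected onto the two depth planes nearest to it along the depth direction, so that features that could not be distinguished in the feature space can be classified correctly in the depth space. In image classification, and especially in scene image classification, the proposed method outperforms the traditional bag-of-words model and several recent methods.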

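2. A gait recognition model based on topological structure constraints is proposed. Topological structure is an intrinsic property of shape data: for gait data, the topological structure does not change however the walking pose and the viewing angle vary, which is the desirable property of topological invariance. On its own, however, topological invariance lacks discriminative power, and it cannot be used to tell apart objects that are structurally similar but belong to different categories. We therefore use persistent homology theory to track changes in the local topological structure of gait at multiple resolutions and from multiple views, which strengthens the expressive power of topological invariance and makes it suitable for recognition tasks in computer vision. Experiments show that, in cross-view and cross-pose settings, this topological feature performs far better than traditional gait features.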

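3. A person re-identification method based on spatio-temporal consistency constraints is proposed. Current research on person re-identification concentrates on feature learning and metric learning, and most previous work addresses only one of the two. In this thesis we use a deep neural network to unify feature learning and metric learning in a single framework that is trained and run end to end. In the feature learning stage, a temporal attention model automatically picks out discriminative frames and gives them large weights. In the metric learning stage, we first compute the similarity of a pair of videos position by position and then use a spatial recurrent neural network to take spatial context into account, so that the similarity measure is spatially consistent.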
Other abstract: With the development of society, and especially the growing dissemination of digital information and the popularity of the Internet, an increasing amount of data is available for computer vision research. However, even the best computer vision systems to date are not comparable with the human visual system in many applications. Moreover, a large portion of existing work is narrow in scope and ignores the connections between data and the environment, so many methods make low-level errors in practice.

This thesis explores environmental constraints in computer vision, in particular depth information, topological structure, and spatial-temporal consistency. Its main content and contributions are as follows.

1. We propose a depth-embedded multiple pooling model for image classification. The model is built on top of the traditional bag-of-words method. We first use a Markov random field to estimate the depth of each pixel and then embed the depth information into the image features, which maps the feature space to a higher-dimensional one. During pooling, features are projected onto two adjacent depth planes, so the model can distinguish features that cannot be separated within the feature space but can be separated along the depth direction. In experiments, our model outperforms the traditional bag-of-words method, especially for scene image classification.
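
The pooling step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name, the normalisation of depth to [0, 1], the evenly spaced depth planes, the linear interpolation weights, and the use of sum pooling are all assumptions made for illustration. Each coded feature is split between its two neighbouring depth planes, and the per-plane sums are concatenated into one descriptor.

```python
import math

def depth_embedded_pooling(codes, depths, num_planes=4):
    """Pool coded local features onto depth planes (illustrative sketch).

    codes:  list of N feature codes, each a list of K floats
            (e.g. bag-of-words assignment codes).
    depths: list of N estimated depths, assumed normalised to [0, 1].
    Returns a flat list of num_planes * K values.
    """
    if num_planes < 2:
        raise ValueError("need at least two depth planes")
    k = len(codes[0])
    pooled = [[0.0] * k for _ in range(num_planes)]
    for code, depth in zip(codes, depths):
        depth = min(max(depth, 0.0), 1.0)
        # Continuous plane coordinate in [0, num_planes - 1].
        pos = depth * (num_planes - 1)
        lower = min(int(math.floor(pos)), num_planes - 2)
        # Assumed linear interpolation between the two adjacent planes.
        w_upper = pos - lower
        w_lower = 1.0 - w_upper
        for j, value in enumerate(code):
            pooled[lower][j] += w_lower * value
            pooled[lower + 1][j] += w_upper * value
    # Concatenate the per-plane sums into a single descriptor.
    return [v for plane in pooled for v in plane]
```

With two planes, two features that share the code `[1.0, 0.0]` but lie at depths 0 and 1 land in different slots of the descriptor, which is the separation along the depth direction that the abstract refers to.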

2. We propose a topological-structure-based gait recognition system. Topology is an inherent property of shapes: no matter how a person's walking pose changes or how he or she dresses, the topology of the gait silhouettes is unchanged. This is the so-called topological invariance. However, topological invariance alone is not discriminative enough to distinguish objects that have similar structures but belong to different categories. We therefore use persistent homology to track the topology of the data at multiple resolutions and from multiple views, which enhances its power to describe local details and makes the extracted topological features sufficient for recognition tasks in computer vision. Experiments demonstrate that the proposed topological features outperform traditional gait features, especially in cross-view and cross-pose gait recognition.
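
To give a concrete sense of what persistent homology tracks, the sketch below computes only the simplest case: 0-dimensional persistence (connected components) of a small point set under a growing distance threshold. The abstract does not specify how the thesis builds its filtration on gait silhouettes, so the point-cloud input, the function name, and the restriction to dimension 0 are assumptions for illustration only.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Return the death scales of connected components (H0 bars).

    Every point starts as its own component, born at scale 0.
    As the distance threshold grows, components merge, and each
    merge ends one bar at that distance. The single component
    that never dies is omitted from the result.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb      # two components merge here
            deaths.append(dist)  # so one bar dies at this scale
    return deaths
```

For the points `(0, 0)`, `(1, 0)`, `(5, 0)` the function returns `[1.0, 4.0]`: the short bar records two nearby points merging early, and the long bar records a well-separated part that persists across many scales. Persistence of this kind carries more local detail than a single topological invariant.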

3. We propose a spatial-temporal-consistency-based person re-identification method. Person re-identification methods generally involve two key steps, feature learning and metric learning, and most previous works focus on only one of them. In this thesis, feature learning and metric learning are incorporated into an end-to-end deep neural network. A temporal attention model measures the importance of each frame in a pedestrian video, which helps select more informative frames and improves feature learning. A spatial recurrent model explores spatial contextual information, which is experimentally shown to be effective for metric learning.
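
The idea of weighting frames by importance can be sketched as attention-weighted temporal pooling. This is a simplified illustration, not the network described in the thesis: the per-frame scores are passed in directly, whereas in the thesis they would be produced by a learned attention model, and the function name and the softmax normalisation are assumptions.

```python
import math

def temporal_attention_pool(frame_features, frame_scores):
    """Combine per-frame features into one video feature.

    frame_features: list of T feature vectors (lists of floats).
    frame_scores:   list of T unnormalised importance scores
                    (assumed to come from some attention model).
    Returns (pooled_feature, attention_weights).
    """
    if not frame_features or len(frame_features) != len(frame_scores):
        raise ValueError("need one score per frame and at least one frame")
    # Numerically stable softmax over the frame scores.
    top = max(frame_scores)
    exps = [math.exp(s - top) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the frame features, dimension by dimension.
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[j] for w, f in zip(weights, frame_features))
        for j in range(dim)
    ]
    return pooled, weights
```

With equal scores the result is the plain average of the frames. As one score grows relative to the others, the pooled feature approaches that frame's feature, which is how more informative frames come to dominate the video representation.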
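Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/14699
Collection: Graduates_Doctoral Dissertations
Author affiliation: Institute of Automation, Chinese Academy of Sciences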
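Recommended citation
GB/T 7714
Zhou Zhen. Visual Data Analysis Based on Environmental Constraints [D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2017.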
Files in this item
File name / size: Thesis-name_V2.pdf (11422KB)
Document type: Thesis
Access: Not yet open
License: CC BY-NC-SA