Research on Object Recognition and Detection Algorithms Based on Multiple Types of Supervision Information
刘敬禹
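Degree type: Doctor of Engineering
Advisor: 王亮
2018-05-24
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Degree-granting location: Beijing
Keywords: object recognition; fast object detection; text-description-based object localization; deep learning
Abstract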

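Object recognition and detection are fundamental problems and active research topics in computer vision and pattern recognition. They are widely applied in many fields: content-based image retrieval and automatic photo album organization on the internet; face recognition, pedestrian detection, and pedestrian tracking in security; and pedestrian and vehicle detection in autonomous driving. Object recognition and detection already touch many aspects of daily life, and automatic recognition and detection technology has, to some extent, reduced people's workload and changed the way they live.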

 

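In real-world scenarios, different task requirements give rise to different kinds of supervision information for object detection and recognition, which in turn lead to different solutions. This thesis studies object recognition and detection algorithms based on multiple types of supervision information. The research progresses from easy to hard and from shallow to deep, with supervision ranging from class labels and object coordinates to language-based descriptions of objects and their locations. It begins with whole-image object classification, moves to object detection, which must also output the locations of objects in the image, and ends with object localization and description based on referring expressions. Because object detection and recognition are application-oriented, the research also considers practical deployment and designs a detection system that can quickly detect and recognize objects on a mobile platform. In summary, this thesis addresses the following core problems in object recognition and detection: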

 

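I. The thesis examines in depth the relations among visual words in bag-of-words image classification and proposes two improved algorithms: a hierarchical encoding algorithm that mines the relations between visual words and features, and a codeword graph algorithm that mines the relations among visual words. 1. In the bag-of-words model, the coding vector of each local feature carries information about its relations to the surrounding visual words, and the feature encoding and pooling steps lose this detail. To address this, the thesis proposes a bag-of-words model based on hierarchical encoding, which effectively addresses the loss of coding information caused by pooling in the original algorithm. Hierarchical encoding applies dictionary generation, feature encoding, and feature pooling again to the shallow-layer coding vectors, forming a deeper image representation at a higher semantic level and significantly improving classification performance over the previous approach.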

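2. For the relations among visual words, the thesis proposes a codeword graph based on data reconstruction, together with a multi-angle pooling algorithm. The codeword graph is built by data reconstruction of visual features, namely by using local features to reconstruct the clustered codewords in the dictionary, so the resulting graph reflects the distribution of the data itself. Multi-angle pooling then computes the angular relations between local features and codewords and produces responses over a new joint domain of codewords, which yields a significant improvement in classification over the original algorithm.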

 

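II. To meet the real-time and low-power mobile requirements of object detection, the thesis develops a fast, low-power object detection system for mobile platforms based on the Fast R-CNN architecture. It analyzes the running time of two object proposal algorithms, Edge Box and BING, and, with the total computation of the network held fixed, adjusts the number of layers, the filter size, and the number of filters of the convolutional neural network and analyzes the results in order to find convolutional networks with high accuracy and low computation; three convolutional neural networks of different scales are designed. In offline experiments, factors such as the number of proposals and the location regression algorithm are controlled and adjusted to find the best balance between detection accuracy and speed. In addition, given the mAP/E metric required by the low-power image recognition task, the Tegra K1 embedded platform is chosen as the deployment platform. To make full use of the computing resources of the TK1, the detection system is designed, following the pipelining idea from computer architecture, as three stages that execute in parallel. Finally, the results in the international Low-Power Image Recognition Challenge also demonstrate the effectiveness of the system.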

 

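III. The research is extended further to object localization based on textual descriptions, that is, the generation and comprehension of referring expressions. Because attributes play a key role in object descriptions, the thesis proposes an algorithm for referring-expression object localization and description that uses visual attributes and a guided attention mechanism. Attribute learning is modeled as a multi-label image classification problem, and the learned attributes can be embedded into the subsequent description generation and localization systems. Furthermore, the attributes serve as guiding signals for visual attention and word attention, and corresponding attention models are built so that the relevant visual regions and textual descriptions receive higher attention weights. Description generation uses an LSTM-based generative model, and localization uses a comprehension model comprising a detector and a common embedding space. Considering that object descriptions should be unique, the loss function is a fixed-margin loss over triplets of positive and negative samples. Experimental results on the standard datasets RefCOCO, RefCOCO+, and RefCOCOg demonstrate the effectiveness of the proposed algorithm.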

 

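In summary, moving from easy to hard tasks and from formalized scalar supervision to complex, abstract natural language, this thesis studies three related problems in object detection and recognition in depth, and designs and develops a fast, low-power object detection system.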

Other abstract

Object recognition and detection are fundamental problems and active research areas in computer vision and pattern recognition. They are widely applied in many fields, including content-based image retrieval and photo classification on the internet; face recognition, pedestrian detection, and pedestrian tracking in video surveillance; and pedestrian and vehicle detection in driverless vehicles. Object recognition and detection are now widely used in everyday life. The technology has reduced people's workload and is changing how we live.

 

Depending on the task requirements, real applications of object recognition and detection involve various kinds of supervision information, which lead to different solutions. Our research focuses on object recognition and detection based on these various kinds of supervision information. The research follows an easy-to-hard, shallow-to-deep progression: it starts with image-level object classification, moves to object detection, which additionally requires outputting the locations of the objects, and ends with object localization given textual descriptions. In addition, to meet the practical requirements of object detection, our research also considers realistic scenarios and aims to design a fast, low-power object detection system on a mobile platform. In summary, we focus on the following aspects:

 

1. We explore the relations among visual words in image classification based on bag-of-words methods and propose two methods. First, we propose a hierarchical encoding method, which can effectively address the information loss in the original encoding and pooling process. The method treats the coding vectors as higher-level features, then generates new codebooks and encodes and pools these features, ending up with image representations at a higher level. Second, we propose a code graph model based on data reconstruction. The code graph models the multi-view structure of the feature space and thus explores the encoding and pooling operations more deeply. The code graph is built by data reconstruction, i.e. reconstruction of the code words, so it reflects the data distribution of the features.
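
The two-level idea behind hierarchical encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the k-means codebooks, the soft-assignment coding, global max pooling, the concatenation of the two pooled levels, and all names (`kmeans`, `soft_encode`, `hierarchical_bow`, `k1`, `k2`, `beta`) are choices made here for illustration only.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def soft_encode(x, codebook, beta=1.0):
    """Soft-assignment coding: one row of codeword weights per feature."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    logits = -beta * dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def hierarchical_bow(images, k1=8, k2=4):
    """images: list of (n_i, d) local-feature arrays.

    Returns an array of shape (len(images), k1 + k2).
    """
    # Level 1: codebook over raw local features, then encode each image.
    cb1 = kmeans(np.vstack(images), k1)
    codes1 = [soft_encode(feats, cb1) for feats in images]
    # Level 2: treat level-1 coding vectors as features and repeat
    # codebook generation, encoding, and pooling on them.
    cb2 = kmeans(np.vstack(codes1), k2, seed=1)
    reps = []
    for c1 in codes1:
        c2 = soft_encode(c1, cb2)
        # Assumption: max-pool each level and concatenate the results.
        reps.append(np.concatenate([c1.max(axis=0), c2.max(axis=0)]))
    return np.stack(reps)


rng = np.random.default_rng(0)
images = [rng.normal(size=(50, 5)) for _ in range(3)]  # synthetic features
reps = hierarchical_bow(images)
```

With real data, `images` would hold local descriptors extracted from each image, and `reps` would be passed to a classifier.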

 

2. To meet the real-time and low-power requirements of object detection, we design and develop a fast, low-power object detection system based on Fast R-CNN. We analyze the running time of two object proposal methods, Edge Box and BING. With the overall computation of the deep network held fixed, we vary the number of layers, the filter size, and the number of filters, compare the results, and look for CNN structures with high accuracy and low computation, ending up with three CNNs of different scales and structures. In offline experiments, by controlling the number of proposals and the location regression, we seek the best trade-off between accuracy and speed. Following the mAP/Energy measure of the Low Power Image Recognition Challenge (LPIRC), we choose the Tegra K1 (TK1) embedded system as our platform. To fully utilize the computational resources of the TK1, we adopt the idea of pipeline design and end up with an object detection system with three stages running in parallel. Finally, the results in LPIRC show the effectiveness of our method.
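
The pipelining idea can be illustrated with a small thread-and-queue sketch. The abstract does not say what the three stages are, so the split used here (proposal generation, box scoring, post-processing) is an assumption, and the stage functions `propose_boxes`, `score_boxes`, and `keep_best` are hypothetical placeholders rather than the system's actual components.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def run_stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(fn(item))


def run_pipeline(items, stages, maxsize=4):
    """Run the stages concurrently, one thread per stage, in FIFO order."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(_STOP)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    results = []
    while True:  # drain the last queue so bounded queues never deadlock
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def propose_boxes(frame_id):
    # Placeholder for a proposal method such as Edge Box or BING.
    return frame_id, [(0, 0, 10, 10), (5, 5, 20, 20)]


def score_boxes(item):
    # Placeholder for the CNN forward pass: the "score" is just box area.
    frame_id, boxes = item
    return frame_id, [(b, (b[2] - b[0]) * (b[3] - b[1])) for b in boxes]


def keep_best(item):
    # Placeholder post-processing: keep the highest-scoring box.
    frame_id, scored = item
    return frame_id, max(scored, key=lambda s: s[1])[0]


results = run_pipeline(range(3), [propose_boxes, score_boxes, keep_best])
```

While one frame is being scored, the next frame's proposals can already be computed, which is the overlap a pipelined design aims for. On real hardware the benefit depends on the stages using different resources, for example CPU for proposals and GPU for the network.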

 

3. Object localization based on textual description, i.e. referring expression generation and comprehension.

Since visual attributes play a key role in describing an object, we propose a method that uses attributes and a guided attention model to describe and localize objects. We formulate attribute learning as a multi-label classification problem and construct an attention module in which the corresponding visual and textual parts receive more attention. The generation model uses an LSTM module, and the comprehension model uses a common-space embedding model. Considering that each object-description pair must be unique, the loss function is a margin ranking loss. The experimental results on the standard datasets RefCOCO, RefCOCO+, and RefCOCOg show the effectiveness of our methods.
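
A margin ranking loss over a common embedding space can be written in a few lines. The sketch below is one common formulation and is an assumption here: the abstract does not give the exact loss, so the cosine similarity, the two hinge terms (one with a mismatched region, one with a mismatched expression), the `margin` value, and the function names are illustrative choices.

```python
import numpy as np


def cosine(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)


def margin_ranking_loss(region, expr, neg_region, neg_expr, margin=0.1):
    """Hinge loss that ranks the matching pair above mismatched pairs.

    region, expr: embeddings of matching object regions and expressions.
    neg_region, neg_expr: embeddings of mismatched regions / expressions.
    """
    pos = cosine(region, expr)
    # A wrong region should score lower than the right one for this expression.
    loss_region = np.maximum(0.0, margin + cosine(neg_region, expr) - pos)
    # A wrong expression should score lower than the right one for this region.
    loss_expr = np.maximum(0.0, margin + cosine(region, neg_expr) - pos)
    return float((loss_region + loss_expr).mean())


a = np.array([[1.0, 0.0]])
b = np.array([[0.0, 1.0]])
well_separated = margin_ranking_loss(a, a, b, b)  # matched pair aligned
badly_ranked = margin_ranking_loss(a, b, b, a)    # negatives score higher
```

The loss is zero once the matching pair beats both mismatched pairs by at least the margin, which pushes each description toward identifying exactly one object.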

 

In conclusion, our tasks progress from easy to hard, and the supervision information changes from formatted annotations to natural language. We study three problems in object recognition and detection, and design a fast, low-power object detection system.

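Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/21026
Collection: Graduates_Doctoral theses
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation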
GB/T 7714
刘敬禹. 基于多种监督信息的物体识别与检测算法研究[D]. 北京. 中国科学院研究生院,2018.
Files in this item
File name/size: Thesis-刘敬禹.pdf (10708KB)
Document type: Thesis
Access: Not yet open
License: CC BY-NC-SA
 
