Research and Application of Object Retrieval Methods in Images

	Research and Application of Object Retrieval Methods in Images
	祁成祚 (Qi Chengzuo)
	2018-05-29
Degree type	Doctor of Engineering
Abstract	With the rapid development of computer, multimedia, and Internet technology, the volume of vivid, expressive, and information-rich web images is growing explosively. Image retrieval emerged to help users quickly find images of interest in this mass of data. Object retrieval captures the user's intent directly, either from a box the user draws around an object of interest or from an uploaded image of the object, and returns the images in a large-scale database that contain that object. Because objects in images are variable and complex, fast and accurate object retrieval is challenging, and research on it has both theoretical significance and practical value. What distinguishes object retrieval is that the object of interest occupies only part of the image, sometimes a very small part. Suppressing background noise and detecting the foreground object when it is small are therefore the main difficulties in the field. In addition, even when the foreground object is available, traditional hand-crafted local features are not discriminative enough, so how to use supervised information to learn more expressive semantic features is also an important question. This thesis studies image retrieval based on object representation learning, improves object retrieval accuracy, and applies the results to infringement retrieval for figurative trademarks. The main contributions and innovations are as follows:
1. A spatially weighted Fisher Vector image representation. The strong discriminative power of Convolutional Neural Network (CNN) features makes them central to image retrieval. Convolutional-layer features are more expressive than fully connected (FC) layer features, but they need an encoding and aggregation step to yield a representation of the whole image. Our preliminary experiments found that, on convolutional local descriptors, codebook-based high-dimensional aggregation methods such as the Fisher Vector (FV) and the Vector of Locally Aggregated Descriptors (VLAD) perform worse than simple mean (sum) pooling or max pooling. To address this, we propose a simple but effective method that injects spatial weights into the aggregation step of such encodings, which highlights the foreground object and suppresses background noise in the global representation. We further visualize and analyze the distribution of the spatial weight maps over all database images and, based on the consistent pattern they show, propose a truncated weighted aggregation that improves retrieval performance further.
2. A probabilistic proposal feature aggregation method combined with detection. Scene logo retrieval is a special task within object retrieval. Its main difficulty is that the logo occupies only a very small part of the image, which challenges both the global image representation and the retrieval method. We propose a feature representation that incorporates a detection framework: a mainstream object detector generates multiple proposals for the foreground logo, and we introduce the concept of a probabilistic proposal. The global representation is then obtained by two-level aggregation, first at the proposal level and then at the whole-image level. The detection step secures recall of logo regions in the global representation, and the detector's proposal scores enter the second-level aggregation as confidences, which balances recall and precision of the logo regions. This is the first work to bring object detection into scene logo retrieval, and its results on a public scene logo retrieval dataset were the best reported at the time.
3. An end-to-end image representation learning method based on an attention mechanism. Introducing detection into object retrieval focuses the global representation on the target region, but it requires training a separate object detector and depends on a large number of precise, manually annotated bounding boxes. We propose a method that learns the detector and the feature representation jointly from class labels alone. An attention-based scoring and localization sub-network acts as the detector for objects or their parts, and a ranking loss, which is closer to the retrieval task, is used to learn the global embedding end to end. On mainstream public benchmarks, the method needs only four proposals to match or exceed earlier methods that use dozens or even hundreds of proposals, and it requires no bounding box annotation.
4. A figurative trademark infringement retrieval system. Building on the methods above, we design and implement a figurative trademark retrieval system for trademark infringement protection, which offers users a stable and convenient trademark query service on the web and on mobile devices. The feature representations and the global-representation-based retrieval method proposed in this thesis are used to build the system and serve external users, and they implement its trademark query module. The system provides technical support for intellectual property protection, in particular trademark infringement judgment, trademark design, and trademark trading services.
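The spatially weighted aggregation described in contribution 1 of the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not say how the spatial weights are computed or truncated, so the sketch assumes the weight of a location is its channel-wise activation sum and that truncation means clipping at a quantile. It also keeps only the first-order (mean) part of the Fisher Vector, and the function names `spatial_weights` and `weighted_fisher_vector` are hypothetical.

```python
import numpy as np


def spatial_weights(fmap, truncate_q=None):
    """Per-location weights from an (H, W, C) map of non-negative activations.

    Assumption: weight = channel-wise activation sum. The optional quantile
    clipping is a stand-in for the "truncated" variant in the abstract.
    """
    s = np.asarray(fmap, dtype=float).sum(axis=2)        # (H, W)
    if truncate_q is not None:
        s = np.minimum(s, np.quantile(s, truncate_q))    # cap the largest responses
    total = s.sum()
    if total <= 0:
        return np.full(s.shape, 1.0 / s.size)            # fall back to uniform weights
    return s / total


def weighted_fisher_vector(desc, w, means, sigmas, priors):
    """First-order Fisher Vector of (N, D) descriptors with per-descriptor weights.

    means, sigmas: (K, D) parameters of a diagonal GMM; priors: (K,).
    w: (N,) weights summing to 1; uniform w = 1/N gives the usual unweighted form.
    """
    diff = (desc[:, None, :] - means[None]) / sigmas[None]        # (N, K, D)
    log_p = (-0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigmas).sum(axis=1)[None]
             + np.log(priors)[None])                               # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                      # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # component posteriors
    coef = w[:, None] * gamma                                      # spatial weight x posterior
    grad = (coef[:, :, None] * diff).sum(axis=0) / np.sqrt(priors)[:, None]  # (K, D)
    fv = grad.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                         # power normalisation
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.random((7, 7, 16))            # random stand-in for conv-layer activations
    K, D = 4, fmap.shape[2]
    means = rng.random((K, D))               # a real codebook would be fit on training data
    sigmas = np.full((K, D), 0.5)
    priors = np.full(K, 1.0 / K)
    w = spatial_weights(fmap, truncate_q=0.9).ravel()
    fv = weighted_fisher_vector(fmap.reshape(-1, D), w, means, sigmas, priors)
    print(fv.shape)
```

In this sketch the weights replace the uniform 1/N factor of the standard encoding, so locations with strong activations contribute more to the image-level vector, while the clipping keeps a few very strong locations from dominating it.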
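Keywords	Object retrieval
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/21039
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	祁成祚. Research and Application of Object Retrieval Methods in Images [D]. Beijing. Graduate University of Chinese Academy of Sciences, 2018.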

Files in this item
File name/size	Document type	Version type	Access type	License
祁成祚博士毕业论文-网页提交版-IR.p (6880KB)	Thesis		Restricted access	CC BY-NC-SA