




1. A Fisher Vector image representation method incorporating spatial weights has been proposed.


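Owing to its strong discriminative power in image feature representation, the Convolutional Neural Network (CNN) plays a crucial role in image retrieval. Features from convolutional layers are more expressive than those from fully connected layers, but convolutional features must be encoded and aggregated to obtain a global representation of the image. In preliminary experiments, we found that codebook-based high-dimensional aggregation methods perform worse than simple average pooling or max pooling when applied to local descriptors from CNN convolutional layers. These methods include Fisher Vector encoding and the Vector of Locally Aggregated Descriptors (VLAD). To address this problem, we propose a simple yet effective method that introduces spatial weight information into the aggregation step of such high-dimensional encodings. This highlights the foreground object and suppresses background noise in the global image representation, which resolves the poor performance of these encodings on CNN convolutional local descriptors. We further visualize and analyze the overall distribution of the spatial weight maps of all images in the dataset. Based on the consistent pattern these maps follow, we propose a truncated weighted aggregation method that further improves retrieval performance.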
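The weighting and truncation steps described above can be sketched on a single convolutional feature map. This is a minimal NumPy sketch under stated assumptions. The abstract does not define the spatial weight or the truncation rule, so the channel-sum weight in `spatial_weights`, the fixed threshold `tau` in `truncate_weights`, and all function names are hypothetical. For comparison, the sketch also includes the average-pooling and max-pooling baselines mentioned above.

```python
import numpy as np

def l2_normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

def spatial_weights(fmap):
    """fmap: (H, W, D) non-negative conv activations (e.g. after ReLU).
    Assumed weighting (hypothetical): per-location sum over channels, scaled to [0, 1]."""
    s = fmap.sum(axis=2)
    return s / (s.max() + 1e-12)

def truncate_weights(w, tau=0.5):
    """Hypothetical truncation rule: zero out locations whose weight is below tau."""
    return np.where(w >= tau, w, 0.0)

def average_pool(fmap):
    # Baseline: every spatial location contributes equally.
    return l2_normalize(fmap.mean(axis=(0, 1)))

def max_pool(fmap):
    # Baseline: per-channel maximum over all spatial locations.
    return l2_normalize(fmap.max(axis=(0, 1)))

def weighted_sum_pool(fmap, w):
    # Each D-dim local descriptor contributes in proportion to its spatial weight.
    return l2_normalize((fmap * w[..., None]).sum(axis=(0, 1)))

# Toy feature map: weak background activations plus a strong 3x3 "object" in the centre.
rng = np.random.default_rng(0)
fmap = rng.random((7, 7, 16)) * 0.1
fmap[2:5, 2:5, :] += 1.0

w = truncate_weights(spatial_weights(fmap), tau=0.5)
g_avg, g_max, g_w = average_pool(fmap), max_pool(fmap), weighted_sum_pool(fmap, w)
```

In this toy example, the truncated weights keep only the nine "object" locations. The weighted descriptor is therefore driven by the foreground block, whereas average pooling also accumulates every background location.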


2. A detection-based probabilistic proposal feature aggregation method has been proposed.




3. An attention-based end-to-end image representation learning method has been proposed.




4. Figurative trademark infringement retrieval system.




With the rapid development of computer technology, multimedia, and the Internet, the number of vivid, expressive, and informative web images is growing explosively. Image retrieval techniques have been developed to help users conveniently find the images they need. Object retrieval aims to find the images in large-scale web image collections that contain the same object as the query. Because objects in images are complex and diverse, fast and accurate object retrieval is challenging. Research on object retrieval is therefore of both theoretical and practical significance.


What makes object retrieval special is that the foreground object may occupy only a small part of the whole image. Removing background noise and detecting small foreground objects are therefore the main difficulties in this field. Even when the foreground object has been located, traditional hand-crafted local descriptors lack discriminative power. How to use supervised information to generate more powerful semantic features therefore remains an important issue. This thesis studies image retrieval methods based on object representation learning and improves retrieval performance. We further apply these improvements to trademark image infringement detection. The main contributions and innovations of this thesis are as follows:


1. A spatially weighted Fisher Vector based image representation method has been proposed.


Owing to their discriminative power, CNN (Convolutional Neural Network) based image representations play a key role in object retrieval. Features generated by convolutional layers outperform those from FC (Fully Connected) layers. In our preliminary experiments, however, global representations obtained by sum pooling or max pooling over convolutional local descriptors outperformed those generated by FV (Fisher Vector) and VLAD (Vector of Locally Aggregated Descriptors) encoding. We propose a simple but effective method to address this issue: injecting spatial weights into the Fisher Vector encoding to highlight the foreground object while suppressing background noise. We further visualize and analyze the distribution of the spatial weight maps over all dataset images. Based on the consistent pattern these maps share, we propose a truncated spatially weighted Fisher Vector that further improves retrieval performance.
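The idea of injecting spatial weights into the Fisher Vector can be made concrete with a small sketch. It computes a standard diagonal-GMM Fisher Vector (mean and variance gradients, followed by power and L2 normalization), but multiplies each local descriptor's soft assignment by its spatial weight. How the weights enter the encoding is an illustrative assumption, not the thesis's exact formulation. The function name `weighted_fisher_vector`, the normalization by the total weight, and the random GMM parameters are hypothetical.

```python
import numpy as np

def weighted_fisher_vector(X, w, prior, mu, sigma2):
    """Spatially weighted Fisher Vector (sketch, diagonal-covariance GMM).

    X:      (N, D) local descriptors, e.g. conv activations at N spatial locations
    w:      (N,)   non-negative spatial weights, one per location
    prior:  (K,)   GMM mixture weights
    mu:     (K, D) GMM means
    sigma2: (K, D) GMM diagonal variances
    Returns a 2*K*D vector (gradients w.r.t. means and variances),
    power- and L2-normalized.
    """
    diff = X[:, None, :] - mu[None, :, :]                       # (N, K, D)
    # Log-likelihood of each descriptor under each Gaussian component: (N, K)
    log_gauss = -0.5 * (np.sum(diff ** 2 / sigma2[None], axis=2)
                        + np.sum(np.log(2 * np.pi * sigma2), axis=1)[None])
    log_post = np.log(prior)[None] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                   # soft assignments (N, K)

    # Assumed spatial weighting: scale each descriptor's soft assignment by its
    # weight and normalize by the total weight instead of the descriptor count N.
    gw = gamma * w[:, None]
    total = w.sum() + 1e-12
    u = diff / np.sqrt(sigma2)[None]                            # standardized residuals
    g_mu = np.einsum('nk,nkd->kd', gw, u) / (total * np.sqrt(prior)[:, None])
    g_var = np.einsum('nk,nkd->kd', gw, u ** 2 - 1) / (total * np.sqrt(2 * prior)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                      # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                    # L2 normalization

# Toy usage with a random (untrained) GMM; in practice the GMM would be fit
# on conv descriptors from a training set.
rng = np.random.default_rng(0)
K, D, N = 4, 8, 49                                              # e.g. a 7x7 conv map
prior = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
sigma2 = np.ones((K, D))
X = rng.normal(size=(N, D))
w = rng.random(N)
fv = weighted_fisher_vector(X, w, prior, mu, sigma2)
```

With all weights equal, this reduces to the usual unweighted Fisher Vector. A descriptor whose weight is zero, such as a truncated background location, contributes nothing to the encoding.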


2. A method combining object detection with probabilistic proposal aggregation has been proposed.


Scene logo retrieval is a special task in the field of object retrieval. Its main difficulty is that the foreground object occupies only a small proportion of the image, which makes global representation and retrieval challenging. We propose an image representation method that combines object detection with logo retrieval. First, object proposals are generated within an object detection framework, and we introduce the concept of a probabilistic proposal. The global representation is then obtained by two-stage aggregation, first at the proposal level and then at the whole-image level. This is the first method to combine a detection framework with logo retrieval. The scores from the object detection model's classifier are injected into the whole-image-level aggregation, so the final global representation balances the recall and precision of the logo proposals. Results on a public benchmark dataset show that the method achieves the best performance.
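The two-stage aggregation can be illustrated with a hypothetical sketch. Proposal-level features are obtained by max pooling the convolutional feature map inside each box. The image-level representation is then the detector-score-weighted sum of the L2-normalized proposal features. The abstract only states that classifier scores are injected into the image-level aggregation. The pooling operator, the weighting scheme, and the names `proposal_features` and `image_representation` are therefore assumptions.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + 1e-12)

def proposal_features(fmap, boxes):
    """Proposal-level aggregation (assumed: max pooling inside each box).

    fmap:  (H, W, D) conv feature map of the whole image
    boxes: list of (y0, x0, y1, x1) in feature-map cells, end-exclusive
    Returns (P, D) L2-normalized proposal features.
    """
    feats = [fmap[y0:y1, x0:x1].max(axis=(0, 1)) for (y0, x0, y1, x1) in boxes]
    return l2_normalize(np.stack(feats))

def image_representation(fmap, boxes, scores):
    """Image-level aggregation: each proposal feature is weighted by its detector
    score, read here as the probability that the proposal contains a logo."""
    f = proposal_features(fmap, boxes)
    p = np.asarray(scores, dtype=float)[:, None]
    return l2_normalize((p * f).sum(axis=0))

rng = np.random.default_rng(0)
fmap = rng.random((14, 14, 32))
boxes = [(2, 2, 6, 8), (8, 1, 12, 5), (0, 0, 14, 14)]
scores = [0.9, 0.4, 0.05]          # hypothetical detector confidences
g = image_representation(fmap, boxes, scores)
```

Weighting by score lets many low-confidence proposals stay in the representation, which helps recall, while confident logo proposals dominate it, which helps precision.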


3. An end-to-end image representation learning method based on an attention mechanism has been proposed.


Combining an object detection model with the retrieval task makes the global representation focus mainly on the foreground object. However, this requires training an extra object detector, which in turn needs a large number of bounding-box annotations.

We propose a method that learns the detector and the global image representation simultaneously, using only image-level class labels. We design a location sub-network that detects the foreground object and its salient parts based on the learned attention score map. A ranking loss is then used to learn the feature embedding.
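The two ingredients described above are localizing regions from an attention score map and training embeddings with a ranking loss. Both can be sketched as forward computations in NumPy. The thresholding rule in `attention_to_box` and the triplet form of the ranking loss with margin `margin` are assumptions chosen for illustration. Actual end-to-end training would run inside an automatic-differentiation framework, and the thesis's location sub-network may derive its object and part regions differently.

```python
import numpy as np

def attention_to_box(att, thresh=0.5):
    """Hypothetical localization rule: min-max normalize the attention map and
    return the tightest box (y0, x0, y1, x1), end-exclusive, around the cells
    whose normalized score is at least `thresh`."""
    a = (att - att.min()) / (att.max() - att.min() + 1e-12)
    ys, xs = np.nonzero(a >= thresh)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

def l2_normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

def triplet_ranking_loss(anchor, positive, negative, margin=0.1):
    """Hinge ranking loss on L2-normalized embeddings: the anchor should be closer
    to the positive than to the negative by at least `margin` (squared L2 distance)."""
    a, p, n = (l2_normalize(np.asarray(x, dtype=float)) for x in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2)
    d_neg = np.sum((a - n) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

att = np.zeros((7, 7))
att[2:4, 3:6] = 1.0                  # toy attention peak
box = attention_to_box(att)          # (2, 3, 4, 6)
easy = triplet_ranking_loss([1, 0], [1, 0], [0, 1])   # positive already closer
hard = triplet_ranking_loss([1, 0], [0, 1], [1, 0])   # ranking violated
```

Because only class labels are assumed to be available, triplets would be formed from those labels. The positive is an image of the same class as the anchor, and the negative is an image from another class.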

Results on public benchmark datasets show that our method achieves comparable performance using only four proposals, whereas other methods use dozens or hundreds of proposals. Moreover, our method does not need any bounding-box annotations.


4. Trademark infringement retrieval system.


Based on the above methods, we design and implement a trademark infringement retrieval system that provides stable and convenient trademark retrieval services to users. The methods proposed in this thesis are applied both in building the system and in its external services. The trademark retrieval module is implemented efficiently and provides strong technical support for trademark infringement judgment, trademark design, and trademark trading services.
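At its core, a retrieval module built on global image representations like those above reduces to nearest-neighbour search over L2-normalized descriptors. The class below is a hypothetical brute-force sketch and is not taken from the described system. `TrademarkIndex` and its methods are illustrative names. A production system over a large trademark database would more likely use an approximate nearest-neighbour index.

```python
import numpy as np

class TrademarkIndex:
    """Hypothetical retrieval module: brute-force cosine-similarity search
    over L2-normalized global image descriptors."""

    def __init__(self, dim):
        self.ids = []
        self.vecs = np.empty((0, dim))

    def add(self, ids, vecs):
        # Normalize once at insertion so search is a single matrix-vector product.
        vecs = np.asarray(vecs, dtype=float)
        vecs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12)
        self.ids.extend(ids)
        self.vecs = np.vstack([self.vecs, vecs])

    def search(self, query, k=5):
        q = np.asarray(query, dtype=float)
        q = q / (np.linalg.norm(q) + 1e-12)
        sims = self.vecs @ q                   # cosine similarity to every entry
        top = np.argsort(-sims)[:k]            # indices of the k most similar
        return [(self.ids[i], float(sims[i])) for i in top]

index = TrademarkIndex(dim=3)
index.add(["A", "B", "C"], [[1, 0, 0], [0, 1, 0], [1, 1, 0]])
results = index.search([1, 0.1, 0], k=2)       # ranked (id, similarity) pairs
```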

GB/T 7714
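Qi Chengzuo (祁成祚). Research and Application of Object Retrieval Methods in Images (图像中目标检索方法研究与应用)[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2018.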
Full text: doctoral thesis, 6880 KB; access: restricted; license: CC BY-NC-SA.
