Research on Image Retrieval Techniques Based on Local Features and Semantic Learning


	Research on Image Retrieval Techniques Based on Local Features and Semantic Learning
	张桂煊
	2017
Degree type	Doctor of Engineering
Abstract	With the rapid development of computer, multimedia, and Internet technologies, people can produce and share images, videos, and other multimedia data more easily than ever, and the volume of image data on the Web has grown explosively. Image retrieval techniques emerged to help people quickly find images of interest in these massive collections. Content-based image retrieval takes a query image, finds images with related content in a large-scale database according to the query's visual features, and ranks the retrieved images by their feature similarity to the query. Because images are diverse and complex, accurate and efficient image retrieval is very challenging, and research on it has both theoretical significance and practical value.

Local features are the most commonly used visual features in image retrieval. Retrieval methods based on local features fall into two categories. The first aggregates the local features of an image into a single global feature vector and retrieves images by the similarity between these vectors. The second measures image similarity by matching local features between images. Aggregated feature vectors are usually combined with a compression method that represents each image as a binary string, which makes storage compact and retrieval efficient. Because different application environments impose different limits on storage space, a feature whose size is scalable adapts better across environments, yet few existing compression methods consider scalability. Methods based on local feature matching capture the similarity of local visual details and are therefore fairly robust to the occlusion and background clutter common in images. However, the semantic gap between low-level local features and high-level semantic concepts leads to many false matches. Fusing semantic features with local features can effectively reduce the effect of the semantic gap, but learning semantic features that are discriminative, compact to store, and efficient to match remains an important problem.

To address these problems, this thesis studies image retrieval techniques based on local features and semantic learning, significantly improves retrieval accuracy, and applies the results to copyright protection for image works.
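The ranking step described above, where each image is represented by a binary code and the database is ordered by similarity to the query, can be illustrated with a minimal sketch. It assumes that similarity is measured by Hamming distance and that codes are stored as Python integers; the abstract does not specify either, and the function names `hamming_distance` and `rank_by_hamming` as well as the toy 8-bit codes are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (image_id, distance) pairs ordered from most to least similar.

    Assumption: a smaller Hamming distance means a more similar image.
    """
    scored = [
        (image_id, hamming_distance(query, code))
        for image_id, code in database.items()
    ]
    # Sort by distance, then by id so that ties are ordered deterministically.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Toy 8-bit codes; real compressed features would be far longer.
database = {"img_a": 0b10110010, "img_b": 0b10110011, "img_c": 0b01001101}
ranking = rank_by_hamming(0b10110010, database)
print(ranking)  # [('img_a', 0), ('img_b', 1), ('img_c', 8)]
```

An exhaustive scan like this is only practical for small collections; a large-scale system would pair the binary codes with an index structure, which this sketch does not attempt.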
The main contributions and innovations are as follows:

1. A size-scalable image feature compression method based on the Fisher vector. The Fisher vector is a widely used global image representation that aggregates the local features of an image into a single vector. The method targets the case where different application environments impose different limits on feature storage: a compressed feature larger than the limit cannot be stored, while one smaller than the limit leaves storage idle. A scalable compression method adjusts the feature size adaptively to the requirement. The proposed method treats the Fisher sub-vector associated with each Gaussian component as a unit and uses the maximum soft-assignment probability of each Gaussian component as the cue for selecting sub-vectors. Selecting sub-vectors in this way makes the feature size scalable, so the compressed Fisher vector can be used in environments whose memory limits differ.

2. An image retrieval method that combines local feature matching with semantic verification. In retrieval methods based on local feature matching, the matching precision of local features directly affects retrieval accuracy. Because of the semantic gap between low-level local features and high-level semantic concepts, traditional matching methods produce many false matches. To address this, the thesis uses semantic features to verify local feature matches, which improves matching precision. It proposes a series of methods covering semantic feature extraction, the selection of the semantic cues used for match verification, the match verification function, and an efficient index structure. Building on these, it proposes an image retrieval method that combines local feature matching with semantic verification. Experiments show that the method significantly improves retrieval accuracy.

3. A binary semantic feature learning method based on CNN multiple-feature hashing. To make the semantic features used for match verification more robust and discriminative, and thereby further improve matching precision under semantic verification, the thesis proposes a method that chains the learning of semantic features from multiple layers of a convolutional neural network with the learning of the hash functions for the resulting high-dimensional semantic features, forming an end-to-end architecture. Once trained on a given image retrieval dataset, the network converts an image directly into a binary semantic feature that fuses multi-layer semantic information and can be stored compactly and matched efficiently. Experiments show that using these binary semantic features in the retrieval method that combines local feature matching with semantic verification further improves retrieval accuracy.

4. A digital content copyright management and service platform. The author took part in building the platform. The feature extraction method and the feature-based image retrieval method proposed in the thesis were applied both in building the platform and in the services it provides, giving users a stable and convenient way to extract features and giving the platform efficient feature retrieval and infringement monitoring based on feature matching. These techniques implement the platform's copyright registration and infringement monitoring for image content, support its registration, query, verification, and monitoring services for users' digital content, and help protect the lawful rights of copyright owners.
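The sub-vector selection behind the scalable Fisher vector compression can be sketched as follows. This is an illustration under stated assumptions, not the thesis's actual algorithm: the abstract says only that Fisher sub-vectors are selected using the maximum soft-assignment probability of each Gaussian as the cue. The greedy top-ranked selection, the sign binarization (one bit per dimension), and the name `select_subvectors` are all assumptions made here.

```python
from typing import List, Sequence, Tuple


def select_subvectors(
    subvectors: Sequence[Sequence[float]],
    max_posteriors: Sequence[float],
    bit_budget: int,
) -> Tuple[List[int], List[int]]:
    """Keep the sub-vectors with the largest cue values that fit in bit_budget.

    subvectors[k] is the Fisher sub-vector of Gaussian k, and max_posteriors[k]
    is the largest soft-assignment probability any local feature gave to k.
    Returns the kept Gaussian indices (ascending) and the concatenated bits.
    """
    if len(subvectors) != len(max_posteriors):
        raise ValueError("one max posterior is required per sub-vector")
    # Visit Gaussians from the strongest to the weakest soft-assignment cue.
    order = sorted(range(len(subvectors)), key=lambda k: (-max_posteriors[k], k))
    kept: List[int] = []
    used = 0
    for k in order:
        size = len(subvectors[k])  # assumed: sign binarization, 1 bit per dim
        if used + size > bit_budget:
            break
        kept.append(k)
        used += size
    kept.sort()
    bits = [1 if value > 0 else 0 for k in kept for value in subvectors[k]]
    return kept, bits


# Toy example: 4 Gaussians with 2-dimensional sub-vectors.
subvectors = [[0.2, -0.1], [-0.4, 0.3], [0.0, 0.5], [-0.2, -0.3]]
max_posteriors = [0.9, 0.2, 0.7, 0.4]
print(select_subvectors(subvectors, max_posteriors, bit_budget=4))
# ([0, 2], [1, 0, 0, 1])
```

Raising `bit_budget` to 6 would also keep Gaussian 3, so the same image yields a longer or shorter code depending on the storage limit. A practical scheme would additionally need to record which Gaussians were kept so that two codes can be compared on their shared sub-vectors; that bookkeeping is omitted here.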
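Keywords	image retrieval; scalable feature compression; local features; semantic feature learning; image copyright protection
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14789
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	张桂煊. Research on Image Retrieval Techniques Based on Local Features and Semantic Learning[D]. Beijing: University of Chinese Academy of Sciences, 2017.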

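Files in this item
File name/size	Document type	Version type	Access type	License
Research on Image Retrieval Techniques Based on Local Features and Semantic Learning (12066 KB)	Thesis		Restricted access	CC BY-NC-SA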