Research on Feature Learning Methods for Image Retrieval Based on Convolutional Neural Networks


	Xu Jian (徐健)
	2020-05-29
Pages	124
Degree type	Doctoral (PhD)
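Chinese abstract (translated)	In recent years, with the spread of smart mobile devices and the rapid development of artificial intelligence, the volume of image data has grown exponentially. Using image retrieval to find images relevant to a query image in a large-scale database quickly and accurately, and thereby to manage, analyze, and use massive image collections effectively, has considerable theoretical significance and practical value in the era of big data and artificial intelligence, and is a research focus shared by academia and industry. Feature representation is a key component of image retrieval and largely determines its performance, so feature learning methods for image retrieval merit in-depth study. This is especially true when little or no annotated data is available: extracting discriminative image representations that cope with cluttered backgrounds, small objects, occlusion, and partially similar images remains a difficult problem worth tackling. The main contributions and innovations of this dissertation are as follows:
1. An unsupervised semantic-based feature aggregation method. Convolutional neural networks (CNNs) have strong semantic representation ability: some channels of deep convolutional feature maps respond strongly to specific semantic content. Most existing feature learning methods ignore this local semantic information, so the global image representation loses many discriminative details. Building on this ability, this dissertation proposes an unsupervised semantic-based feature aggregation method that incorporates local semantic information into a discriminative representation and partly mitigates interference from complex backgrounds. The method uses an unsupervised strategy to select semantic detectors that correspond to specific local semantic content and uses them to generate semantic proposals. Weighted aggregation of the feature maps over these proposals emphasizes discriminative local semantic information and suppresses background noise, which improves retrieval performance. Experiments on five standard public image retrieval datasets show that the method performs well. Because it is unsupervised, the method is especially suitable when training data are hard to collect.
2. An end-to-end adversarial attention-based feature aggregation method. An unsupervised strategy can mine only limited semantic information. When a small amount of training data is available, this dissertation exploits the strong semantic learning ability of CNNs and proposes an end-to-end adversarial attention-based feature aggregation method. It captures targeted local semantic information and highlights the main foreground object, which partly addresses small object retrieval, a task in which the target occupies only a small part of the image. The method learns semantic detectors through an attention mechanism to capture highly discriminative local semantic information in the key object, making the representation more targeted and discriminative. To keep multiple semantic detectors from learning repeated patterns, it also uses an adversarial erasing strategy that captures diverse local semantic information, further improving retrieval performance. Experiments show that the method captures the key semantic information of the foreground object and effectively improves retrieval results.
3. An unsupervised iterative manifold embedding layer method. During retrieval, similarity between feature representations is measured by distance, and the cost of this measurement grows with the feature dimension. Practical systems therefore usually reduce the dimension of the representation, but dimension reduction also discards discriminative information. Measuring similarity by geodesic distance, as in manifold learning, preserves local semantic information and partly addresses occlusion and partially similar image retrieval. This dissertation therefore proposes an unsupervised iterative manifold embedding layer that can be inserted into a CNN as a fully connected layer, reducing the dimension of the representation while preserving as much discriminability as possible. The method corrects geodesic distances using second-order neighborhood information and Euclidean distance, which reconstructs the manifold space more faithfully and preserves the discriminability of the reduced representation. In addition, it uses ridge regression to simplify and integrate the mapping from the original representation to the iterative manifold embedding layer representation. This reduces both the estimation error of the geodesic distances and the computational cost of dimension reduction, further improving retrieval performance and efficiency. Experiments show that the method improves both the discriminability of the representation and retrieval efficiency.
4. Case study of an image retrieval application. This dissertation applies the above feature learning methods to a real scenario, an infringing trademark image retrieval system. The system partly solves the retrieval of partially similar and semantically similar infringing trademark images, and feature dimension reduction effectively improves retrieval efficiency. By retrieving similar trademark images, the system assists trademark applicants and examiners and can make trademark registration and examination more efficient.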
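The semantic-based aggregation in contribution 1 can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation. The following choices are illustrative assumptions: selecting detectors by keeping the k channels with the highest mean activation over unlabeled images, sum pooling weighted by each proposal, and concatenation with L2 normalization. The function names `select_detectors` and `semantic_aggregate` are hypothetical.

```python
import numpy as np

def select_detectors(features, k):
    """Hypothetical unsupervised rule (an assumption, not the thesis method):
    keep the k channels with the highest mean activation over unlabeled images.
    features: (N, C, H, W) non-negative (post-ReLU) conv feature maps."""
    mean_act = features.mean(axis=(0, 2, 3))          # (C,)
    return np.argsort(mean_act)[::-1][:k]

def semantic_aggregate(fmap, detectors, eps=1e-12):
    """Aggregate one image's (C, H, W) feature map into a global descriptor.
    Each detector channel, normalized to sum to 1, serves as a soft spatial
    'semantic proposal' that weights the sum pooling of all C channels."""
    parts = []
    for d in detectors:
        proposal = fmap[d] / (fmap[d].sum() + eps)    # (H, W) spatial weights
        v = (fmap * proposal).sum(axis=(1, 2))        # weighted sum pooling -> (C,)
        parts.append(v / (np.linalg.norm(v) + eps))   # L2-normalize each part
    desc = np.concatenate(parts)                      # (k * C,)
    return desc / (np.linalg.norm(desc) + eps)

rng = np.random.default_rng(0)
features = rng.random((8, 16, 7, 7))                  # stand-in for CNN features of 8 images
detectors = select_detectors(features, k=4)
desc = semantic_aggregate(features[0], detectors)     # 64-dimensional descriptor
```

With L2-normalized descriptors, ranking a database reduces to sorting dot products. In this sketch, concatenation keeps each semantic part separate at the cost of a k-fold larger descriptor. Summing the parts instead would keep C dimensions.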
English abstract	In recent years, with the popularization of smart mobile devices and the rapid development of artificial intelligence technology, the amount of image data has grown explosively. Using image retrieval technology to retrieve images relevant to a query image quickly and accurately from a large-scale database, and thereby to manage, analyze, and use massive image data effectively, has important theoretical significance and application value in the era of big data and artificial intelligence, and is a research focus of both academia and industry. As a key component of image retrieval, the image feature representation plays a crucial role in retrieval performance, so research on feature learning for image retrieval is of great significance. In particular, when little or no annotated data is available, how to learn discriminative feature representations that handle difficult cases such as complex background interference, small object retrieval, occlusion, and partially similar image retrieval deserves in-depth study. The main contributions and innovations of this dissertation are summarized as follows:
1. An unsupervised semantic-based aggregation method. The convolutional neural network has a strong semantic representation ability, and some channels in the feature maps of deep convolutional layers respond strongly to specific semantic content. Most previous feature learning methods ignore this local semantic information, so the global feature representation of the image loses many discriminative details. Based on the strong semantic representation ability of the convolutional neural network, this dissertation proposes an unsupervised semantic-based aggregation method, which fuses discriminative local semantic information into the feature representation and mitigates interference from complex backgrounds to some extent. An unsupervised strategy selects semantic detectors corresponding to specific local semantic content, and these detectors generate semantic proposals. Weighted aggregation of the feature maps based on these semantic proposals highlights the discriminative local semantic information in the feature representation and suppresses background noise, thereby improving retrieval performance. Experimental results on five standard public image retrieval datasets show that the method achieves good retrieval performance. Notably, the method is unsupervised and is therefore especially suitable when training data are difficult to collect.
2. An end-to-end adversarial attention aggregation method. An unsupervised strategy can mine only limited semantic information. When a small amount of training data is available, this dissertation proposes an end-to-end adversarial attention aggregation method based on the strong semantic learning ability of the convolutional neural network. The method effectively captures targeted local semantic information and highlights the main foreground object. It therefore partly addresses small object retrieval, in which the target occupies only a small part of the image. It learns semantic detectors with an attention mechanism to capture highly discriminative local semantic information in the key targets, improving the specificity and discriminability of the feature representation. In addition, to avoid repeated patterns across multiple semantic detectors, the method employs an adversarial erasing strategy that captures a variety of local semantic information and further improves performance. The experimental results show that the proposed method captures the pivotal semantic information in the foreground target and effectively improves retrieval performance.
3. An unsupervised iterative manifold embedding layer method. During retrieval, the similarity between image representations is measured by distance, and the computational complexity of this measurement depends on the dimension of the feature representation. Practical applications therefore usually reduce the dimension of the feature representation, but dimension reduction leads to a loss of discriminability. Measuring similarity with geodesic distance, as in manifold learning, can effectively retain local semantic information and address occlusion and partially similar image retrieval to some extent. This dissertation therefore proposes an unsupervised iterative manifold embedding layer method that reduces the dimension of the representation while preserving its discriminability as much as possible. The layer can be connected to the convolutional neural network as a fully connected layer. By exploiting second-order proximity information and Euclidean distance, the method corrects the geodesic distance to reconstruct the manifold space better and retains the discriminability of the feature representation after dimension reduction. In addition, the method uses ridge regression to simplify and integrate the mapping from the original feature representation to the iterative manifold embedding layer representation. This reduces the estimation error of the geodesic distance and the computational complexity of dimension reduction, further improving performance and efficiency. Experimental results show that the proposed method improves both the discriminability of the feature representation and the efficiency of retrieval.
4. An application example of image retrieval. This dissertation applies the above feature learning methods to an actual application scenario, an infringing trademark image retrieval system. To a certain extent, the methods solve the retrieval of partially similar and semantically similar infringing trademark images, and the dimension reduction method effectively improves retrieval efficiency. The system retrieves similar trademark images, assisting trademark applicants and examiners, and can improve the efficiency of trademark registration and examination.
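The adversarial erasing idea in contribution 2 can be sketched as a forward pass under loudly stated assumptions. Each semantic detector is modeled as a hypothetical 1x1 convolution (one weight vector over channels) followed by a sigmoid. Detectors run in sequence, and any position a detector attends to at or above a threshold is hidden from all later detectors, which forces them to respond elsewhere. The abstract describes end-to-end training of the detectors. That would require a deep learning framework and a retrieval loss, both omitted here, so the weights below are random placeholders. `attention_erase_aggregate` and `threshold` are hypothetical names, not the dissertation's.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def attention_erase_aggregate(fmap, weights, threshold=0.5, eps=1e-12):
    """Forward pass only. fmap: (C, H, W) conv features of one image.
    weights: (K, C), one hypothetical 1x1-conv semantic detector per row.
    Detectors run in sequence; positions a detector attends to with
    attention >= threshold are erased (masked out) for all later detectors."""
    alive = np.ones(fmap.shape[1:], dtype=bool)              # positions not yet erased
    parts, masks = [], []
    for w in weights:
        logits = np.tensordot(w, fmap, axes=(0, 0))          # (H, W) detector response
        att = sigmoid(logits) * alive                        # erased positions get weight 0
        v = (fmap * att).sum(axis=(1, 2)) / (att.sum() + eps)  # attention pooling -> (C,)
        parts.append(v / (np.linalg.norm(v) + eps))
        erased = att >= threshold                            # region claimed by this detector
        masks.append(erased)
        alive &= ~erased                                     # hide it from later detectors
    desc = np.concatenate(parts)                             # (K * C,)
    return desc / (np.linalg.norm(desc) + eps), masks

rng = np.random.default_rng(0)
fmap = rng.random((16, 7, 7))              # stand-in for one image's features
weights = rng.normal(size=(3, 16))         # placeholder for learned detectors
desc, masks = attention_erase_aggregate(fmap, weights)
```

In a trained model, erasing is meant to push later detectors toward other object parts. In this sketch, the per-detector masks are disjoint by construction, because erased positions receive zero attention.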
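Contribution 3 combines geodesic distances with a ridge-regression mapping that can act as a fully connected layer. The sketch below substitutes a plain Isomap-style pipeline for the dissertation's method, and it omits the second-order-proximity correction and the iterative scheme. Geodesic distances are approximated by shortest paths on a Euclidean k-nearest-neighbor graph, and classical MDS embeds the database. Ridge regression then fits an affine map from the original features to that embedding, so new queries are projected with a single matrix product. All function names and parameters (`k`, `dim`, `lam`) are hypothetical.

```python
import numpy as np

def knn_geodesic(X, k):
    """Approximate geodesic distances by shortest paths on a Euclidean k-NN graph."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    E = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(E, 0.0)
    nbrs = np.argsort(E, axis=1)[:, 1:k + 1]       # k nearest neighbours, self excluded
    rows = np.repeat(np.arange(n), k)
    G = np.full((n, n), np.inf)
    G[rows, nbrs.ravel()] = E[rows, nbrs.ravel()]
    G = np.minimum(G, G.T)                          # make the graph undirected
    np.fill_diagonal(G, 0.0)
    for m in range(n):                              # Floyd-Warshall, O(n^3)
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(D, dim):
    """Embed points so that Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def fit_embedding_layer(X, k=4, dim=1, lam=1e-3):
    """Fit an affine layer x -> x @ W + b by ridge regression onto the MDS embedding."""
    D = knn_geodesic(X, k)
    Y = classical_mds(D, dim)                       # target embedding of the database
    mu = X.mean(axis=0)
    Xc = X - mu
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Y)
    b = -mu @ W                                     # bias of the equivalent FC layer
    return W, b, D, Y

t = np.linspace(0.0, 1.5 * np.pi, 60)              # toy data: points on a helix arc
X = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)
W, b, D, Y = fit_embedding_layer(X, k=4, dim=1)
query_embedding = X[:5] @ W + b                     # project "queries" with one matmul
```

This sketch reflects the abstract's efficiency argument: once `W` and `b` are fitted offline, embedding a new query costs one matrix product instead of a graph shortest-path computation. The cubic Floyd-Warshall step limits the sketch to small databases.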
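Keywords	Image retrieval; semantic detector; adversarial attention mechanism; manifold learning; infringing trademark
Subject area	Artificial intelligence
Discipline	Engineering :: Computer Science and Technology (Engineering or Science degrees may be awarded)
Language	Chinese
Research direction (seven major directions): sub-direction	Image and video processing and analysis
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/39024
Collection	State Key Laboratory of Management and Control for Complex Systems / Image Analysis and Machine Vision
Corresponding author	Xu Jian (徐健)
Recommended citation (GB/T 7714)	Xu Jian. Research on Feature Learning Methods for Image Retrieval Based on Convolutional Neural Networks[D]. Institute of Automation, Chinese Academy of Sciences, 2020.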

Files in this item
File name/size	Document type	Version type	Access type	License
基于卷积神经网络的图像检索特征学习方法研 (6961 KB)	Dissertation		Open access	CC BY-NC-SA