Research on Visual Retrieval Based on Deep Hash Learning (基于深度哈希学习的视觉检索研究)
Wang Yunbo (王运波)
2019-12
Pages: 116
Degree type: Doctoral
Chinese Abstract

With the rapid development of the Internet and the popularity of intelligent terminal devices, multimedia data (such as images, videos, and documents) has grown explosively, and we have entered the era of big data. Surveys show that Facebook has more than one billion users who upload over 350 million images every day, and that Sina Weibo has more than 500 million users who post over 100 million microblogs every day. Faced with data on this scale, retrieving target data efficiently is of great significance for the management and use of large-scale data. Exact nearest-neighbor retrieval suffers from slow queries and high storage cost because the data are high-dimensional and large in volume. Approximate nearest-neighbor retrieval based on traditional hash coding maps high-dimensional features to low-dimensional binary codes through hash functions, which reduces storage and improves query speed, but it requires handcrafted feature extraction and storage in advance, a heavy workload for large-scale data.
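For concreteness, here is a minimal sketch, not taken from the thesis, of the retrieval step shared by such hashing approaches: once items have been mapped to binary codes, candidates are ranked by Hamming distance to the query code using only bit-level comparisons. The 48-bit code length and the random codes below are purely illustrative.

import numpy as np

def hamming_ranking(query_code, db_codes):
    # query_code: (n_bits,) array of {0, 1}; db_codes: (N, n_bits)
    # Hamming distance = number of differing bits for each database item
    dist = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dist)  # database indices, nearest first

rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(1000, 48))   # 1000 illustrative 48-bit codes
query_code = rng.integers(0, 2, size=48)
print(hamming_ranking(query_code, db_codes)[:10])  # top-10 nearest items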
In recent years, deep neural networks have made major breakthroughs in feature extraction. Researchers have combined deep neural networks with hash learning and proposed deep hash learning, which performs feature representation learning and hash learning simultaneously and achieves better retrieval results in practical applications.
Existing deep hashing mainly preserves the similarity or relative similarity of data pairs according to discrete semantic similarity (1 or -1), neglecting the high-level semantic characteristics and the local semantic similarity relationships among data pairs.
To address these problems, this thesis studies visual retrieval based on deep hash learning, proposes a series of deep hashing algorithms, and evaluates them extensively on several large-scale image datasets. The main research results are as follows:

1. To effectively preserve the semantic similarity of data pairs in Hamming space, this thesis proposes a deep hashing method based on a pair-similarity-sensitive mechanism, so that semantically similar pairs have small Hamming distances and dissimilar pairs have large ones. For deep hashing that takes data pairs as input, some pairs cannot preserve their similarity effectively; this thesis therefore constructs, for the training pairs, prior information about how difficult each pair's similarity is to preserve, and then uses a weight-adaptive cross-entropy loss to learn pairwise similarity and generate robust binary codes. In the quantization stage, to retain the learned pairwise similarity as much as possible, a similarity preservation model based on the Laplacian distribution is proposed. Experimental results on several standard benchmarks verify the effectiveness of the pair-similarity-sensitive deep hashing algorithm.

2. To exploit the local semantic similarity relationships within data pairs, this thesis proposes a local-semantics-aware deep hashing method that preserves the local semantic similarity of data pairs in Hamming space. Pairwise hash learning captures only the similarity of a data pair, and triplet-ranking hash learning captures only relative similarity, so neither reveals the local semantic structure among data pairs. Building on the triplet, this thesis constructs a quadruplet as the input and exploits latent local semantic relationships to effectively preserve local semantic similarity in Hamming space. Considering that quantization error weakens pairwise similarity, a Hamming-isometric quantization constraint is proposed to keep the similarity of binary code pairs consistent with that of the corresponding real-valued pairs. Experimental results show that the local-semantics-aware deep hashing method effectively generates compact binary codes and improves retrieval performance.

3. To effectively mine the high-level similarity relationships among data pairs, this thesis proposes a deep reconstruction hashing algorithm based on semantic redefinition, which preserves the high-level semantic characteristics of data pairs. Existing methods construct pairwise similarity relationships simplistically, so the fine-grained information between sample pairs cannot be expressed and incompatible binary codes may be generated in Hamming space. This thesis therefore redefines the similarity relationship between data pairs: for similar pairs we consider the fine-grained semantic properties of the two samples, and for dissimilar pairs we mine their compatibility with other data pairs; binary codes carrying these high-level semantics are then reconstructed in Hamming space.
Since binary codes are discrete and non-differentiable, we relax the discrete variables and adopt an exponential-distribution-based pairwise semantics preservation model to maintain the learned semantics. Experimental results demonstrate the effectiveness of the semantic-redefinition-based deep reconstruction hashing and its improved retrieval performance on large-scale data.

English Abstract

With the rapid development of the Internet and the popularity of intelligent terminal devices, multimedia data (such as images, videos, and documents) has grown explosively, and we have entered the era of big data. According to surveys, Facebook has more than 1 billion users who share more than 350 million pictures every day, and Sina Weibo has more than 500 million users who produce more than 100 million posts every day. Faced with such large-scale data, retrieving target items efficiently is of great significance to the management and usage of large-scale data.
Due to the large volume and high dimensionality of the data, nearest neighbor search (NNS) suffers from low query speed and high storage cost. Traditional hash-coding-based approximate NNS maps high-dimensional features to low-dimensional binary codes through hash functions, reducing the storage space and improving the query speed, but it depends heavily on handcrafted features, which entails a large workload for large-scale data. Recently, as deep neural networks have made great breakthroughs in feature extraction, researchers have proposed a series of deep hashing methods that combine deep neural networks with hash learning, performing feature representation learning and hash learning simultaneously and achieving better retrieval results in real-world applications. However, existing deep hashing mainly preserves the similarity or relative similarity of data pairs according to discrete semantic similarity (1 or -1), neglecting the high-level semantic characteristics and the local semantic similarity relationships among data pairs. In view of these problems, this thesis studies visual retrieval based on deep hashing and proposes a series of deep hash learning algorithms, with comprehensive evaluations on several large-scale image datasets. Specifically, the main research results are as follows:

1. To effectively preserve the semantic similarity of data pairs in Hamming space, this thesis proposes a novel deep hashing method based on a pair-similarity-sensitive mechanism, which makes similar pairs have small Hamming distances and dissimilar pairs have large ones.
For deep hashing that takes data pairs as input, some pairs cannot effectively maintain their similarity in Hamming space.
This thesis therefore constructs prior knowledge about how difficult each training pair's similarity is to preserve, and then uses a weight-adaptive cross-entropy loss to learn pairwise similarity and generate robust binary codes.
In the quantization phase, in order to maintain the well-learned pairwise similarity, it adopts a similarity preservation model based on the Laplacian distribution, retaining the learned pairwise similarity as much as possible. Experimental results on several standard databases verify the effectiveness of the proposed method.
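A hedged sketch of the two ingredients named above: a per-pair weighted cross-entropy on the similarity of relaxed codes, and a Laplacian-style quantization penalty. The thesis's exact network, weighting prior, and loss form are not specified here; the inner-product logit, the 0.5 scaling, and the scale parameter are illustrative assumptions.

import torch
import torch.nn.functional as F

def weighted_pairwise_loss(h_i, h_j, sim, weight):
    # h_i, h_j: (B, n_bits) real-valued hash-layer outputs (before binarization)
    # sim:      (B,) float labels, 1.0 for similar pairs, 0.0 for dissimilar
    # weight:   (B,) per-pair weights, e.g. a prior on how hard the pair is to preserve
    logits = 0.5 * (h_i * h_j).sum(dim=1)          # inner product as a similarity logit
    ce = F.binary_cross_entropy_with_logits(logits, sim, reduction="none")
    return (weight * ce).mean()

def laplacian_quantization_penalty(h, scale=1.0):
    # negative log-likelihood of a Laplacian centred on the nearest binary code,
    # pulling relaxed outputs toward {-1, +1} while keeping learned similarities
    return (h - torch.sign(h)).abs().mean() / scale

h_i = torch.randn(8, 48, requires_grad=True)       # illustrative batch of hash outputs
h_j = torch.randn(8, 48, requires_grad=True)
sim = torch.randint(0, 2, (8,)).float()
weight = torch.ones(8)
loss = weighted_pairwise_loss(h_i, h_j, sim, weight) + 0.1 * laplacian_quantization_penalty(h_i)
loss.backward()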

2. To exploit the local semantic similarity among data, this thesis proposes a local-semantics-aware deep hashing method that maintains local semantic similarity in Hamming space. Pairwise hash learning captures only the similarity of a data pair, and triplet-based hash learning captures only the relative similarity between data pairs, so the potential local semantic structure among multiple data pairs is overlooked.
Building on the triplet, this thesis constructs a quadruplet as the input and exploits the latent local semantic relationships to effectively maintain local semantic similarity among data.
Considering that the quantization error introduced by binarization weakens pairwise similarity, it further proposes a Hamming-isometric quantization constraint to keep the similarity of paired binary codes consistent with that of the corresponding real-valued pairs.
The experimental results show that the local-semantics-aware deep hashing method effectively generates compact binary codes and improves retrieval performance.
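A hedged sketch of a quadruplet-style ranking loss and a Hamming-isometric quantization term on relaxed codes, under the assumption that the quadruplet consists of an anchor, a positive, and two negatives, and that margins of 2.0 and 1.0 are used; the thesis's precise formulation may differ.

import torch
import torch.nn.functional as F

def quadruplet_loss(h_a, h_p, h_n1, h_n2, margin1=2.0, margin2=1.0):
    # (h_a, h_p) is a similar pair; h_n1 and h_n2 are dissimilar to the anchor.
    # Euclidean distances on relaxed codes stand in for Hamming distances.
    d_ap = F.pairwise_distance(h_a, h_p)
    d_an = F.pairwise_distance(h_a, h_n1)
    d_nn = F.pairwise_distance(h_n1, h_n2)
    # the similar pair should be closer than the anchor-negative pair, and also
    # closer than the negative-negative pair, capturing structure beyond a triplet
    return (F.relu(d_ap - d_an + margin1) + F.relu(d_ap - d_nn + margin2)).mean()

def hamming_isometric_penalty(h_i, h_j):
    # keep the similarity of binarized codes consistent with the real-valued pair
    b_i, b_j = torch.sign(h_i), torch.sign(h_j)
    return ((h_i * h_j).sum(dim=1) - (b_i * b_j).sum(dim=1)).pow(2).mean()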

3. To explore the high-level semantic similarity relationships among data pairs, this thesis proposes a deep reconstruction hashing method based on semantic relationship redefinition, which preserves the high-level semantic characteristics of data pairs. Existing methods construct the pairwise similarity relationship simplistically, so the fine-grained similarity between sample pairs cannot be expressed and incompatible binary codes may be generated in Hamming space.
This thesis redefines the semantic similarity relationship of data pairs: for similar pairs we consider their fine-grained semantic properties, and for dissimilar pairs we consider their compatibility with other data pairs. Binary codes carrying these high-level semantics are then reconstructed in Hamming space.
Since discrete binary codes are non-differentiable, we relax the discrete variables and use an exponential-distribution-based semantics-preserving model to maintain the learned high-level semantics. Experimental results on several large-scale image datasets verify the effectiveness of the proposed method and show improved retrieval performance.
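A hedged sketch of the relaxation-and-reconstruction idea: codes are relaxed with tanh so gradients can flow, and the pairwise similarity of the relaxed codes is fitted to a redefined target matrix S whose entries may be graded rather than strictly +1/-1. The tanh relaxation, the inner-product rescaling, and the exponential weighting of large errors are assumptions standing in for the thesis's exact exponential-distribution-based model.

import torch
import torch.nn as nn

class HashHead(nn.Module):
    # relax discrete codes with tanh so the network is trainable end to end
    def __init__(self, feat_dim, n_bits):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_bits)

    def forward(self, feats):
        return torch.tanh(self.fc(feats))             # relaxed codes in (-1, 1)

def reconstruction_loss(h, S):
    # h: (B, n_bits) relaxed codes; S: (B, B) redefined similarity targets in [-1, 1]
    n_bits = h.shape[1]
    sim_hat = h @ h.t() / n_bits                       # rescaled inner-product similarity
    err = (sim_hat - S).abs()
    # exponential weighting: pairs whose learned similarity drifts far from the
    # redefined target dominate the loss (an assumption, not the thesis's exact form)
    return (torch.exp(err) - 1.0).mean()

head = HashHead(feat_dim=512, n_bits=48)
feats = torch.randn(16, 512)                           # illustrative CNN features
S = torch.eye(16) * 2 - 1                              # toy target: self-similar only
loss = reconstruction_loss(head(feats), S)
loss.backward()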

Keywords: hash learning, visual retrieval
Language: Chinese
Sub-direction classification: Image and Video Processing and Analysis
Document type: Doctoral dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/28387
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714):
王运波. 基于深度哈希学习的视觉检索研究[D]. 中国科学院自动化研究所. 中国科学院大学,2019.
Files in this item
File name/size: Thesis.pdf (6243 KB)
Document type: Dissertation
Access: Restricted
License: CC BY-NC-SA
 
