CASIA OpenIR > Graduates > Doctoral Dissertations

Research on Quantization-Learning-Based Methods for Large-Scale Image Retrieval
Author: 吴家祥 (Wu Jiaxiang)
Degree: Doctor of Engineering
Supervisor: 马颂德 (Ma Songde)
Date: 2017-05
Degree-granting institution: 中国科学院研究生院 (Graduate University of Chinese Academy of Sciences)
Place: Beijing
Keywords: image retrieval; feature extraction and indexing; hashing; product quantization; neural networks
Abstract


With the rapid development of Internet technology, we have witnessed an explosive growth of web data. Particularly, image data, due to its rich visual semantic information, has become one of the major research subjects in numerous real-world applications. Large-scale image retrieval, which aims at efficiently retrieving the most relevant item(s) to the user's query from the massive image database, is a challenging task and has been extensively studied in both academia and industry.

For a large-scale image retrieval system, the core objective is to efficiently extract visual features and then construct a database index for fast retrieval. Hand-crafted visual features cannot automatically adapt to different applications, which limits their representational ability. Deep-learning-based features can be optimized towards the retrieval task, but their prohibitive computational overhead is cumbersome for large-scale applications. As for the indexing of visual features, tree-based indexing methods deteriorate severely on high-dimensional data. Quantization-learning-based methods, including binary-valued quantization (a.k.a. hashing) and k-valued quantization (e.g. product quantization), can dramatically reduce the time and storage consumption of the retrieval process. Still, several important issues are rarely discussed, such as training efficiency on large-scale data sets, learning algorithms under the distributed setting, and the design of the general form of hashing functions. This dissertation concentrates on these problems and conducts extensive research to tackle them. The main contributions are summarized as follows:
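To make the k-valued quantization idea concrete, the sketch below shows the two ingredients that make product quantization cheap at query time: compact codes for the database, and asymmetric distance computation via per-sub-space lookup tables. The sizes are toy-scale and the codebooks are random stand-ins for trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 8-D vectors split into 2 sub-vectors of 4-D each, with a tiny
# codebook of 4 centroids per sub-space (real systems use e.g. 8 sub-spaces
# with 256 centroids each).
D, M, K = 8, 2, 4          # dimension, sub-spaces, centroids per sub-space
d = D // M                 # sub-vector dimension

database = rng.normal(size=(100, D))
codebooks = rng.normal(size=(M, K, d))   # random stand-ins for trained codebooks

# Encoding: each sub-vector is replaced by the index of its nearest centroid.
def encode(x):
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        codes[m] = np.argmin(((codebooks[m] - sub) ** 2).sum(axis=1))
    return codes

codes = np.array([encode(x) for x in database])   # n x M small ints, not n x D floats

# Asymmetric distance computation: precompute query-to-centroid distances once,
# then score every database item with M table lookups instead of D multiplies.
def adc_search(query):
    tables = np.stack([
        ((codebooks[m] - query[m * d:(m + 1) * d]) ** 2).sum(axis=1)
        for m in range(M)
    ])                                                # shape (M, K)
    return tables[np.arange(M), codes].sum(axis=1)    # approximate squared distances

scores = adc_search(rng.normal(size=D))
print(scores.shape)   # one approximate distance per database item
```

The asymmetric variant keeps the query un-quantized, which is why the approximation error comes only from the database side.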

1. This dissertation proposes a core-set based product quantization method to address the training-efficiency issue on large-scale data sets. Although product quantization and its extensions achieve satisfactory retrieval accuracy, the time and memory consumption of their training process is prohibitive in large-scale scenarios. In this dissertation, a compact yet representative core-set is constructed to replace the original data set during the training stage. This significantly reduces the computational overhead while leaving the retrieval accuracy almost unaffected. In addition, a projection matrix is optimized during the construction of the core-set, so as to better approximate high-dimensional data.
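As a rough illustration of the core-set idea, the sketch below runs weighted k-means (the sub-problem inside product quantization training) on a small weighted subset instead of the full set. Uniform sampling with unit weights is only a stand-in here; the dissertation's actual construction is more careful and additionally optimizes a projection matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Full training set: in practice too large to run k-means on directly.
data = rng.normal(size=(20000, 16))

# Stand-in core-set: a uniform sample with unit weights. A real core-set
# assigns each sampled point a weight reflecting how many original points
# it represents.
idx = rng.choice(len(data), size=500, replace=False)
coreset, weights = data[idx], np.ones(500)

def weighted_kmeans(points, w, k=8, iters=20, seed=2):
    r = np.random.default_rng(seed)
    centers = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each core-set point to its nearest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update centers as weighted means, so the weights stand in for the
        # original points each core-set point represents.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = np.average(points[mask], axis=0, weights=w[mask])
    return centers

centers = weighted_kmeans(coreset, weights)
print(centers.shape)   # codebook trained on 500 points instead of 20000
```

The training cost now scales with the core-set size rather than the database size, which is the source of the speed-up claimed above.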

2. This dissertation proposes a distributed learning algorithm to address the training issue of hashing methods under the distributed setting. Most data-dependent hashing methods assume that the training data is centrally stored on a single node. However, in many real-world applications, the data is collected and stored on multiple nodes within a distributed network. In this dissertation, an efficient learning algorithm is presented to train universal hashing functions directly from the distributed data. Specifically, the learning process is formulated as an optimization over a global dictionary matrix that minimizes the quantization error of all the training data. By introducing a group of consistency constraints, this optimization is decomposed into multiple sub-problems, which can be solved in a distributed manner via the alternating direction method of multipliers (ADMM).
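The consistency-constraint decomposition can be sketched on a generic convex surrogate. Below, each node solves a local regularized least-squares problem and the nodes reach agreement through consensus ADMM's averaging and dual steps; the dissertation's dictionary-matrix objective is different, but the split / solve locally / average pattern is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each node holds a private shard of a least-squares problem; the nodes must
# agree on one global parameter vector without pooling the raw data.
nodes, dim = 4, 5
A = [rng.normal(size=(50, dim)) for _ in range(nodes)]
b = [rng.normal(size=50) for _ in range(nodes)]

rho = 1.0
z = np.zeros(dim)                        # global consensus variable
x = [np.zeros(dim) for _ in range(nodes)]
u = [np.zeros(dim) for _ in range(nodes)]   # scaled dual variables

for _ in range(100):
    # Local step (runs in parallel on each node): regularized least squares.
    for i in range(nodes):
        lhs = A[i].T @ A[i] + rho * np.eye(dim)
        rhs = A[i].T @ b[i] + rho * (z - u[i])
        x[i] = np.linalg.solve(lhs, rhs)
    # Global step: average the local solutions plus scaled duals.
    z = np.mean([x[i] + u[i] for i in range(nodes)], axis=0)
    # Dual step: penalize disagreement with the consensus.
    for i in range(nodes):
        u[i] += x[i] - z

# The consensus solution matches centralized least squares on the pooled data.
A_all, b_all = np.vstack(A), np.concatenate(b)
x_central = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
print(np.linalg.norm(z - x_central))
```

Only the small vectors x_i + u_i ever leave a node, which is what makes the scheme practical when the raw data cannot be centralized.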

3. This dissertation proposes a hashing method with multi-interval quantization to address the failure of dual-interval quantization to preserve the neighborhood structure. In the commonly used dual-interval scheme, the data is first linearly projected into a low-dimensional space, and each dimension is split into two intervals that are quantized as -1 and 1, respectively. Due to the balance constraint of binary-valued encoding, this forces the encoding value to change in the densest area of the data distribution, which conflicts with preserving neighborhood relationships. Two improved quantization schemes, triple-interval quantization and infinite-interval quantization, are presented to overcome this problem. After a non-linear transformation is imposed on the projected data, the quantization operation can be performed outside the densest area of the data distribution while still satisfying the balance constraint.
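A minimal numerical illustration of the dual- vs. triple-interval contrast (a hedged sketch of the idea; the dissertation's exact transformation is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(4)

# Projected values along one hash dimension: sharply peaked at 0, as is common
# for linear projections of real image features, with broader tails.
v = np.concatenate([rng.laplace(scale=0.2, size=50000),
                    rng.uniform(-3, 3, size=50000)])

# Dual-interval quantization: the balance constraint puts the single cut at
# the median, which here coincides with the densest point of the distribution.
cut_dual = np.median(v)
dual = np.where(v >= cut_dual, 1, -1)

# Triple-interval quantization: two cuts at +/- t with t = median(|v|), so
# each code still covers half the data (balance holds), but both cuts sit
# away from the dense peak at 0.
t = np.median(np.abs(v))
triple = np.where(np.abs(v) >= t, 1, -1)

# Points within eps of a cut are the ones whose near neighbors risk getting
# the opposite code; fewer such points means better neighbor preservation.
eps = 0.02
near_dual = (np.abs(v - cut_dual) < eps).sum()
near_triple = (np.minimum(np.abs(v - t), np.abs(v + t)) < eps).sum()
print(near_dual, near_triple)   # the dual cut lands in the dense region
```

For this peaked distribution, far fewer points sit near the triple-interval cuts than near the single dual-interval cut, even though both codes remain balanced.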

4. This dissertation proposes a quantization-based acceleration and compression method for convolutional neural networks (CNNs), to address the computational-efficiency issue of deep-learning-based visual feature extraction. Despite the impressive representational ability of CNN-based visual features, the prohibitive computational overhead of feature extraction hinders further applications, e.g. on mobile devices with limited computing resources. In this dissertation, the network parameters in convolutional and fully-connected layers are quantized to achieve simultaneous acceleration and compression with negligible accuracy loss. The quantized parameters are optimized to minimize the approximation error of each layer's output values, and a layer-by-layer training scheme is adopted to suppress the accumulative error in the quantized network, so that it better approximates the original one.
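The layer-output objective can be illustrated on a toy fully-connected layer. The sketch below uses scalar k-means over the weights as a simplified stand-in for the dissertation's sub-vector quantization, and judges the quantization by the relative error it induces in the layer's output:

```python
import numpy as np

rng = np.random.default_rng(5)

# A toy fully-connected layer: y = W x. Quantizing W with a small shared
# codebook compresses the model (small indices instead of floats) and, with
# codebook-sized lookup tables, can also accelerate the layer.
W = rng.normal(size=(64, 128))
X = rng.normal(size=(128, 1000))       # a batch of layer inputs

# Scalar k-means codebook for the weights (a simplified stand-in for the
# sub-vector quantization actually used for CNN layers).
def kmeans_1d(values, k=16, iters=30):
    centers = np.quantile(values, np.linspace(0.01, 0.99, k))
    for _ in range(iters):
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = values[assign == j].mean()
    return centers, assign

centers, assign = kmeans_1d(W.ravel())
W_q = centers[assign].reshape(W.shape)   # each weight replaced by its centroid

# The quantization is judged by the error it induces in the layer's *output*,
# which is what a layer-wise output-error objective minimizes.
err = np.linalg.norm(W @ X - W_q @ X) / np.linalg.norm(W @ X)
print(f"relative output error with a 16-entry codebook: {err:.3f}")
```

In a multi-layer network the quantized outputs of one layer feed the next, which is why a layer-by-layer scheme is needed to keep this error from accumulating.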

Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/14968
Collection: Graduates — Doctoral Dissertations
Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
吴家祥. 基于量化学习的大规模图像检索方法研究[D]. 北京. 中国科学院研究生院, 2017.
File: Thesis_wujiaxiang.pd (2947 KB), dissertation, not yet open access, license: CC BY-NC-SA (full text available on request).