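Research on Image Feature Representation Based on Learning-to-Quantize (基于量化学习的图像特征表示研究)
胡庆浩 (Qinghao Hu)
Subtype: Doctoral
Thesis Advisor: 程健 (Jian Cheng)
2019-05-28
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Computer Application Technology
Keyword: learning-to-quantize; feature representation; hashing; deep network quantization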
Abstract

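    In recent years, with the rapid progress of artificial intelligence research, many areas of computer vision have seen major breakthroughs. In computer vision, images are the most important carriers of information, and how to represent image features has long been an active research topic. As a core problem of image understanding and analysis, image feature representation plays an important role in tasks such as content-based image retrieval, image classification, and face recognition.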

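    Although research on image feature representation has advanced considerably in recent years, problems remain on both the server side and the edge (or mobile) side. On the server side, the number of images on the Internet is growing rapidly, so the feature representations of massive image collections take up enormous storage; learning compact yet discriminative representations is therefore a problem worth studying. In addition, the form of the feature representation directly affects retrieval speed: with floating-point features, computing similarities over large image sets is slow, whereas representing features with low-bit fixed-point numbers can provide a degree of acceleration. On the edge side, image features from deep networks are highly discriminative and semantically rich, and extracting them on the device protects privacy and relieves load on the server. However, edge devices have very limited computing power, storage, and battery capacity, so extracting deep image features efficiently is a challenge.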
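The speed-up from low-bit codes comes largely from replacing floating-point dot products with bitwise operations. In the 1-bit (binary) case, similarity reduces to Hamming distance, computed with one XOR and a population count. The minimal pure-Python sketch below illustrates this under that assumption; the helper names (`pack_bits`, `hamming_distance`, `rank_by_hamming`) and the toy codes are hypothetical, and a production system would use fixed-width machine words and a hardware popcount instruction.

```python
# Illustrative sketch: Hamming-distance ranking over packed binary codes.
# Function names and toy codes are hypothetical.


def pack_bits(code):
    """Pack a sequence of 0/1 bits into a Python int (bit i -> 2**i)."""
    value = 0
    for i, bit in enumerate(code):
        if bit:
            value |= 1 << i
    return value


def hamming_distance(a, b):
    """Number of differing bits between two packed codes: XOR, then popcount."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Toy 8-bit codes, made up for illustration only.
db = [pack_bits(c) for c in ([1, 0, 1, 1, 0, 0, 1, 0],
                             [1, 1, 1, 1, 0, 0, 1, 0],
                             [0, 1, 0, 0, 1, 1, 0, 1])]
q = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])
print(rank_by_hamming(q, db))  # → [0, 1, 2]
```

Each comparison touches only a few machine words, which is why binary and other low-bit codes scale better than floating-point features for large-scale retrieval.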

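    Learning-to-quantize means mapping the elements of a large set (usually a continuous space) to a small set with finitely many elements through learning. Traditional quantization methods include rounding and truncation; unlike them, learning-to-quantize derives the specific quantization scheme from the data distribution under the guidance of a loss function. Learning-to-quantize can effectively reduce the size of image feature representations and speed up image retrieval, and it also plays an important role in efficient feature extraction. Studying image feature representation based on learning-to-quantize therefore has significant practical value.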
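The contrast between a fixed quantizer (rounding) and a learned one can be illustrated with the simplest learned quantizer: a one-dimensional Lloyd (k-means) quantizer whose levels are fitted to the data under a squared-error loss. This is a generic textbook example chosen for illustration, not a method from the thesis; the data values and function names are hypothetical.

```python
# Illustrative only: a 1-D Lloyd (k-means) quantizer versus a fixed rounding grid.
# Data values and function names are hypothetical.


def nearest(levels, x):
    """Index of the quantization level closest to x."""
    return min(range(len(levels)), key=lambda k: abs(x - levels[k]))


def mse(data, levels):
    """Mean squared quantization error when each value maps to its nearest level."""
    return sum((x - levels[nearest(levels, x)]) ** 2 for x in data) / len(data)


def learn_levels(data, init_levels, iters=20):
    """Lloyd iterations: assign values to levels, then move levels to the means."""
    levels = list(init_levels)
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for x in data:
            buckets[nearest(levels, x)].append(x)
        # An empty bucket keeps its old level, so the error never increases.
        levels = [sum(b) / len(b) if b else lvl for b, lvl in zip(buckets, levels)]
    return levels


data = [0.1, 0.15, 0.2, 0.22, 0.3, 1.9, 2.0, 2.05, 2.1, 2.6]
grid = [0.0, 1.0, 2.0, 3.0]  # rounding to the nearest integer, clipped to [0, 3]
learned = learn_levels(data, grid)
print(mse(data, grid), mse(data, learned))
```

Starting the learned quantizer from the rounding grid means it can only match or improve on rounding under this loss, because neither the assignment step nor the mean-update step of a Lloyd iteration increases the squared error.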
 
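    To address these problems of image feature representation on the server side and the edge side, this thesis studies compact image feature representation and efficient deep feature extraction. The specific research content and contributions are summarized as follows: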
   
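    1. A compact image feature representation method based on a fast k-means algorithm. Clustering features with k-means yields compact image representations, but for large-scale image collections the running time of k-means is hard to accept. To accelerate k-means and obtain compact representations quickly, this thesis proposes a compact image feature representation method based on a fast k-means algorithm. It reduces the search space with a coarse-to-fine search strategy and introduces hashing into the cluster-assignment step to reduce the number of candidate cluster centers. Experiments show that the proposed multi-stage k-means algorithm (MKM) achieves a speed-up of about 600 times over standard k-means with comparable clustering accuracy, so features can be quantized into compact representations quickly even on large-scale data.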
   
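    2. An unsupervised deep hashing feature representation method based on pseudo labels. Deep hashing features are compact, easy to retrieve, and highly discriminative, but training deep networks usually requires large amounts of labeled data, which are very expensive to obtain. To solve this problem, this thesis proposes a pseudo-label based unsupervised deep discriminative hashing algorithm. Inspired by transfer learning, it extracts features with a pre-trained network and clusters them to obtain pseudo labels. A classification loss built on the pseudo labels is combined with a quantization loss under the binary constraint to train the deep hashing network. Experiments on public datasets show that the method outperforms existing methods.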
 
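    3. An efficient deep feature extraction algorithm based on hashing. On edge devices, deep-network feature extraction is slow, takes much storage, and consumes much power. Binary weight networks can compress and accelerate networks, but they also cause a large accuracy loss. To address this, this thesis proposes a hashing-based binary weight network that extracts deep features efficiently. It reveals the intrinsic connection between inner-product preserving hashing and binary weight networks, showing that training a network's binary weights can be cast as a hashing problem. To reduce the binary quantization loss, the binary codes are multiplied by a scaling factor, and an alternating optimization algorithm iteratively updates the binary weights and the scaling factor. Experiments show that the proposed hashing-based binary weight network outperforms the state-of-the-art methods; notably, on ResNet-18 image classification it improves accuracy by three percentage points over the best previous method.

    4. An efficient deep feature extraction algorithm based on semi-binary decomposition. Although binary weight networks extract deep features efficiently, they lose some accuracy, mainly because of their limited representation capacity. To address this, this thesis proposes a novel semi-binary decomposition algorithm; binary weight networks obtained through this decomposition have stronger representation capacity and therefore higher accuracy. An alternating optimization algorithm is also proposed to solve for the decomposition factors while keeping the binary constraint. Experiments on AlexNet, ResNet-18, and ResNet-50 show that the method substantially outperforms existing methods. A binary weight AlexNet was implemented on an FPGA platform, and the results show that the proposed binary weight network achieves about a 9 times speed-up while using less memory and fewer multipliers, enabling efficient deep feature extraction.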
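The coarse-to-fine strategy in item 1 can be sketched as a two-level nearest-centroid search: fine centroids are first bucketed under a few coarse centers, and a point is compared only with the centroids in its closest bucket(s). This is a simplified illustration under our own assumptions (coarse centers given in advance, a hypothetical `n_probe` parameter, and no hashing stage); it is not the MKM algorithm itself, and it trades exactness for speed because the true nearest centroid may lie in a group that is not probed.

```python
# Hypothetical sketch of coarse-to-fine cluster assignment (not the MKM algorithm).


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, candidates, indices):
    """Index (from `indices`) of the candidate closest to `point`."""
    return min(indices, key=lambda i: dist2(point, candidates[i]))


def build_groups(centroids, coarse_centers):
    """Attach every fine centroid to its nearest coarse center."""
    groups = [[] for _ in coarse_centers]
    all_coarse = range(len(coarse_centers))
    for j, c in enumerate(centroids):
        groups[nearest_index(c, coarse_centers, all_coarse)].append(j)
    return groups


def assign_coarse_to_fine(point, centroids, coarse_centers, groups, n_probe=1):
    """Search only the fine centroids inside the n_probe closest coarse groups."""
    order = sorted(range(len(coarse_centers)), key=lambda g: dist2(point, coarse_centers[g]))
    candidates = [j for g in order[:n_probe] for j in groups[g]]
    if not candidates:  # probed groups were empty: fall back to exhaustive search
        candidates = list(range(len(centroids)))
    return nearest_index(point, centroids, candidates)


centroids = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
coarse = [(0.3, 0.3), (10.3, 10.3)]
groups = build_groups(centroids, coarse)
print(assign_coarse_to_fine((0.9, 0.1), centroids, coarse, groups))  # → 2
```

With G coarse groups of roughly K/G centroids each, one assignment costs about G + K/G distance computations instead of K, which is where the speed-up of this kind of scheme comes from.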
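Item 2 combines a classification loss on pseudo labels with a quantization loss that pushes the network's relaxed outputs toward binary values. The abstract does not give the exact formulas, so the sketch below assumes common choices: pseudo labels from nearest-centroid assignment of pre-trained features, softmax cross-entropy as the classification loss, and a squared penalty ||h - sign(h)||^2 weighted by a hypothetical coefficient `lam`. All names are illustrative, not the thesis's implementation.

```python
import math

# Hypothetical sketch: the loss forms and the weight `lam` are assumptions.


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_labels(features, centroids):
    """Label each pre-trained feature with the index of its nearest cluster centroid."""
    return [min(range(len(centroids)), key=lambda k: dist2(f, centroids[k])) for f in features]


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed stably by subtracting the max logit."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def quantization_loss(h):
    """||h - sign(h)||^2: distance of relaxed codes from their binarized version."""
    return sum((x - (1.0 if x >= 0 else -1.0)) ** 2 for x in h)


def hashing_objective(batch, lam=0.1):
    """Mean of classification loss plus lam * quantization loss over the batch.

    batch: list of (logits, pseudo_label, relaxed_code) triples.
    """
    total = 0.0
    for logits, label, h in batch:
        total += softmax_cross_entropy(logits, label) + lam * quantization_loss(h)
    return total / len(batch)


features = [(0.1, 0.0), (4.9, 5.1)]
labels = pseudo_labels(features, [(0.0, 0.0), (5.0, 5.0)])
batch = [([2.0, -1.0], labels[0], [0.8, -0.9]),
         ([-0.5, 1.5], labels[1], [-1.2, 0.3])]
print(labels, hashing_objective(batch))
```

In a real network the objective would be minimized by gradient descent on the network parameters; the sketch only shows how the two terms are combined for one batch.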

Other Abstract

    Recent years have witnessed the rapid development of artificial intelligence, and major breakthroughs have been made in many areas of computer vision. Since images are among the most important information carriers, how to represent image features has long been an active research topic. As a key problem in image analysis and understanding, image feature representation plays an important role in content-based image retrieval, image classification, face recognition, and related tasks.
    Although studies on image feature representation have progressed rapidly in recent years, problems remain on both the server side and the edge side (or mobile devices). On the server side, the number of images on the Internet is growing rapidly, so storing the features of large-scale image collections takes a great deal of space, and learning compact image feature representations is a problem worth studying. Besides, the numerical format used to represent images directly affects retrieval speed: computing similarities over a large image set takes more time with floating-point features, whereas representing the features with fixed-point numbers can accelerate it. On the edge side, deep learning based feature representations have strong discriminative power. Moreover, extracting deep features on edge devices protects users' privacy and relieves the request pressure on the server side. However, most edge or mobile devices have limited computation power, storage, and battery capacity, while most deep neural networks have high computational complexity and storage demands. As a result, extracting image features efficiently on the edge side is a challenge.
 
     Learning-to-quantize maps values from a large set (e.g. a continuous set) to values in a smaller, discrete set by learning from data. Traditional quantization methods, such as rounding and truncation, obtain the quantized values directly, whereas learning-to-quantize learns the quantization values from the data under the guidance of a loss function. Learning-to-quantize can reduce the storage of image features and accelerate retrieval, and it can also be used to accelerate deep feature extraction. Studying learning-to-quantize based image feature representation therefore has significant practical value.
  
     To address the problems mentioned above, we conduct a series of studies on image feature hashing and on feature representation based on quantized deep networks. The specific research content and contributions are summarized as follows:
 
     1. We propose a fast k-means based compact image feature representation method for large-scale images. Clustering image features with k-means yields compact image feature representations, but k-means becomes quite slow for large-scale clustering. To speed up k-means and represent images with compact features, the method uses a coarse-to-fine search strategy to reduce the search space, and introduces hashing into the assignment step to reduce the number of candidate clusters. Experiments show that the proposed MKM algorithm achieves up to 600 times speed-up over standard k-means with comparable clustering accuracy, which means image features can be quantized into compact representations quickly even for large-scale images.
  
    2. We propose a pseudo-label based unsupervised deep discriminative hashing method for image retrieval. Hashing is a special case of quantization, and deep hashing codes have many advantages, such as compact representation, efficient retrieval, and strong semantic representation power. However, most deep hashing models require a large amount of supervised information. To address this, we propose a pseudo-label based unsupervised deep discriminative hashing algorithm. Motivated by transfer learning, we extract features of unlabeled images with a pre-trained model and cluster them to obtain pseudo labels. We then build a classification loss on the pseudo labels and train the deep hashing network with this classification loss together with a quantization loss.

  
    3. We propose an efficient deep feature extraction method via hashing. On the edge side (or mobile devices), the deep network models used to extract image features usually take much storage, and extracting deep features is slow because of the limited computation resources. Binary weight networks achieve compression and acceleration compared with the original deep networks, but binary quantization usually brings a large accuracy drop. To address this, we propose to train binary weight networks via hashing, which allows deep features to be extracted efficiently. To the best of our knowledge, this is the first work to train binary weight CNNs via hashing. We uncover the close connection between inner-product preserving hashing and binary weight neural networks, which shows that training binary weight networks can be transformed into a hashing problem. To alleviate the loss brought by binary quantization, the binary codes are multiplied by a scaling factor, and we propose an alternating optimization method that iteratively updates the binary codes and the scaling factor. Experimental results demonstrate that the proposed method outperforms state-of-the-art algorithms.
 
    4. We propose a semi-binary decomposition method to extract deep features efficiently. Although binary weight networks can extract deep features efficiently, they suffer from a large accuracy drop because of their limited representation capacity. The proposed semi-binary decomposition gives binary weight networks higher representation power, and we also propose an alternating optimization method to learn the decomposition factors under the binary constraint. Experimental results on AlexNet, ResNet-18, and ResNet-50 demonstrate that the proposed method outperforms state-of-the-art algorithms by a large margin. In addition, we implement a binary weight AlexNet on an FPGA platform; the results show that our binary weight network achieves about 9 times speed-up while using less on-chip memory and fewer hardware multipliers, enabling efficient deep feature extraction.
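Item 3 casts binary-weight training as inner-product preserving hashing: for a weight vector w and layer inputs x_n, find codes b in {-1, +1}^d and a scale alpha so that alpha * <x_n, b> reproduces <x_n, w>. A minimal sketch of the alternating optimization, assuming the squared-error objective ||X^T w - alpha X^T b||^2, a closed-form update for alpha, and an exact bit-by-bit (coordinate descent) update for b, could look like the following. These choices and all function names are our assumptions, not the thesis's exact algorithm.

```python
# Hypothetical sketch of inner-product preserving binarization of one filter.
# Objective (assumed): ||X^T w - alpha * X^T b||^2 with b in {-1, +1}^d.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(S, v):
    return [dot(row, v) for row in S]


def gram(samples):
    """S = sum_n x_n x_n^T, i.e. X X^T when the samples are the columns of X."""
    d = len(samples[0])
    return [[sum(x[i] * x[j] for x in samples) for j in range(d)] for i in range(d)]


def objective(S, w, b, alpha):
    """||X^T w - alpha * X^T b||^2, written as r^T S r with r = w - alpha * b."""
    r = [wi - alpha * bi for wi, bi in zip(w, b)]
    return dot(r, matvec(S, r))


def binarize_filter(samples, w, iters=10):
    """Alternate between the scale alpha and the bits b; returns the loss history."""
    S = gram(samples)
    Sw = matvec(S, w)
    b = [1.0 if wi >= 0 else -1.0 for wi in w]  # start from sign(w)
    alpha = sum(abs(wi) for wi in w) / len(w)   # start from mean |w|
    history = [objective(S, w, b, alpha)]
    for _ in range(iters):
        # Scale step: closed-form least-squares alpha for the current bits.
        bSb = dot(b, matvec(S, b))
        if bSb > 0:
            alpha = dot(b, Sw) / bSb
        # Bit step: exact minimization over each b_i with the other bits fixed.
        for i in range(len(b)):
            c = dot(S[i], b) - S[i][i] * b[i]   # sum over j != i of S_ij * b_j
            b[i] = 1.0 if alpha * Sw[i] - alpha * alpha * c >= 0 else -1.0
        history.append(objective(S, w, b, alpha))
    return b, alpha, history


samples = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, -1.0],
           [0.5, -0.5, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
w = [0.9, -0.2, 0.4, -1.1]
b, alpha, history = binarize_filter(samples, w)
print(b, round(alpha, 4), [round(h, 4) for h in history])
```

Because each step exactly minimizes the objective over one block of variables, the loss history never increases; the bits are chosen to preserve inner products with actual inputs rather than simply to match the signs of w.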
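For item 4, the sketch below assumes that a semi-binary decomposition has the form W ≈ U diag(D) V^T, with U and V restricted to ±1 entries and D real-valued, and fits it greedily, one binary rank-1 term at a time, alternating between the two binary factors. The factorization form, the greedy schedule, the initialization, and all names are assumptions for illustration; the thesis's own alternating optimization may differ.

```python
# Hypothetical greedy sketch of W ~ U diag(D) V^T with U, V in {-1, +1}.
# Not the thesis's algorithm; factor form and schedule are assumptions.


def sign_vec(v):
    """Elementwise sign with sign(0) taken as +1, so entries stay in {-1, +1}."""
    return [1.0 if x >= 0 else -1.0 for x in v]


def frob2(M):
    """Squared Frobenius norm."""
    return sum(x * x for row in M for x in row)


def binary_rank1(R, iters=10):
    """Fit R ~ d * u v^T with binary u, v by alternating sign updates."""
    m, n = len(R), len(R[0])
    v = sign_vec(R[0])  # hypothetical initialization from the first row
    for _ in range(iters):
        u = sign_vec([sum(R[i][j] * v[j] for j in range(n)) for i in range(m)])
        v = sign_vec([sum(R[i][j] * u[i] for i in range(m)) for j in range(n)])
    # Optimal real scale for fixed binary u, v: u^T R v / (m * n).
    d = sum(u[i] * R[i][j] * v[j] for i in range(m) for j in range(n)) / (m * n)
    return u, d, v


def semi_binary_decompose(W, k):
    """Greedy W ~ sum_t D[t] * U[t] V[t]^T; U[t], V[t] are the binary columns."""
    R = [row[:] for row in W]
    U, D, V, errors = [], [], [], [frob2(R)]
    for _ in range(k):
        u, d, v = binary_rank1(R)
        U.append(u)
        D.append(d)
        V.append(v)
        R = [[R[i][j] - d * u[i] * v[j] for j in range(len(R[0]))] for i in range(len(R))]
        errors.append(frob2(R))
    return U, D, V, errors


W = [[0.8, -0.3, 0.5, 0.1],
     [-0.6, 0.9, -0.2, 0.4],
     [0.3, 0.2, 0.7, -0.5]]
U, D, V, errors = semi_binary_decompose(W, k=3)
print([round(e, 4) for e in errors])
```

With the optimal scale d, removing a term changes the residual norm by -(u^T R v)^2 / (m n), so the reconstruction error never increases as terms are added; this is one simple way to see why a sum of scaled binary factors is more expressive than a single scaled binary matrix.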

Pages: 120
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23896
Collection: Graduates / Doctoral Dissertations (毕业生_博士学位论文)
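Recommended Citation (GB/T 7714):
胡庆浩. 基于量化学习的图像特征表示研究[D]. 中国科学院自动化研究所. 中国科学院大学, 2019.
(English: Hu Qinghao. Research on Image Feature Representation Based on Learning-to-Quantize[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2019.)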
Files in This Item:
File Name/Size | DocType | Access | License
Thesis.pdf (7513 KB) | Thesis | Not yet open | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.