Recent years have witnessed the rapid development of artificial intelligence, and breakthroughs have been made in many research areas of computer vision. Since images are among the most important information carriers, how to represent image features has long been a hot research topic. As a key problem in image analysis and understanding, image feature representation plays an important role in content-based image retrieval, image classification, face identification, and so on.
Although studies on image feature representation have progressed rapidly in recent years, problems remain on both the server side and the edge side (i.e., mobile devices). On the server side, since the quantity of images on the internet has grown rapidly in recent years, storing large-scale image features takes considerable storage, so learning compact image feature representations is a problem worth studying. Besides, the format used to represent images directly affects retrieval speed: computing similarities over a large-scale image set with floating-point features requires more time, whereas retrieval can be accelerated if images are represented by fixed-point numbers. On the edge side, deep learning based feature representation has powerful discriminative ability. Moreover, extracting deep features on edge devices protects users' privacy and alleviates the request pressure on the server side. However, most edge or mobile devices have limited computational power, storage, and battery capacity, while most deep neural networks have high computational complexity and storage demands. As a result, extracting image features efficiently on the edge side is a challenge.
Learning-to-quantize maps values from a large set (e.g., a continuous set) to values in a smaller set (a discrete set) by learning from data. Traditional quantization methods, such as rounding and truncation, fix the quantization values directly, while learning-to-quantize learns them from the data under the guidance of loss functions. Learning-to-quantize can reduce the storage of image features and accelerate retrieval; additionally, it can be used to accelerate deep feature extraction. Therefore, studying learning-to-quantize based image feature representation has great application value.
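To make the contrast with traditional quantization concrete, the following is a minimal one-dimensional sketch (our own illustration, not a method from this thesis): the quantization values are learned from data with Lloyd's algorithm, which minimizes the squared quantization error, and compared against fixed hand-picked levels in the spirit of rounding.

```python
# Minimal sketch of learning-to-quantize in one dimension (illustration only):
# the quantization values (codebook) are learned from data by minimizing the
# squared quantization error, instead of being fixed in advance as in rounding.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=10_000)   # toy 1-D "features"

def learn_codebook(x, k=4, iters=20):
    """Lloyd's algorithm: learn k quantization values minimizing sum (x - q)^2."""
    codebook = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        # assignment step: map each value to its nearest codeword
        assign = np.argmin(np.abs(x[:, None] - codebook[None, :]), axis=1)
        # update step: each codeword becomes the mean of its assigned values
        for j in range(k):
            if np.any(assign == j):
                codebook[j] = x[assign == j].mean()
    return np.sort(codebook)

def mse(x, q):
    """Mean squared error after quantizing x to its nearest level in q."""
    return np.mean((x - q[np.argmin(np.abs(x[:, None] - q[None, :]), axis=1)]) ** 2)

learned = learn_codebook(data, k=4)
fixed = np.array([-3.0, -1.0, 1.0, 3.0])             # hand-picked fixed levels

print("MSE, fixed levels:  ", mse(data, fixed))
print("MSE, learned levels:", mse(data, learned))
```

On typical draws, the learned levels give a visibly lower quantization error than the fixed ones, which is the basic motivation for learning the quantizer from data.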
To address the problems mentioned above, we conduct a series of studies on image feature hashing and quantized deep network based feature representation. The specific research content and contributions are summarized as follows:
1. We propose a fast $k$-means based compact image feature representation method for large-scale images. Clustering image features via $k$-means yields compact image feature representations, but $k$-means becomes quite slow for large-scale clustering. To speed up the $k$-means algorithm and represent images via compact features, we propose a fast $k$-means based image feature representation method, which uses a coarse-to-fine search strategy to reduce the search space. Besides, a hashing algorithm is introduced to speed up the assignment step by reducing the number of candidate clusters (a simplified sketch of this step follows the list). Experiments show that the proposed MKM algorithm achieves up to 600 times speed-up over standard $k$-means with comparable clustering accuracy, which means we can quantize image features into compact representations quickly even for large-scale image sets.
2. We propose a pseudo label based unsupervised deep discriminative hashing method for image retrieval. Hashing is a special case of quantization, and deep hashing codes have many advantages, such as compact representation, retrieval efficiency, and stronger semantic representation power. However, most deep hashing models require a large amount of supervised information. To address this problem, we propose a pseudo label based unsupervised deep discriminative hashing algorithm. Motivated by transfer learning, we extract features for unlabeled images using a pre-trained model and cluster them to obtain pseudo labels. We build a classification loss on the pseudo labels and train the deep hashing network with the classification loss and a quantization loss (see the sketch after this list).
3. We propose an efficient deep feature extraction method via hashing. On the edge side (or on mobile devices), the deep network model for extracting image features usually takes considerable storage, and feature extraction is slow because of the limited computational resources. Binary weight networks achieve network compression and acceleration compared to the original deep networks, but binary quantization usually brings a large accuracy drop. To address this problem, we propose to train binary weight networks via hashing, which allows deep features to be extracted efficiently. To the best of our knowledge, this is the first work to train binary weight CNNs via hashing. We uncover the close connection between inner-product preserving hashing and binary weight neural networks, so that training binary weight networks can be transformed into a hashing problem. To alleviate the loss brought by binary quantization, the binary codes are multiplied by a scaling factor, and we propose an alternating optimization method to iteratively update the binary codes and the scaling factor (illustrated after this list). The experimental results demonstrate that the proposed method outperforms state-of-the-art algorithms.
4. We propose a semi-binary decomposition method to extract deep features efficiently. Although binary weight networks can extract deep features efficiently, they suffer from a large accuracy drop because of their limited representation capacity. We propose a semi-binary decomposition method that gives binary weight networks higher representation power, together with an alternating optimization method to learn the decomposition factors under the binary constraint (a simplified sketch follows the list). Experimental results on AlexNet, ResNet-18, and ResNet-50 demonstrate that the proposed method outperforms state-of-the-art algorithms by a large margin. In addition, we implement a binary weight AlexNet on an FPGA platform; the results show that our binary weight network achieves $\sim9$ times speed-up while using less on-chip memory and fewer hardware multipliers, enabling efficient deep feature extraction.
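For contribution 1, the following sketch shows how hashing can shrink the assignment step of $k$-means: centers are grouped into hash buckets, and each point computes exact distances only against the candidate centers in its own bucket. The bucketing scheme (signs of random projections, a simple LSH) and all names here are our own simplified illustration, not the exact MKM algorithm.

```python
# Illustrative hashing-accelerated k-means assignment (not the exact MKM method):
# coarse step = hash-bucket lookup, fine step = exact search within the bucket.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 32))        # toy image features
C = rng.normal(size=(256, 32))         # current cluster centers

P = rng.normal(size=(32, 8))           # random projections -> 8-bit hash codes

def hash_codes(Z):
    """Binary codes from the signs of random projections (one byte per row)."""
    return np.packbits((Z @ P) > 0, axis=1).ravel()

# Coarse step: group centers by hash bucket so each point only competes
# against candidate centers that share its bucket.
buckets = defaultdict(list)
for j, h in enumerate(hash_codes(C)):
    buckets[h].append(j)

def assign(x, h):
    cand = buckets.get(h)
    if not cand:                       # empty bucket: fall back to full search
        cand = range(len(C))
    cand = np.fromiter(cand, dtype=int)
    # Fine step: exact distances, but only over the candidate clusters.
    d = np.linalg.norm(C[cand] - x, axis=1)
    return cand[np.argmin(d)]

labels = np.array([assign(x, h) for x, h in zip(X, hash_codes(X))])
print(labels[:10])
```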
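For contribution 2, the sketch below shows the training signal only, written in PyTorch (our choice; the thesis does not fix a framework). The pseudo labels are assumed to come from clustering pre-trained features beforehand; here they are random stand-ins, and the loss weight is illustrative. The quantization loss pushes the relaxed codes toward the binary values $\{-1,+1\}$.

```python
# Illustrative pseudo-label classification loss + quantization loss for
# unsupervised deep hashing; pseudo labels are random stand-ins here.
import torch
import torch.nn as nn

torch.manual_seed(0)
feats = torch.randn(64, 512)                 # features from a pre-trained CNN
pseudo_labels = torch.randint(0, 10, (64,))  # cluster ids used as pseudo labels

hash_layer = nn.Sequential(nn.Linear(512, 48), nn.Tanh())  # 48-bit codes
classifier = nn.Linear(48, 10)

codes = hash_layer(feats)                    # relaxed codes in (-1, 1)
cls_loss = nn.functional.cross_entropy(classifier(codes), pseudo_labels)
quant_loss = ((codes.abs() - 1.0) ** 2).mean()

loss = cls_loss + 0.1 * quant_loss           # 0.1 is an illustrative weight
loss.backward()
binary_codes = codes.detach().sign()         # final hash codes at inference
print(loss.item(), binary_codes.shape)
```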
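For contribution 3, the following sketch shows only the alternation between binary codes and scaling factor on the simplified per-layer objective $\min_{\alpha, B} \|W - \alpha B\|_F^2$ with binary $B$. The full method casts training as an inner-product preserving hashing problem over the layer's inputs and outputs; this stripped-down form is our own illustration of the update pattern.

```python
# Alternating update for min ||W - alpha * B||_F^2 with binary B
# (simplified illustration of the binary-codes / scaling-factor alternation).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))        # full-precision weights of one layer

B = np.sign(W)                          # init binary codes
B[B == 0] = 1.0
for _ in range(5):
    # fix B, solve the scaling factor in closed form: alpha = <W, B> / <B, B>
    alpha = np.sum(W * B) / B.size
    # fix alpha, update the binary codes (sign minimizes the error elementwise)
    B = np.where(W >= 0, 1.0, -1.0)

print("relative approximation error:", np.linalg.norm(W - alpha * B) / np.linalg.norm(W))
```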
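For contribution 4, the sketch below assumes, for illustration, a semi-binary factorization of the form $W \approx U \,\mathrm{diag}(d)\, V$ with binary $U$, $V$ and a real vector $d$. The $d$-update is an exact least-squares step; the binary updates use a common sign heuristic rather than the thesis's exact procedure, so this is a sketch of the alternating-optimization pattern only.

```python
# Illustrative alternating optimization for an assumed semi-binary
# factorization W ~ U diag(d) V (binary U, V; real d). Not the exact method.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 256))            # full-precision layer weights
m, n, r = *W.shape, 64                     # r binary "basis" components

U = np.sign(rng.normal(size=(m, r)))
V = np.sign(rng.normal(size=(r, n)))
d = np.ones(r)

for _ in range(10):
    # fix U, V: exact least squares for d, since W ~ sum_k d_k U[:,k] V[k,:]
    G = (U.T @ U) * (V @ V.T)
    b = np.einsum('ik,ij,kj->k', U, W, V)
    d = np.linalg.solve(G + 1e-6 * np.eye(r), b)
    # fix d, V: heuristic binary update for U (exact if the rows of
    # diag(d) V were orthogonal), and symmetrically for V
    U = np.sign(W @ (np.diag(d) @ V).T)
    U[U == 0] = 1
    V = np.sign((U @ np.diag(d)).T @ W)
    V[V == 0] = 1

err = np.linalg.norm(W - U @ np.diag(d) @ V) / np.linalg.norm(W)
print("relative approximation error:", err)
```

Compared with the single scaling factor in the previous sketch, the extra binary factor and per-component scales enlarge the family of matrices the binary network can represent, which is the intuition behind the higher representation capacity claimed above.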