English Abstract

The rapid development of information technology has driven a sharp rise in global data volume. In the era of big data, information retrieval methods face serious challenges because of high computational and storage costs. Consequently, how to retrieve specific samples from massive data quickly, efficiently, and accurately has attracted extensive attention in the field of machine learning. Among approximate nearest neighbor (ANN) search methods, hash learning has received wide attention from researchers because of its high retrieval efficiency and low memory consumption.
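To make the efficiency argument concrete, the sketch below packs binary codes into integers and ranks a toy database by Hamming distance using XOR and a bit count. For scale, a 64-bit code occupies 8 bytes, whereas a 512-dimensional float32 feature vector occupies 2,048 bytes. This is a generic illustration of hash-based retrieval, not code from the thesis; the function names `hamming` and `hamming_rank` and the toy codes are hypothetical.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes packed into Python ints."""
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: List[int]) -> List[int]:
    """Indices of database codes sorted by Hamming distance to the query.

    Python's sort is stable, so codes at equal distance keep their database order.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


if __name__ == "__main__":
    db = [0b1010, 0b1111, 0b0000, 0b1011]  # toy 4-bit database codes
    print(hamming_rank(0b1010, db))  # [0, 3, 1, 2]
```

In practice the bit count maps to a single hardware popcount instruction, which is what makes exhaustive Hamming ranking cheap even over large databases.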
Asymmetric hashing methods use different hash functions to encode query samples and database samples. Owing to their flexible coding mechanism, strong retrieval performance, and efficient optimization strategies, asymmetric methods have attracted growing research effort in similarity search in recent years. This thesis studies asymmetric hash learning and develops several new methods and results. The main contributions and novelties of this thesis are as follows.
- A metric-embedded asymmetric hashing method (MEAH) is proposed. It adopts an asymmetric encoding strategy to reduce the quantization error of database points while preserving the real-valued information of query points. Specifically, MEAH is an unsupervised asymmetric hashing method that encodes query points as real-valued codes and database points as binary codes. Moreover, a bilinear function that measures the similarity between these asymmetric codes is embedded into the hash learning framework. Extensive experiments demonstrate the superiority of our approach over several traditional unsupervised hashing methods.
- An asymmetric multi-valued hashing method (AMVH) is proposed. Its core idea is to represent database points with multi-integer-valued embeddings and query points with real-valued embeddings. Specifically, we introduce a binary sparse representation to model the multi-integer-valued embedding of database points. In this way, the problem with multi-integer constraints can be formulated as a mixed integer programming problem, which is readily solved by the proposed alternating optimization algorithm. Built on a supervised learning framework, our approach learns non-binary embeddings with strong similarity-preservation capability, so retrieval accuracy improves while query and storage efficiency are maintained. Extensive experiments verify the effectiveness of the proposed method.
- A nonlinear asymmetric multi-valued hashing method (NAMVH) is proposed, which extends AMVH by using a multi-layer neural network to encode each query point as a real-valued embedding. To this end, an alternating optimization algorithm is designed to efficiently solve the highly coupled problem that combines neural network learning with mixed integer programming. Moreover, we present a simple method for incrementally extending the database. Theoretically, we show that the similarity metric of NAMVH can be regarded as a query-sensitive weighted Hamming distance. Extensive experiments on seven datasets verify the effectiveness of NAMVH in terms of training, querying, and storage.
- An unsupervised deep graph hashing method (UDGH) is proposed, in which deep hash function learning and graph structure preservation are integrated into a unified framework. Specifically, the neighborhood structure between network outputs and binary codes is preserved locally in an asymmetric way, yielding an asymmetric spectral loss function. Moreover, the gradients of minibatch samples can be readily computed with the global graph Laplacian, so that the structure information of the whole graph is taken into account in every minibatch. In this way, the graph Laplacian regularizes network training directly and sufficiently. Extensive experiments demonstrate that our approach outperforms existing unsupervised hashing methods.
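The asymmetric idea, where a real-valued query is scored against binary database codes, can be illustrated with a small sketch. It compares a symmetric score (query binarized too) with an asymmetric inner-product score, and checks the standard identity u·b = Σ|u_i| − 2·Σ_{i: sign(u_i) ≠ b_i} |u_i| for ±1 codes b. The identity shows why ranking by the asymmetric score is the same as ranking by a Hamming distance whose bit weights |u_i| depend on the query. This covers only the plain binary case under assumed ±1 codes and a plain inner product; it does not reproduce MEAH's learned bilinear similarity or the thesis's result for multi-valued NAMVH codes, and all names and values are hypothetical.

```python
import math
from itertools import product
from typing import List

Vector = List[float]
Code = List[int]


def binarize(v: Vector) -> Code:
    """Map each coordinate to +1/-1 (0 is mapped to +1 by convention here)."""
    return [1 if x >= 0 else -1 for x in v]


def symmetric_score(u: Vector, b: Code) -> int:
    """Symmetric scheme: binarize the query too; +/-1 inner product = r - 2 * Hamming."""
    return sum(qi * bi for qi, bi in zip(binarize(u), b))


def asymmetric_score(u: Vector, b: Code) -> float:
    """Asymmetric scheme: real-valued query projection against a binary database code."""
    return sum(ui * bi for ui, bi in zip(u, b))


def weighted_hamming(u: Vector, b: Code) -> float:
    """Query-sensitive weighted Hamming distance: sum of |u_i| where sign(u_i) != b_i."""
    return sum(abs(ui) for ui, si, bi in zip(u, binarize(u), b) if si != bi)


if __name__ == "__main__":
    u = [0.75, 0.25, -0.5]             # hypothetical real-valued query projection
    b1, b2 = [1, -1, -1], [-1, 1, -1]  # hypothetical database codes
    print(symmetric_score(u, b1), symmetric_score(u, b2))    # 1 1 (a tie)
    print(asymmetric_score(u, b1), asymmetric_score(u, b2))  # 1.0 0.0
    # Check the identity over every 3-bit code.
    total = sum(abs(x) for x in u)
    for b in product([-1, 1], repeat=len(u)):
        lhs = asymmetric_score(u, list(b))
        rhs = total - 2 * weighted_hamming(u, list(b))
        assert math.isclose(lhs, rhs, abs_tol=1e-12)
```

In this toy case both database codes sit at Hamming distance 1 from the binarized query, so the symmetric score cannot separate them, while the real-valued query keeps the magnitude information that breaks the tie.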
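How a minibatch gradient can draw on the global graph Laplacian can be sketched under an assumed loss. The sketch takes J = ½ Σ_ij S_ij ‖f_i − b_j‖², where S is an N×N affinity matrix over all training points, f_i is the network output for point i, and b_j is a stored binary code. This loss form is an assumption for illustration, not necessarily UDGH's exact asymmetric spectral loss. Its gradient with respect to f_i is Σ_j S_ij (f_i − b_j), which can be rewritten as (L B)_i + d_i (f_i − b_i) with L = D − S and d_i = Σ_j S_ij. Each minibatch row therefore needs one row of the global Laplacian and the stored codes of all N points, but no other network outputs. All function names and toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(S: Matrix) -> Matrix:
    """Graph Laplacian L = D - S, where D is the diagonal degree matrix of S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]


def loss(S: Matrix, F: Matrix, B: Matrix) -> float:
    """Assumed asymmetric spectral loss J = 1/2 * sum_ij S_ij * ||f_i - b_j||^2."""
    n = len(S)
    return 0.5 * sum(
        S[i][j] * sum((fk - bk) ** 2 for fk, bk in zip(F[i], B[j]))
        for i in range(n)
        for j in range(n)
    )


def grad_row(i: int, S: Matrix, L: Matrix, F: Matrix, B: Matrix) -> List[float]:
    """dJ/df_i = (L B)_i + d_i * (f_i - b_i).

    Uses row i of the global Laplacian and the stored codes B of all N points,
    plus only the output f_i, so it can be evaluated for each minibatch row.
    """
    d_i = sum(S[i])
    r = len(B[0])
    LB_i = [sum(L[i][j] * B[j][k] for j in range(len(B))) for k in range(r)]
    return [LB_i[k] + d_i * (F[i][k] - B[i][k]) for k in range(r)]


if __name__ == "__main__":
    S = [[0.0, 1.0, 0.5, 0.0],
         [1.0, 0.0, 0.0, 0.25],
         [0.5, 0.0, 0.0, 1.0],
         [0.0, 0.25, 1.0, 0.0]]  # toy affinity graph over N = 4 points
    F = [[0.3, -0.2], [0.9, 0.1], [-0.4, 0.8], [-0.7, -0.6]]  # toy network outputs
    B = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0]]  # toy binary codes
    L = laplacian(S)
    for i in [0, 2]:  # a toy minibatch
        print(i, grad_row(i, S, L, F, B))
```

The rewrite through L is what lets the whole graph enter every minibatch: the per-row cost is O(N·r) for the (L B)_i term, and B can be held fixed while the network is updated, as in an alternating scheme.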