Content-based multimedia information retrieval (CBMIR) is one of the most important research directions in multimedia analysis and processing. It aims to automatically retrieve the desired information from a multimedia database based on the content of the query. Traditional multimedia information retrieval techniques rely mainly on textual keywords, performing retrieval with textual tagging information. However, textual keywords cannot bridge the semantic gap between low-level features and high-level semantics, which leads to ambiguity in understanding and analyzing multimedia information. In addition, textual keywords are not always available for large-scale multimedia databases, because human annotation consumes considerable time and labor. Content-based multimedia information retrieval has therefore attracted growing research interest. In this thesis, we analyze the existing problems in CBMIR and address several challenging issues: building a robust subspace model for image features; designing a bag-of-words based retrieval framework that realizes seamless fusion of audio and visual information; generating a coherency vocabulary indexing structure for multiple audio feature fusion and efficient indexing; proposing a novel visual concept detection module to associate textual and visual information; and learning the objective function of a reranking model based on multiple sources of information. The main contributions of this thesis are as follows: 1. We propose a topographic subspace model based on key-coding learning and sparse coding, which generates a discriminative and sparse representation for an image. Key-coding learning is an inductive transfer learning method guided by the label distribution; it addresses the shortage of training data in machine learning with the help of large-scale auxiliary unlabeled data. 
Here, the unlabeled data need not be independently and identically distributed with the labeled data. The topographic subspace model, which is built from the unlabeled auxiliary data, can describe the data distribution accurately. In the topographic subspace, we generate a sparse feature vector for each image by key-coding over a large number of local features, which achieves a good compromise between efficiency and effectiveness. 2. To enhance the description of audio and video features and improve retrieval efficiency, we propose a ...