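Research on Cross-Modal Correlation Learning and Its Application to Image Retrieval (跨模态关联学习及其在图像检索中的应用研究)
何泳澔
Degree type: Doctor of Engineering
Advisor: 向世明
2016-05-27
Degree-granting institution: University of Chinese Academy of Sciences
Degree-granting location: Beijing
Keywords: information retrieval; automatic image annotation; image tag ranking; cross-modal retrieval; deep learning; feed-forward neural networks; convolutional neural networks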
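Other abstract: With the rapid growth of the Internet, data volumes have grown explosively. As an important data-processing technique, information retrieval has therefore long attracted attention from both industry and academia and remains an active research topic. An information retrieval framework has two key stages: data structuring and candidate ranking. The core problem in data structuring is how to extract and organize the important information in raw data, that is, the metadata; the core problem in candidate ranking is how to rank candidate results by their relevance to the user's query. Information retrieval currently faces two main challenges: (1) the rapid growth of data calls for efficient and accurate data structuring methods; (2) the variety of data forms and the richness of data content make it increasingly difficult to mine the intrinsic relations within the data. Addressing these key stages and challenges, this dissertation studies automatic image annotation, image tag ranking, and image-text cross-modal retrieval from the perspective of cross-modal correlation learning. The main contributions are as follows:
An automatic image annotation method based on image-tag correlation learning is proposed. Its core idea is to propagate tag information linearly using an image-tag correlation matrix and the similarities between images. The method introduces a tag-bias regularization constraint, which ensures that a more meaningful image-tag correlation matrix is learned. It has two advantages: it uses multiple image features without dimensionality reduction, and its model can be solved quickly. Comparative experiments on three public datasets verify the superiority of the proposed method.
An automatic image annotation method based on deep feature learning and tag embedding learning is proposed. The method takes visual feature vectors and tag embedding vectors as input, performs feature learning with deep feed-forward neural networks, and finally uses a metric matrix to compute the relevance between images and tags. It can handle large-scale annotation problems and supports online learning naturally, without changes to the network architecture. Experiments on a large-scale dataset show that the method achieves fast annotation and excellent annotation performance.
A semi-supervised image tag ranking method based on pairwise tag information is proposed. Specifically, the method first decomposes each ranked tag list into a tag relative-relation matrix, which equivalently represents the intrinsic ranking structure of the list and thus avoids modeling complex ranked lists directly. It then combines the similarities between images with the image-tag correlation matrix to define a linear tag-relevance prediction function. Finally, it builds a semi-supervised tag ranking model from images with ranked tags and images with unranked tags, and learns the image-tag correlation matrix from that model. The resulting model can be solved directly in closed form. Comparative experiments show that the method produces better tag ranking results than existing methods.
An image-text cross-modal retrieval method based on deep bidirectional feature learning is proposed. Its core idea is to realize cross-modal feature learning by building modality-specific convolutional neural networks. Specifically, two convolutional networks learn image features and text features respectively, and their outputs are linked through a cross-modal similarity measure so as to mine the matching and non-matching information of cross-modal sample pairs. The deep network architecture is designed for the bidirectional nature of cross-modal retrieval, that is, it expresses both tasks at once: retrieving texts with image queries and retrieving images with text queries. Finally, the method introduces a maximum likelihood framework to optimize the network parameters. Extensive comparative experiments show that the method extracts semantically meaningful features for images and texts and therefore performs very well on image-text cross-modal retrieval.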
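
The first contribution (linear propagation of tag information with a closed-form solution) can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no formulas: scores are modeled as `S @ W`, where `S` holds similarities to the training images and `W` is an image-tag correlation matrix, and a plain ridge penalty stands in for the thesis's tag-bias regularization, whose exact form is not stated. The function names (`fit_correlation`, `predict`) are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for c in range(n):
        # Pick the row with the largest pivot for numerical stability.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit_correlation(S, Y, lam=0.1):
    """Closed-form ridge solution W = (S^T S + lam I)^-1 S^T Y.

    S: n x n similarities among training images (assumed given).
    Y: n x t binary tag matrix.
    The ridge term is a stand-in, not the thesis's tag-bias regularizer.
    """
    St = transpose(S)
    A = matmul(St, S)
    for i in range(len(A)):
        A[i][i] += lam
    return solve(A, matmul(St, Y))


def predict(S_new, W):
    """Propagate tags linearly: one score per tag for each new image."""
    return matmul(S_new, W)


# Toy data: three training images, two tags.
S = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
Y = [[1, 0], [0, 1], [0, 1]]
W = fit_correlation(S, Y, lam=0.1)
print(predict([[0.9, 0.1, 0.2]], W))  # tag scores for one new image
```

Because the objective is quadratic in `W`, a single linear solve replaces iterative training in this sketch, which is the practical meaning of a closed-form solution.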

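
The second contribution scores an image against a tag through a metric matrix. A minimal sketch of that last step follows, assuming a bilinear form `x^T M t`. The learned networks are not reproduced: `x` and the tag embeddings are plain vectors standing in for the outputs of the deep feed-forward networks, and the names `relevance` and `annotate` are hypothetical.

```python
def relevance(x, M, t):
    """Bilinear relevance x^T M t between an image feature and a tag embedding."""
    return sum(x[i] * M[i][j] * t[j] for i in range(len(x)) for j in range(len(t)))


def annotate(x, M, tag_embeddings, top_k=2):
    """Return the top_k tags ranked by relevance to the image feature x."""
    scores = {tag: relevance(x, M, emb) for tag, emb in tag_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy values, chosen only for illustration.
image_feature = [0.8, 0.1]
metric = [[1.0, 0.0], [0.0, 1.0]]
tags = {"sky": [1.0, 0.0], "dog": [0.0, 1.0], "sea": [0.7, 0.2]}
print(annotate(image_feature, metric, tags, top_k=2))
```

In this sketch annotation reduces to one matrix-vector product per image followed by a dot product per tag, which is consistent with the fast annotation the abstract reports, although the actual architecture may differ.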

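
The third contribution decomposes a ranked tag list into a tag relative-relation matrix. The sketch below shows one plausible encoding, assumed here rather than taken from the thesis: entry `(i, j)` is `+1` when tag `i` is ranked above tag `j`, `-1` when it is ranked below, and `0` when the pair is not compared. The function names are hypothetical.

```python
def relation_matrix(ranked_tags, vocabulary):
    """Encode a ranked tag list (most relevant first) as pairwise relations."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    n = len(vocabulary)
    R = [[0] * n for _ in range(n)]
    for pos, higher in enumerate(ranked_tags):
        for lower in ranked_tags[pos + 1:]:
            R[index[higher]][index[lower]] = 1
            R[index[lower]][index[higher]] = -1
    return R


def ranking_from_matrix(R, vocabulary):
    """Recover the ranked list: a tag's row sum is its wins minus its losses.

    Tags with an all-zero row are treated as absent, so a list with a
    single tag cannot be recovered from this encoding.
    """
    present = [i for i, row in enumerate(R) if any(row)]
    present.sort(key=lambda i: sum(R[i]), reverse=True)
    return [vocabulary[i] for i in present]


vocabulary = ["sky", "sea", "dog", "car"]
ranked = ["dog", "sky", "car"]
R = relation_matrix(ranked, vocabulary)
print(R)
print(ranking_from_matrix(R, vocabulary))
```

For a list of two or more tags the round trip returns the original order, which illustrates why the matrix can represent the ranking structure without modeling the list directly.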
Due to the prosperity of the Internet, data has grown explosively. As an important technique for data processing, information retrieval has long drawn attention from both industry and academia and remains an active research topic. The information retrieval framework has two key components: data structuralization and candidate ranking. The goal of data structuralization is to extract and organize the important information hidden in raw data, forming the metadata. Candidate ranking aims to rank the candidate results according to their relevance to the user's query. Currently, the main challenges of information retrieval lie in two aspects: (1) the rapid growth of data creates a need for efficient and precise data structuralization methods; (2) the diversity of data modalities and the richness of data content make it difficult to exploit the inner relations among the data. Considering these key components and challenges, this dissertation presents a series of studies on automatic image annotation, image tag ranking, and image-text cross-modal retrieval from the perspective of cross-modal correlation learning. Specifically, the main contributions of this dissertation are as follows:
An image-tag correlation learning based method is proposed to address the issue of automatic image annotation. The main idea is to linearly propagate the label information through the image similarities and the image-tag correlation matrix. The method also introduces a tag-biased regularization, which guarantees a more meaningful image-tag correlation matrix. It has two advantages: it utilizes multiple image features without dimensionality reduction, and its model admits a closed-form solution. Experiments on three public datasets demonstrate the superiority of the proposed method.
A deep feature learning and tag embedding learning based method is proposed for automatic image annotation. The method takes visual features and tag embeddings as inputs and uses deep feed-forward networks for feature learning. Finally, a metric matrix is constructed to estimate the correlation between images and tags. The method can handle large-scale image annotation and supports online learning without changing the network architecture. Experiments on a large dataset demonstrate the fast annotation process and the outstanding performance of the proposed approach.
A pairwise supervision based semi-supervised method is proposed to address the issue of image tag ranking. Specifically, the method decomposes the ranked tag list to build the tag relative relation matrix. This matrix equivalently expresses the inner ranking structure of the tag list, which avoids directly modeling the complex ranked list. The method then constructs a linear relevance prediction function based on the image similarities and the image-tag correlation matrix. Finally, by combining supervised and unsupervised data, it builds a semi-supervised model through which the image-tag correlation matrix is learned. This model can be solved in closed form. Comparative experiments show that the proposed method achieves better ranking performance than other methods.
A deep and bidirectional representation learning based method is proposed to address the issue of image-text cross-modal retrieval. The main idea is to achieve cross-modal feature learning via modality-specific convolutional neural networks. Specifically, two convolutional neural networks perform feature learning for images and texts, respectively. The outputs of the two networks are then correlated through a cross-modal similarity measure in order to mine the information in matched and unmatched image-text pairs. The proposed network architecture accounts for the bidirectional nature of cross-modal retrieval, namely searching texts by image queries and searching images by text queries. Finally, the method introduces a maximum likelihood framework for model optimization. Extensive experiments demonstrate that the proposed method extracts semantic features for images and texts and thus achieves outstanding performance on cross-modal retrieval.

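The bidirectional maximum likelihood idea in the cross-modal retrieval contribution can be sketched as follows. The abstract does not give the likelihood, so the softmax form below is an assumption: given a matrix `sim` of similarities between image features and text features, the probability of retrieving the matched text for each image (and the matched image for each text) is a softmax over one row (or column), and the loss is the average negative log-likelihood over both directions. The function names are hypothetical, and the convolutional networks that would produce the features are not shown.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a list of scores."""
    m = max(values)
    lse = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - lse for v in values]


def bidirectional_nll(sim):
    """Average negative log-likelihood over both retrieval directions.

    sim[i][j] is the similarity of image i and text j; pair (i, i) is
    assumed to be the matched pair, all other pairs unmatched.
    """
    n = len(sim)
    # Image-to-text: each row is one image query scored against all texts.
    i2t = sum(log_softmax(sim[i])[i] for i in range(n))
    # Text-to-image: each column is one text query scored against all images.
    columns = [list(col) for col in zip(*sim)]
    t2i = sum(log_softmax(columns[j])[j] for j in range(n))
    return -(i2t + t2i) / (2 * n)


uninformative = [[0.0, 0.0], [0.0, 0.0]]
well_matched = [[5.0, 0.0], [0.0, 5.0]]
print(bidirectional_nll(uninformative), bidirectional_nll(well_matched))
```

Minimizing such a loss pushes matched pairs toward high similarity and unmatched pairs toward low similarity in both directions at once, which is one way a single model can serve both retrieval tasks.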
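Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/11653
Collection: Graduates_Doctoral Dissertations
Author affiliation: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
何泳澔. 跨模态关联学习及其在图像检索中的应用研究[D]. 北京. 中国科学院大学,2016.
Files in this item:
File name/size: 提交版--毕业论文--何泳澔.pdf (6341KB) | Document type: Dissertation | Access: Not yet open | License: CC BY-NC-SA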
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.