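Research on Image Semantic Recognition Algorithms Based on Neural Networks
亓鲁 (Qi Lu)
Degree type: Master of Engineering
Advisor: 乔红 (Qiao Hong)
2017-05
Degree-granting institution: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of degree conferral: Beijing
Keywords: semantic recognition; object detection; semantic segmentation; feature extraction
Abstract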
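In computer vision, image semantic recognition is an important image-understanding task: using a computer to process, analyze, and understand images. It mainly covers three aspects: semantic classification, semantic detection, and semantic segmentation. To some extent, image semantic recognition has therefore become the foundation for research and applications across computer vision: work on 3D reconstruction, face recognition, and image captioning relies on it for theoretical support, and the wide deployment of autonomous driving and unmanned aerial vehicles has benefited from the maturing of this field.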
 
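Broadly, research on image semantic recognition falls into two stages: classical image recognition algorithms of the pre-deep-learning era, and neural-network-based recognition algorithms of the deep-learning era. Deep-learning-based algorithms largely overcome the low accuracy and poor results of classical vision algorithms. However, their long training times, high degree of nonlinearity, and weak theoretical interpretability have also motivated combining deep networks with classical recognition algorithms, so that the two reinforce each other to achieve more accurate recognition. With advances in GPU hardware and the annotation of large-scale datasets, image semantic recognition has achieved great success. Nevertheless, existing algorithms still have many shortcomings: sample annotation consumes substantial human and material resources, networks still generalize and transfer poorly, and each network serves only a single function. This thesis therefore studies existing methods in depth with respect to these three shortcomings of deep-learning-based semantic recognition networks and improves on them from the perspectives of unsupervised learning, transfer learning, and instance segmentation. The main contributions are: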
 
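1. A semantic extraction method based on a biologically inspired, brain-like visual cognition model. Drawing on an understanding of the structure, mechanisms, and functions of the brain's visual cortex, we extend the convolutional deep belief network (CDBN), which has associative and memory capabilities, with modules for episodic feature clustering, structural feature extraction, and feature re-selection, making this unsupervised learning framework more robust for semantic recognition. Visualizing the convolution kernels and feature maps verifies and analyzes the feature-learning ability of the CDBN model; feature clustering, moreover, preserves discriminative features while simplifying the network structure. By extracting structural features, the network achieves accurate recognition of ambiguous semantic information. Compared with other unsupervised learning methods (the HMAX model or other dictionary-based learning methods), the improved CDBN model recognizes better and more robustly.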
 
 
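2. A transfer semantic recognition method based on abstract edge information. Using detection and edge-extraction stages, we jointly train a single unified neural network on datasets of different types (real photographs and cartoon images). The varied colors of cartoon images interfere heavily with joint training, so during training the network discards the conventional color-channel information and uses more general edge information instead. Compared with conventional neural networks, this network uses robust edge information to greatly improve the transfer ability of the semantic network, enabling it to recognize objects with different visual appearances effectively.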
 
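3. An improved instance semantic segmentation network. The Multi-task Network Cascade (MNC) places detection, segmentation, and recognition in a single unified network, and the model can effectively detect individual instances in an image and recognize them at the pixel level. However, the model's structure is cumbersome: the mask segmentation and classification branches are cascaded, and the choice of scales in multi-scale testing is also somewhat unreasonable. To address these two drawbacks, we change MNC's cascaded structure into a parallel one and design a method that selects the test scale autonomously. Experiments show that the improved method raises recognition accuracy on different types of datasets (COCO and PASCAL VOC), and the improved model can also be applied to remote sensing.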
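
The second contribution above replaces color channels with edge information so that photographs and cartoons look alike to the network. The abstract does not say which edge extractor the thesis uses; the following is only a minimal sketch that assumes a Sobel gradient magnitude, and the function name `sobel_edge_map` is hypothetical. It shows why an edge map is less sensitive to color than the raw RGB input: two images with the same shape boundary but different colors yield the same normalized map.

```python
import numpy as np

def sobel_edge_map(rgb):
    """Return an H x W edge-magnitude map in [0, 1] for an H x W x 3 image.

    Assumption: Sobel gradients stand in for the thesis's (unspecified)
    edge-extraction stage.
    """
    # Collapse color to luminance so hue differences are discarded.
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    # Replicate the border so the output keeps the input size.
    p = np.pad(gray, 1, mode="edge")
    # Horizontal and vertical Sobel responses written as shifted sums.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    peak = mag.max()
    # Normalize by the strongest edge; a flat image stays all zeros.
    return mag / peak if peak > 0 else mag

# Same vertical boundary, different colors (a "photo" and a "cartoon").
photo = np.zeros((8, 8, 3), dtype=np.uint8)
photo[:, :4] = (120, 90, 60)
photo[:, 4:] = (200, 180, 160)
cartoon = np.zeros((8, 8, 3), dtype=np.uint8)
cartoon[:, :4] = (255, 0, 0)
cartoon[:, 4:] = (255, 255, 0)

photo_edges = sobel_edge_map(photo)
cartoon_edges = sobel_edge_map(cartoon)
```

In this toy case `photo_edges` and `cartoon_edges` agree, which is the property a jointly trained network could exploit. Real images would need the detection stage mentioned above and, most likely, a more robust contour detector.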
 
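Built on deep networks, the models and algorithms proposed and improved in this thesis provide a basic framework and functional modules for image semantic recognition and offer new ideas for designing and implementing high-performance, highly integrated visual cognition models and algorithms; they are of research significance in both theory and application.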
Other abstract
Semantic recognition, which includes classification, detection, and segmentation, is one of the most important tasks in computer vision. It aims to process, analyze, and understand image content with the help of a computer. To some degree, semantic recognition has become the backbone of computer vision. For example, research on 3D reconstruction, face recognition, and image captioning builds on semantic recognition. In addition, the success of autonomous vehicles and unmanned aerial vehicles also benefits from this research.
 
Research on image semantic recognition can be divided into two stages: classical algorithms from before deep learning and network-based algorithms in the deep-learning era. Network-based algorithms have achieved state-of-the-art results on many tasks, largely overcoming the low recognition accuracy of classical algorithms. However, because they lack theoretical explanation, take long to train, and are highly nonlinear, network-based models are being combined with classical algorithms to draw on the strengths of both. With the development of hardware such as GPUs and of large-scale image annotation, semantic recognition has achieved considerable success. Still, some drawbacks remain: large-scale annotation consumes substantial human resources, and networks have limited transfer-learning capacity and serve only a single function. This thesis therefore addresses these drawbacks by modifying existing models from the viewpoints of unsupervised learning, transfer learning, and instance segmentation. The main work and contributions are as follows:
 
1. We propose a bio-inspired, brain-like semantic extraction method. Inspired by the structure and function of the visual cortex, we add modules for clustering episodic features, extracting structural features, and re-selecting features to the convolutional deep belief network (CDBN). Visualizing the convolutional kernels and feature maps, and extracting structural relationships among the clustered features, demonstrates the modified model's learning ability. Compared with other unsupervised methods, the modified CDBN model achieves higher recognition accuracy and greater robustness.
 
2. We propose a contour-based semantic recognition method. With the help of detection and contour extraction, we build a transfer classification network (TCN) trained on different manifestations of objects, such as real photographs and cartoon abstractions. Color diversity in cartoon abstractions can act as noise when training the unified network, so TCN uses the more general contour features instead of color channels. Compared with traditional neural networks, our network generalizes better when recognizing objects with different manifestations.
 
3. We propose a modified instance segmentation network. The Multi-task Network Cascade (MNC) puts the detection, segmentation, and recognition branches into a unified network, allowing the model to detect objects effectively at the pixel level. However, the MNC structure is complex because the segmentation and recognition branches are cascaded, and the existing multi-scale test method cannot choose a scale appropriate to the size of a specific region. We therefore modify MNC by converting the cascaded multi-task structure into a parallel structure and by proposing a new autonomous multi-scale test method. Experiments show that the modified model improves accuracy on public datasets (COCO and PASCAL VOC 2012). In addition, the modified model can be applied to remote-sensing datasets.
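
The third contribution above mentions choosing a test scale that suits the size of a specific region, but the abstract does not state the selection rule. Purely as an illustrative assumption, the sketch below picks, from a list of candidate scale factors, the one that brings a region's side length closest to a reference size on a logarithmic scale. The function name `pick_test_scale` and the default `target_side` of 224 pixels are hypothetical and do not come from the thesis.

```python
import math

def pick_test_scale(box_w, box_h, scales, target_side=224.0):
    """Choose the scale factor that maps a region closest to target_side.

    Hypothetical rule for illustration only: compare the geometric-mean
    side length of the rescaled region with target_side in log space, so
    that being 2x too large and 2x too small count as equally bad.
    """
    side = math.sqrt(box_w * box_h)
    return min(scales, key=lambda s: abs(math.log(side * s / target_side)))

candidate_scales = [0.5, 1.0, 2.0, 4.0]
small_region = pick_test_scale(56, 56, candidate_scales)     # small object
large_region = pick_test_scale(448, 448, candidate_scales)   # large object
medium_region = pick_test_scale(200, 250, candidate_scales)  # near target
```

Under this rule a small region is enlarged and a large one is shrunk before testing, instead of running every region at every scale. Whether the thesis uses a size-based rule of this kind cannot be confirmed from the abstract.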
 
Based on deep networks, the models and algorithms proposed in this thesis provide a basic framework and functional modules for image semantic recognition, as well as new ideas for designing high-performance visual cognition algorithms. This body of work has value for both theoretical and applied research.
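Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/14804
Collection: Graduates_Master's Theses
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation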
GB/T 7714
亓鲁. 基于神经网络的图像语义识别算法研究[D]. 北京. 中国科学院研究生院,2017.
Files in this item
File name/size: 亓鲁硕士毕业论文最终版.pdf (17956KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA