Research on Biologically Inspired Visual Recognition Models and Algorithms
Li Yinlin (李寅霖)
2016-05
Degree type: Doctor of Engineering
Abstract (Chinese)
Vision dominates the human perceptual system and is an essential basis of all action. For a long time, researchers have tried to endow artificial systems with human-like visual processing abilities. Visual recognition, especially image classification and object detection, has long been a research focus: it underlies higher-level visual perception and cognition and plays an irreplaceable role in many fields. Over the past few decades, visual recognition algorithms have achieved a series of landmark results, yet compared with human visual cognition they still fall well short in stability, generalization, and related respects. Meanwhile, in cognitive neuroscience and related fields, advances in experimental and analytical techniques have yielded new findings about, and interpretations of, the biological visual system at both macroscopic and microscopic levels.
Therefore, simulating the structure, mechanisms, and function of the visual cortex to design biologically inspired visual models and algorithms may offer solutions to the problems in existing visual algorithms and provide new methods and ideas for modeling visual tasks. At the same time, such models can serve as computational platforms for verifying biological experimental data and conclusions, and can inspire new experimental designs. In recent years, research on biologically inspired visual models and algorithms has become an important direction at the intersection of biology and information science. Starting from this point, this thesis studies biologically inspired visual recognition models and algorithms along three lines: model framework design, learning algorithm design, and model interpretation. The main work and contributions are as follows:
(1) Association and memory are preliminarily introduced into the hierarchical model of the ventral visual pathway (Hierarchical Max-pooling Model, HMAX), establishing a basic framework for visual cognition. In this framework, the memory of an object comprises semantic memory and episodic memory, and salient semantic features can modulate the memory of episodic feature parts. Learning and memory of the same kind of features take place in the same cortical region, and distributed feature representations support fast association among features of the same kind. Recognition is achieved either through the two stages of recognition memory, namely familiarity discrimination and recollective matching, or through ensemble coding of the semantic and episodic features of multiple feature parts. Compared with the HMAX model, the new model can output semantic descriptions for object recognition tasks, achieves higher recognition accuracy, and provides a basic framework for modeling the related mechanisms.
(2) Targeting the feedforward processing in the first 100-150 milliseconds of visual cognition, attention modulation, memory processing, and position encoding are introduced into the HMAX model. The new model simulates the bottom-up attention mechanism of the primary visual cortex (V1): a saliency map is formed from contrast differences across multiple features, providing initial candidate regions for feature selection. It also simulates the distributed feature learning and aggregation mechanism of the inferotemporal cortex (IT): the initially sampled multi-scale mid-level feature templates are clustered iteratively to learn discriminative and representative templates, so that identical or similar templates are clustered and shared, enriching the distributed feature learning and memory scheme of work (1). Finally, features and positions are jointly encoded to perform multi-class object classification. Compared with the HMAX model and other dictionary-learning-based methods, the improved model achieves higher classification accuracy.
(3) The Convolutional Deep Belief Network (CDBN) is improved: its feature learning process is analyzed through visualization, and a more compact structure is designed that discriminates more accurately. Visualizing the convolutional kernels and feature maps verifies and analyzes the feature learning ability of the CDBN model and its functional correspondence to the HMAX model and the visual cortex. For the responses of high-level feature maps, maximum integration within a sample and average integration across samples are used to learn the spatial positions of key parts, providing candidate regions for finer feature and semantic learning of the feature parts. Further, the reconstructed convolutional kernels are clustered, and representative kernels are selected to simplify the model. Compared with the HMAX model and other dictionary-learning-based models, the improved model achieves higher classification accuracy. Finally, online incremental learning via "memorizing weights" or "memorizing samples" is proposed, giving the network the ability to adapt to samples of both new and old classes. This work is significant for understanding the feature learning process of multi-layer networks, effectively exploiting intermediate-layer outputs, simplifying network structure, and making networks adaptive.
(4) Convolutional Neural Networks (CNN), mainly the Fast Region-based CNN (Fast R-CNN) and related algorithms, are applied to recognizing multi-class grasp-type images collected by a head-mounted camera. Unlike algorithms based on multiple hand-crafted features and multi-stage processing, this method learns discriminative features automatically and simultaneously classifies and localizes multiple grasp types against complex backgrounds. Further, a hierarchical clustering method is used to build a tree structure over the grasp types, and the correlations among them are analyzed. The fast and accurate grasp-type classification and detection, together with the clustering-based grasp analysis, provide important support for safe and natural human-machine interaction, self-learning of robot manipulation, and the design and control of dexterous intelligent hands.
The models and algorithms proposed in this thesis provide a basic framework and functional modules for fine-grained computational modeling of the related structures and mechanisms of the visual cortex, and offer new ideas for designing and implementing high-performance visual cognition models and algorithms; they are of significant research value in both theory and application.
 
 
Abstract (English)
Vision dominates the human perceptual system and is an important basis of all human actions. For a long time, researchers have tried to endow artificial systems with visual processing abilities. Visual recognition, especially image classification and object detection, is among the most important research directions: it is the basis of higher-level visual perception and cognition, and is irreplaceably important for many research fields. In the past several decades, visual recognition algorithms have made a series of achievements. However, there is still a big gap between visual algorithms and human visual cognition, especially in stability and generalization. Meanwhile, in cognitive science, along with the progress of new experimental and analytical techniques, researchers have proposed new discoveries and interpretations of the biological visual system at both macroscopic and microscopic levels.

Therefore, biologically inspired visual algorithms that mimic the structure, mechanisms, and function of the visual cortex may provide solutions to the problems in existing visual algorithms, and new methods and ideas for the modeling of visual tasks. Meanwhile, they may also provide computational verification platforms for biological experimental data and results, and inspire the design of new experiments. In recent years, biologically inspired visual algorithms have become an important direction in the interdisciplinary research between neuroscience and information science. From this starting point, this thesis studies biologically inspired visual models and algorithms, focusing on framework design, learning algorithm design, and model interpretation. The main work and contributions are as follows:
(1) Memory and association are preliminarily introduced into the computational Hierarchical Max-pooling Model (HMAX) of the ventral visual pathway, and a basic framework of visual cognition is proposed. In this framework, the memory of an object includes semantic memory and episodic memory, and prominent semantic features can modulate the memory of episodic features. Learning and memory of the same kind of features take place in the same region of visual cortex, and the distributed feature representations support fast learning and association. Recognition can be achieved through the two stages of recognition memory, namely familiarity discrimination and recollective matching, or through neuronal ensemble coding of the semantic and episodic features of key object components. Compared with the HMAX model, the new model can output semantic descriptions, achieves higher recognition accuracy, and provides a basic framework for the modeling of related visual mechanisms.
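The two-stage recognition-memory scheme above can be sketched in a few lines. This is only an illustrative toy, not the thesis's implementation: the prototype/episode dictionaries, the Euclidean distances, and the familiarity threshold are all hypothetical stand-ins for the model's learned semantic and episodic representations.

```python
import numpy as np

def recognize(query, prototypes, episodes, familiarity_threshold=1.0):
    """Two-stage recognition sketch (all names/thresholds hypothetical).

    prototypes: dict class_id -> semantic prototype vector
    episodes:   dict class_id -> list of stored episodic feature vectors
    Returns (class_id, matched_episode_index), or (None, None) if unfamiliar.
    """
    # Stage 1: familiarity discrimination against semantic prototypes.
    distances = {c: np.linalg.norm(query - p) for c, p in prototypes.items()}
    best_class = min(distances, key=distances.get)
    if distances[best_class] > familiarity_threshold:
        return None, None  # judged unfamiliar; no recall is attempted

    # Stage 2: recollective matching against episodic memory of that class.
    eps = episodes[best_class]
    best_ep = int(np.argmin([np.linalg.norm(query - e) for e in eps]))
    return best_class, best_ep
```

Stage 2 only runs when stage 1 succeeds, mirroring the idea that familiarity discrimination gates the more expensive recollective matching.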
(2) Attention modulation, memory processing, and multi-feature encoding are introduced into the HMAX model, mainly corresponding to the feedforward processing in the first 100-150 milliseconds of visual cognition. In the new model, first, a saliency map is generated by combining multi-feature contrast computations to mimic the bottom-up attention modulation of the primary visual cortex (V1), and it also provides the initial regions for feature selection. Second, the initially sampled multi-scale mid-level patches are clustered iteratively to learn patches with discriminative and representative ability. Identical or similar patches are clustered together, which achieves memory sharing and enriches the distributed feature learning and memory proposed in the first work. Finally, the features are encoded together with position information, and multi-class categorization is achieved. Compared with the HMAX model and other dictionary-learning-based methods, the new model achieves higher accuracy.
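A minimal bottom-up saliency sketch in the spirit of the multi-feature contrast described above, assuming simple intensity and edge channels with center-surround differencing. The channel set, filter sizes, and per-channel normalization here are illustrative choices, not the thesis's actual formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def saliency_map(image, center_size=3, surround_size=15):
    """Combine per-channel center-surround contrasts into one saliency map."""
    channels = [
        image,                               # intensity channel
        np.abs(np.gradient(image, axis=0)),  # vertical edge energy
        np.abs(np.gradient(image, axis=1)),  # horizontal edge energy
    ]
    sal = np.zeros_like(image, dtype=float)
    for ch in channels:
        center = uniform_filter(ch.astype(float), center_size)
        surround = uniform_filter(ch.astype(float), surround_size)
        contrast = np.abs(center - surround)  # center-surround difference
        if contrast.max() > 0:                # normalize each channel
            contrast /= contrast.max()
        sal += contrast
    return sal / len(channels)
```

Thresholding or peak-picking on the resulting map would then yield the initial candidate regions for feature selection.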
(3) The Convolutional Deep Belief Network (CDBN) is modified to obtain a simplified structure with higher accuracy, and its feature learning process is interpreted by visualization. First, through effective visualization of the convolutional kernels and feature maps, the learning ability of the CDBN model and its correspondence to the HMAX model and the visual cortex are analyzed and verified. Second, for the activations of high-level feature maps, maximum integration within one sample and average integration across different samples allow the model to learn the positions of key components, which provides candidate regions for more elaborate and semantic feature learning. Then the convolutional kernels are reconstructed and clustered, and the representative kernels are selected to simplify the model. Compared with the HMAX model and other dictionary-learning-based methods, the modified model achieves higher classification accuracy. Finally, we propose two online incremental learning methods, "memorizing weights" and "memorizing samples", which help the network adapt to both new and old samples. This work is meaningful for understanding the feature learning process, utilizing mid-level outputs, and simplifying multi-layer neural networks while enhancing their adaptability.
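The kernel-simplification step above (cluster the kernel bank, keep one representative per cluster) can be sketched as follows. A plain-numpy k-means is used so the example is self-contained; the thesis's actual clustering criterion and distance measure are not reproduced, and all parameter names are invented.

```python
import numpy as np

def select_representative_kernels(kernels, n_keep, n_iter=20, seed=0):
    """Cluster flattened kernels and keep one representative per cluster.

    kernels: array of shape (N, kh, kw); returns array (n_keep, kh, kw).
    """
    rng = np.random.default_rng(seed)
    flat = kernels.reshape(len(kernels), -1).astype(float)
    # k-means: random initial centers, then alternate assign/update.
    centers = flat[rng.choice(len(flat), n_keep, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(flat[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_keep):
            members = flat[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    # The representative is the actual kernel nearest to each centroid,
    # so the simplified model keeps real learned kernels, not averages.
    reps = []
    for k in range(n_keep):
        d = np.linalg.norm(flat - centers[k], axis=1)
        reps.append(kernels[int(d.argmin())])
    return np.stack(reps)
```

Replacing the full kernel bank with the representatives shrinks the convolution layer while keeping kernels that actually occurred in training.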
(4) Convolutional Neural Networks (CNN), mainly the Fast Region-based CNN (Fast R-CNN) and other related algorithms, are applied to the analysis of multi-class grasp-type images collected by a head-mounted camera. Compared with traditional methods based on multiple hand-crafted features and multi-stage processing, the method we use learns discriminative features automatically, and achieves grasp-type classification and localization in one single pipeline for images with complex backgrounds. Furthermore, a hierarchical clustering method is used to build a tree structure over the multi-class grasp types, and the relations among the grasp types are discussed. The accurate and fast grasp-type classification and detection, together with the hierarchical clustering analysis of grasp types, provide important support for safe and natural human-machine interaction, self-taught robot manipulation, and the design and control of dexterous graspers.
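The hierarchical-clustering analysis of grasp types might look like the following sketch: each grasp type is summarized by a feature vector (here invented toy vectors standing in for, e.g., per-type mean CNN features), average-linkage clustering builds the tree, and cutting the tree groups related grasp types. The grasp names and features are purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical per-grasp-type descriptors (stand-ins for learned features).
grasp_names = ["power-cylinder", "power-sphere", "precision-pinch",
               "precision-tripod", "lateral"]
features = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.5, 0.1, 0.5],
])

# Build the tree (an (n-1, 4) merge table) and cut it into two groups.
tree = linkage(features, method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
for name, g in zip(grasp_names, groups):
    print(name, g)
```

The merge table `tree` encodes the full hierarchy, so the same structure supports both coarse groupings (few clusters) and fine-grained relations between individual grasp types.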
The models and algorithms proposed in this thesis provide a basic framework and functional modules for more elaborate modeling of the related mechanisms and structures of the visual cortex, and new ideas for the design of high-performance visual cognition algorithms. This series of work has important value for both theoretical and applied research.
 
 
Keywords: biological inspiration; biological mechanisms; HMAX model; hierarchical structure; classification; detection; robotics
Subject area: Pattern Recognition and Intelligent Systems
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/11697
Collection: Graduates - Doctoral Dissertations
Recommended citation (GB/T 7714):
李寅霖. 生物启发式视觉识别模型与算法研究[D]. 北京: 中国科学院研究生院, 2016.
Files in this item:
201318014628052李寅霖.p (10822 KB) · Document type: dissertation · Access: restricted · License: CC BY-NC-SA
 
