A Stereo Vision Model Based on Visual Cortex Mechanisms (基于视皮层机制的立体视觉模型)


	A Stereo Vision Model Based on Visual Cortex Mechanisms
Alternative title	Stereo Vision Modeling in Primate Visual Cortices
	明雁声
	2010-05-25
Degree type	Master of Engineering
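Chinese abstract (English translation)	Depth, motion, color, and texture are basic features computed by dedicated processing areas of the human cerebral cortex, and extracting them is a basic requirement of any vision system. The mathematical and engineering approaches that currently dominate computer vision cannot yet extract these features reliably, so computational models and information-processing methods based on biological vision have become an important direction in visual information processing. The main work of this thesis is to draw on research findings in biological stereo vision and build a computational stereo vision model with a degree of physiological plausibility. The model recovers depth information from a scene and helps to further reveal how the biological stereo vision system works. The main idea is that certain neurons in the visual cortex form a neural network that solves the correspondence problem in stereo vision. The thesis argues that this network can be approximated by a Markov random field (MRF) and that the interactions among its neurons can be viewed as message passing in the belief propagation algorithm. Compared with MRF-based stereo algorithms in computer vision, the model differs in two respects. First, its likelihood function is derived from the population responses of complex cells in the primary visual cortex. Complex cells are regarded as the earliest disparity-encoding units in the visual pathway, and their responses can be described by the disparity energy model. The thesis further shows that this likelihood function is closely related to several psychophysical experiments. Second, psychophysical findings constrain the choice of the model's smoothness function. The model was tested on three kinds of stereo images. On images with repetitive textures, the experiments show that the model can account for some human depth percepts that were thought to be explicable only by a "second-order mechanism". Experiments on random-dot stereograms and natural images show that the model effectively removes the false targets produced by the band-pass filtering properties of complex cells. The coarse-to-fine algorithm is a popular biological stereo vision algorithm in the literature. A comparison with it shows that, when the foreground object is small, the model has a larger detection range for relative disparity. The thesis also discusses how the model relates to several physiological findings. The neurons hypothesized in the model are selective for absolute disparity and have facilitative integration fields. Such neurons are abundant in the visual cortex. We therefore conjecture that the visual cortex may use a neural network similar to this model to implement stereo vision.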
English abstract	Depth, motion, color, and texture are primary outputs of the human visual cortices. Estimating these features is also a desired capability of a general computer vision system. Many mathematical or empirical methods, although popular in the computer vision field, are not yet as robust as the human vision system. Therefore, it has become an important trend to build visual information processing machines based on knowledge from biological vision systems. This thesis aims to build a biological stereo vision model based on the extensive studies of the primate stereo vision system. The model should be able to extract depth from the two retinal images, as well as provide new insights into the neural system underlying primate stereo vision. Our model assumes that the correspondence problem, critical in stereopsis, is largely solved by a neural network simulating a Markov random field (MRF). The neural dynamics of the network are assumed to implement the belief propagation algorithm. There are two differences between our proposed model and other MRF-based stereo vision models in the computer vision field. First, the likelihood function in our model is constructed on the basis of the disparity energy model, because complex cells are considered to be front-end disparity encoders in the visual pathway. In addition, our likelihood function is also relevant to several psychological findings. Second, the potential function in our model is constrained by the psychological finding that the strength of the cooperative interaction minimizing relative disparity decreases as the separation between stimuli increases. Our model is tested on three kinds of stereo images. In simulations on images with repetitive patterns, it is demonstrated that our model can account for human depth percepts that were previously explained by the second-order mechanism. In simulations on random dot stereograms (RDS) and natural scene images, it is demonstrated that false matches introduced by the disparity energy model can be reliably removed by our model. A comparison with the coarse-to-fine model, a well-known model in the literature, shows that our model is able to compute the absolute disparity of small objects with larger relative disparity. Our model is also related to several physiological findings. The hypothesized neurons of the model are selective for absolute disparity and have facilitative extra receptive fields. There are plenty of such neurons in the visual cortex. In conclusion, s...
Keywords	Stereopsis; Complex Cells; Disparity Energy Model; Markov Random Field
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/7524
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	明雁声. 基于视皮层机制的立体视觉模型[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
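
The abstract describes a Markov random field whose likelihood comes from disparity energy responses and whose neural interactions act as belief propagation messages. The following is a minimal one-dimensional sketch of that idea, not the thesis's implementation. Everything specific in it is an assumption made for illustration: the function names (`gabor_pair`, `disparity_energy`, `likelihoods`, `chain_bp`), the position-shift form of the energy model with circular indexing, the softmax that turns energies into likelihoods, the exponential smoothness potential over evenly spaced disparity labels, and the restriction of interactions to adjacent positions on a chain.

```python
import math

def gabor_pair(half_width, sigma, freq):
    """Even and odd (quadrature) Gabor taps at offsets -half_width..half_width."""
    ks = range(-half_width, half_width + 1)
    env = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in ks]
    even = [e * math.cos(2.0 * math.pi * freq * k) for e, k in zip(env, ks)]
    odd = [e * math.sin(2.0 * math.pi * freq * k) for e, k in zip(env, ks)]
    return even, odd

def disparity_energy(left, right, x, d, even, odd):
    """Complex-cell response at position x for preferred disparity d.

    Position-shift form with circular indexing (assumed): the right-eye
    receptive field is displaced by d samples relative to the left-eye one.
    """
    n, h = len(left), len(even) // 2
    energy = 0.0
    for taps in (even, odd):
        # Binocular simple cell: linear sum of left and right filtered inputs.
        s = sum(w * (left[(x + i - h) % n] + right[(x + i - h + d) % n])
                for i, w in enumerate(taps))
        energy += s * s  # square each cell of the quadrature pair and add
    return energy

def likelihoods(left, right, disparities, even, odd, temperature=1.0):
    """Per-position distribution over disparities (softmax of energies, assumed)."""
    rows = []
    for x in range(len(left)):
        e = [disparity_energy(left, right, x, d, even, odd) for d in disparities]
        top = max(e)  # subtract the maximum so exp() cannot overflow
        w = [math.exp((v - top) / temperature) for v in e]
        z = sum(w)
        rows.append([v / z for v in w])
    return rows

def chain_bp(unary, smooth_scale=1.0):
    """Sum-product belief propagation on a 1-D chain; returns node marginals."""
    n, m = len(unary), len(unary[0])
    # Smoothness potential: larger label differences are penalised more.
    psi = [[math.exp(-abs(a - b) / smooth_scale) for b in range(m)]
           for a in range(m)]

    def normalise(v):
        z = sum(v)
        return [t / z for t in v]

    fwd = [[1.0] * m for _ in range(n)]  # fwd[i]: message into i from i - 1
    bwd = [[1.0] * m for _ in range(n)]  # bwd[i]: message into i from i + 1
    for i in range(1, n):
        fwd[i] = normalise([sum(psi[a][b] * unary[i - 1][a] * fwd[i - 1][a]
                                for a in range(m)) for b in range(m)])
    for i in range(n - 2, -1, -1):
        bwd[i] = normalise([sum(psi[a][b] * unary[i + 1][a] * bwd[i + 1][a]
                                for a in range(m)) for b in range(m)])
    return [normalise([unary[i][a] * fwd[i][a] * bwd[i][a] for a in range(m)])
            for i in range(n)]

# Hand-made likelihoods: the middle position locally prefers label 2 (a false
# match), while its neighbours clearly prefer label 1.
unary = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.02, 0.46, 0.52],
         [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]
beliefs = chain_bp(unary)
print([max(range(3), key=b.__getitem__) for b in beliefs])  # prints [1, 1, 1, 1, 1]
```

In the toy input the middle position would pick label 2 on its own, and the messages from its neighbours pull it back to label 1, which is the sense in which such a network can suppress a false match. A single energy unit does not always peak at the true disparity, which is why `likelihoods` is only a starting point for `chain_bp` rather than an estimate by itself.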

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20072801462801 (1715KB)			Restricted access	CC BY-NC-SA