Visual Scene Parsing Based on Multi-View Deep Network Models
关赫
Subtype: Master of Engineering
Thesis Advisor: 谭铁牛; 张兆翔
2018-05-25
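Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Keyword: multi-view input; top-down prior learning; adversarial training; semantic segmentation; monocular depth estimation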
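Abstract: Scene parsing comprises a family of high-level tasks in visual analysis and underpins an artificial agent's ability to perceive and interact with the real physical world, with broad application prospects in autonomous navigation, autonomous driving, augmented reality, and related fields. However, existing vision models have relatively rigid structures, simple integration mechanisms, and insufficient priors at inference time, which hinders robust and efficient visual scene parsing. To address this, this thesis proposes a theoretical framework driven jointly by data and cognition, makes meaningful improvements on core problems such as semantic segmentation and depth estimation, and realizes visual scene parsing enhanced by multi-view deep networks. The main contributions of this thesis are:
1. A top-down structured dynamic inference method that effectively extracts semantic prior knowledge and reuses the ground-truth annotations. A recent trend in network design has confirmed that Inception-style factorized convolution groups are effective, because they can aggregate strong correlations between neighboring positions in a low-dimensional space without losing much representational power. Built on Inception-factorized atrous convolution groups, the method combines bottom-up feature extraction with top-down prior guidance, and further improves segmentation performance with the advantages of low cost and orthogonalization.
2. A multi-path adversarial training method based on view decomposition, which discretizes the coarse semantic segmentation result by category into a set of sub-views. Trained end to end, the method decomposes the coarse segmentation map into a set of class-independent sub-views and performs adversarial training channel by channel to obtain more targeted feedback gradients.
3. A self-supervised cyclic adversarial network architecture, applied to monocular depth estimation. The model is jointly optimized through three components: cyclic architecture design, self-supervised monocular depth estimation, and adversarial training on sampled image patches. Sharing and cascading the base network enables self-supervised cyclic estimation of stereo disparity pairs; the image patch sampling strategy lowers the difficulty of adversarial training on samples and encourages richer local detail in the synthesized views.
Finally, we summarize the key problems and corresponding strategies in the methods proposed above, as well as research directions that urgently need to be explored in the future.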
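The view decomposition step in contribution 2 can be illustrated with a minimal sketch. The abstract does not specify how the decomposition is implemented, so this example assumes the simplest reading: a coarse label map is split into one binary mask (sub-view) per class, so that each mask could later be scored by its own discriminator channel. The function name `decompose_into_subviews` and the toy label map are hypothetical, and the hard split on integer labels is an assumption; an end-to-end trained network would more likely operate on per-class score maps.

```python
from typing import List

LabelMap = List[List[int]]


def decompose_into_subviews(label_map: LabelMap, num_classes: int) -> List[LabelMap]:
    """Split a label map into one binary mask (sub-view) per class.

    Sub-view k holds 1 where the label equals k and 0 elsewhere.
    """
    return [
        [[1 if label == cls else 0 for label in row] for row in label_map]
        for cls in range(num_classes)
    ]


# Toy 2x3 coarse segmentation with three classes (illustrative data only).
coarse = [
    [0, 0, 1],
    [2, 1, 1],
]
subviews = decompose_into_subviews(coarse, num_classes=3)

for cls, view in enumerate(subviews):
    print(f"class {cls}: {view}")
```

Because every pixel carries exactly one label, the sub-views sum to 1 at each position, so the decomposition loses no information about the coarse map.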
Other Abstract: Scene parsing is a family of high-level tasks in visual analysis and the basis on which artificial agents perceive and interact with the real physical world. It has broad application prospects in areas such as robot navigation, autonomous driving, and augmented reality. However, existing visual models have relatively rigid structures, their integration methods are simplistic, and their priors at inference time are insufficient, which hinders robust and efficient visual scene parsing. To this end, we propose a novel theoretical framework that combines data-driven and cognition-driven approaches to visual scene parsing and makes a series of meaningful improvements on core problems such as semantic segmentation and depth estimation. The approach is applied to visual scene understanding based on a multi-view deep enhancement network. The main contributions of this work include:
1. We propose a top-down structured dynamic inference method that addresses prior extraction and the reuse of ground-truth annotations. A recent trend in network design confirms that Inception-type atrous convolution groups are efficient, since they can aggregate spatial context in a lower-dimensional space without reducing representational power too much. By leveraging Inception-decomposition atrous convolution groups, the method combines the benefits of bottom-up feature learning and top-down prior modeling, and improves segmentation performance with the advantages of low cost and orthogonalization.
2. We propose a multi-channel adversarial training method based on view decomposition, which promotes the rationalization of class-specific and class-agnostic semantic sub-views. The adversarial network decomposes the coarse segmentation map into category-independent sub-views and then performs per-channel adversarial training to obtain more targeted feedback gradients.
3. We propose a self-supervised cyclic adversarial framework and apply it to monocular depth estimation. The framework is jointly optimized in terms of cyclic architecture design, forward-warping reconstruction, and an image patch sampling strategy to achieve both high efficiency and high accuracy. By sharing and cascading the base network, it performs self-supervised cyclic estimation of stereo disparity pairs. The image patch sampling scheme reduces the difficulty of adversarial training on full-resolution samples and encourages more local detail in the synthesized views.
Finally, we summarize the key issues and coping strategies of the proposed methods, along with research directions that remain to be explored.
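The image patch sampling strategy in contribution 3 can be sketched as follows. The abstract does not say how patches are chosen, so this example assumes uniform random sampling of square patches, cropped at the same location from a real view and a synthesized view so that a discriminator would compare like with like. All names (`sample_patch_origins`, `crop`, `paired_patches`) and the toy images are hypothetical and do not come from the thesis.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def sample_patch_origins(
    height: int, width: int, patch: int, count: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Draw `count` top-left corners so each patch lies fully inside the image."""
    if patch > height or patch > width:
        raise ValueError("patch size exceeds image size")
    return [
        (rng.randrange(height - patch + 1), rng.randrange(width - patch + 1))
        for _ in range(count)
    ]


def crop(image: Image, top: int, left: int, patch: int) -> Image:
    """Return the patch x patch window whose top-left corner is (top, left)."""
    return [row[left:left + patch] for row in image[top:top + patch]]


def paired_patches(
    real: Image, synthetic: Image, patch: int, count: int, seed: int = 0
) -> List[Tuple[Image, Image]]:
    """Crop matching patches from a real view and a synthesized view."""
    rng = random.Random(seed)
    origins = sample_patch_origins(len(real), len(real[0]), patch, count, rng)
    return [
        (crop(real, top, left, patch), crop(synthetic, top, left, patch))
        for top, left in origins
    ]


# Toy 3x4 "images": the synthetic view is the real view shifted by 0.5.
real = [[float(r * 4 + c) for c in range(4)] for r in range(3)]
synthetic = [[value + 0.5 for value in row] for row in real]
pairs = paired_patches(real, synthetic, patch=2, count=5)
```

Feeding small patches instead of full-resolution images gives the discriminator a simpler input distribution, which is one plausible reading of why the abstract reports that patch sampling eases adversarial training.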
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/21598
Collection: Graduates_Master's Theses
Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
关赫. 基于多视图深度网络模型的视觉场景解析[D]. 北京. 中国科学院研究生院,2018.
Files in This Item:
File Name/Size: Master Paper.pdf (8873KB)
DocType: Thesis
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.