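1. We propose a top-down structured dynamic inference method that effectively extracts semantic prior knowledge and reuses the ground-truth annotations. A recent trend in network design confirms that Inception-decomposed convolution groups are effective, because they can aggregate strong correlations between neighboring positions in a low-dimensional space without losing much representational power. Built on Inception-decomposed donut convolution groups, the method combines bottom-up feature extraction with top-down prior guidance, and further improves segmentation performance with the advantages of low cost and orthogonality.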
Scene analysis comprises a series of high-level tasks in visual analysis and is the basis on which artificial agents perceive and interact with the real physical world. It has broad application prospects in areas such as robot navigation, autonomous driving, and augmented reality. However, existing visual model structures are relatively rigid, their integration methods are simplistic, and prior knowledge is insufficient at inference time, all of which hinder a robust and efficient visual scene analysis framework. To this end, we propose a novel theoretical framework for visual scene analysis that combines data-driven and cognition-driven approaches and makes a series of meaningful improvements on core problems such as semantic segmentation and depth estimation. The approach is applied to visual scene understanding based on a multi-view deep enhancement network. The main contributions of this work are:
1. We propose a top-down structured dynamic inference method that addresses prior extraction and the reuse of ground-truth labels. A recent trend in network design confirms that Inception-type donut convolution groups are efficient, since they can aggregate spatial context in a lower-dimensional space without reducing representational power too much. By leveraging Inception-decomposition donut convolution groups, the method combines the benefits of bottom-up feature learning and top-down prior modeling, and improves segmentation performance with the two advantages of low cost and orthogonality.
2. We propose a multi-channel adversarial training method based on view decomposition, which promotes more plausible class-specific and class-agnostic semantic subviews. The adversarial network decomposes the coarse segmentation map into category-independent subviews and then performs adversarial training per channel to obtain more targeted gradient feedback.
3. We propose a self-supervised cyclic adversarial framework and apply it to monocular depth estimation. The framework is jointly optimized in terms of cyclic architecture design, forward-warping reconstruction, and image patch sampling strategy to achieve both high efficiency and high accuracy. By sharing and cascading the base network, we obtain a self-supervised cyclic estimation scheme for stereo disparity pairs. The image patch sampling scheme reduces the difficulty of adversarial training on full-resolution samples and encourages more local detail to be embedded in the synthesized views.
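The view decomposition in contribution 2 can be illustrated with a minimal sketch. The abstract does not specify how subviews are formed, so the code assumes one plausible reading: each subview is the input image weighted pixel-wise by the predicted probability of one class, so that a separate discriminator channel could inspect each class on its own. The function name `decompose_into_subviews` and the toy data are hypothetical, not taken from the thesis.

```python
def decompose_into_subviews(image, class_probs):
    """Split one image into per-class subviews (illustrative assumption).

    image:       H x W grid of grey values (floats).
    class_probs: list of C grids (each H x W) holding the predicted
                 probability of one class at every pixel.
    Returns a list of C subviews; subview c is the image weighted
    pixel-wise by the probability map of class c.
    """
    subviews = []
    for probs in class_probs:
        subview = [
            [pixel * p for pixel, p in zip(img_row, prob_row)]
            for img_row, prob_row in zip(image, probs)
        ]
        subviews.append(subview)
    return subviews


# Toy 2 x 2 image with two classes whose probabilities sum to 1 per pixel.
image = [[1.0, 2.0], [3.0, 4.0]]
class_probs = [
    [[1.0, 0.0], [0.5, 0.0]],  # class 0
    [[0.0, 1.0], [0.5, 1.0]],  # class 1
]
subviews = decompose_into_subviews(image, class_probs)
```

Under this assumption the subviews sum back to the original image whenever the class probabilities sum to one at each pixel, which is why a per-channel discriminator would see complementary parts of the same scene.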
Finally, we summarize the key issues and coping strategies of the proposed methods, as well as research directions to be explored in the future.
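The image patch sampling scheme mentioned in contribution 3 can be sketched as follows. This is only an illustration under stated assumptions: patches are square, drawn uniformly at random, and taken at the same location from the real and the synthesized view, so that a discriminator would compare small local regions instead of full-resolution images. The abstract does not give the actual sampling rule; `sample_patch_pairs` and its parameters are hypothetical names.

```python
import random


def crop(img, top, left, size):
    """Return the size x size window of img whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def sample_patch_pairs(real, synthetic, patch_size, num_patches, seed=0):
    """Sample aligned square patches from a real and a synthesized view.

    Assumption for illustration: both views are H x W grids of equal
    shape, and each pair is cropped at the same random location.
    """
    height, width = len(real), len(real[0])
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must not exceed the image size")
    rng = random.Random(seed)  # seeded for reproducible sampling
    pairs = []
    for _ in range(num_patches):
        # randint is inclusive on both ends, so the window always fits.
        top = rng.randint(0, height - patch_size)
        left = rng.randint(0, width - patch_size)
        pairs.append((crop(real, top, left, patch_size),
                      crop(synthetic, top, left, patch_size)))
    return pairs


# Toy 5 x 6 "real" view and a "synthetic" view offset by 100 per pixel.
real = [[r * 6 + c for c in range(6)] for r in range(5)]
synthetic = [[v + 100 for v in row] for row in real]
pairs = sample_patch_pairs(real, synthetic, patch_size=2, num_patches=3)
```

Each returned pair holds a real patch and the synthesized patch from the same position, which keeps the comparison local.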
Keywords: multi-view input; top-down prior learning; adversarial training; semantic segmentation; monocular depth estimation
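关赫 (Guan He). Visual Scene Parsing Based on Multi-View Deep Network Models [D]. Beijing: Graduate University of the Chinese Academy of Sciences, 2018.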