Feature Learning for Asynchronous Spatiotemporal Event-Based Data (面向异步时空脉冲数据的特征学习)
Sun Linhui (孙琳晖)
2023-12
Pages: 150
Subtype: Doctoral
Abstract

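       Neuromorphic sensors convey spatiotemporal visual information through asynchronous, sparse event streams, and offer low latency, low energy consumption, high temporal resolution, and high dynamic range. Thanks to these advantages, they are robust to high-speed motion and changing illumination, and they can still provide sufficient visual information under limited computational resources and response time. The event-based data they produce is therefore widely used in visual tasks such as pose recognition, dynamic object recognition, autonomous driving, and visual scene reconstruction.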

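       With the development of deep learning, research on conventional visual data has achieved remarkable results. Building on this, most existing work on event-based data converts the data into conventional representations and processes it with mature model architectures. However, conventional cameras synchronously produce frames carrying spatial information, whereas neuromorphic sensors asynchronously produce sparse event-based data carrying spatiotemporal visual information. Because these processing methods ignore the differences between the two kinds of data in how they are generated and what information they contain, they usually introduce redundant data representations, inefficient spatiotemporal relationship modeling, and complex network structures, which prevents event-based data from realizing its advantages in downstream tasks. Research on feature learning for asynchronous spatiotemporal event-based data that builds on the characteristics of the data, and that efficiently processes and analyzes the spatiotemporal information in sparse asynchronous event streams, is therefore forward-looking and urgently needed, with practical value and broad application prospects.

       At present, feature learning methods for event-based data still face many challenges, including how to design representations with low redundancy and high information content, how to efficiently model the spatiotemporal relationships among events, and how to use dynamic spatiotemporal information to improve the feature extraction efficiency of the network. This thesis studies these challenges in depth; its research content and contributions can be summarized in the following three aspects: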

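       1. Efficient spatiotemporal relationship modeling for event-based data: Given the sparsity of event-based data, existing methods generally use graph convolutional neural networks to model its spatiotemporal relationships. These methods have two main problems: first, spatial graph convolution introduces redundant computation when aggregating neighbor information; second, the inference algorithm they use must recompute all network activations when events arrive asynchronously, so it cannot respond quickly to an asynchronous data stream. To address these two problems, this thesis first proposes a local-shift graph convolutional network, which uses a local shift operation to aggregate neighbor information within the receptive field along the channel dimension, complemented by a node-importance-based parallel pooling method that yields sampling results covering the diversity of the event-based data, and thereby models the spatiotemporal relationships efficiently. Second, based on the asynchronous nature of event-based data, this thesis designs an asynchronous feature update strategy: when new events arrive, it uses the connectivity of the graph to restrict the network activations that need updating to the nodes affected by the new data, so that the new spatiotemporal relationships are modeled efficiently. Experimental results show that the proposed local shift operation significantly reduces the computational complexity of the network, and that the asynchronous feature update strategy achieves fast response to asynchronous event streams.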
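
As a rough illustration of what aggregating neighbor information along the channel dimension can look like, the sketch below copies each output channel of a node from one member of its local neighborhood, so aggregation needs no multiply-add operations. This is an assumption modeled on generic shift-style operators, not the definition used in the thesis; the names `local_shift`, `features`, and `neighbors` are hypothetical.

```python
def local_shift(features, neighbors):
    """Shift-style aggregation (illustrative assumption, not the thesis's operator).

    features:  list of per-node feature lists, all of the same length.
    neighbors: list of per-node neighbour index lists.
    Channel c of node i is copied from one node in [i] + neighbors[i],
    chosen round-robin by c modulo the size of that group.
    """
    shifted = []
    for i, own in enumerate(features):
        sources = [i] + list(neighbors[i])  # slot 0 keeps the node's own channels
        shifted.append([features[sources[c % len(sources)]][c]
                        for c in range(len(own))])
    return shifted
```

In shift-based networks an operation of this kind is typically followed by a pointwise linear layer that mixes the channels; that layer is omitted here.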

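       2. Efficient feature extraction based on dynamic spatiotemporal information: Existing methods extract spatiotemporal features from the event-based data within a sliding window but ignore the dynamic spatiotemporal information between adjacent spatiotemporal windows and their potential motion overlap, which causes information loss and computational redundancy. To address this problem, this thesis first proposes a dual-structure network with an efficient inference strategy, which captures the dynamic spatiotemporal information between adjacent windows while improving feature extraction efficiency. The dual-structure network contains a base branch and an incremental branch. During inference, the network treats several consecutive sliding windows as one processing unit: the base branch models the spatiotemporal relationships of the events in the first sliding window, and the subsequent sliding windows are handled by the lightweight incremental branch, which uses a differential method to extract dynamic spatiotemporal information and produce recognition results. Second, this thesis proposes a lightweight memory bank that characterizes the spatiotemporal relationships corresponding to the different motion semantics in the dataset and performs adaptive feature enhancement on the base and incremental branches through an attention mechanism. Experimental results show that the proposed network structure and efficient inference strategy greatly improve the feature extraction efficiency and recognition accuracy of the network.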

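       3. Efficient event-based data representation and adaptive feature learning: The representation of event-based data is the foundation of feature learning. Because event-based data is large in volume and variable in count, existing methods usually sample the raw data as input, which causes information loss. Some work proposes voxel-based representations, which reduce the number of basic processing units while improving the expressiveness of the input. However, these methods obtain voxel attributes by simple summation and select voxels by density, and may therefore miss representative information along the temporal and spatial dimensions. To preserve the spatiotemporal information in event-based data while keeping the data sparse, this thesis proposes a voxel selection strategy based on spatiotemporal decoupling. The strategy first divides the event-based data into time slices along the temporal dimension and locates the slices that contain significant motion; it then converts each time slice into voxel form and, based on the motion intensity and object edge information contained in the voxels, selects representative voxels that cover the motion contours within the slice; finally, the voxels extracted from all slices are reassembled into a sparse form, yielding a representation with low redundancy and high information content. In addition, because different input nodes have varying spatiotemporal relationships, this thesis introduces an adaptive multi-scale shift strategy that extracts multi-scale features with an adaptive receptive field for each input. Experiments show that the proposed data representation and network structure achieve the highest recognition accuracy.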

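       The three studies in this thesis address efficient spatiotemporal relationship modeling of event-based data, the capture of dynamic spatiotemporal information between windows, and compact representation of event-based data. Together they form a complete and systematic feature learning scheme for event-based data that improves both feature extraction efficiency and feature expressiveness.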

 

Other Abstract

         Event cameras are bio-inspired sensors that convey spatiotemporal visual information through asynchronous, sparse event streams. Compared with traditional cameras, they offer four attractive properties: low latency, low power consumption, high temporal resolution, and high dynamic range. Thanks to these properties, event cameras are robust to high-speed motion and variable illumination, and can provide sufficient visual information under limited computational resources and response time. Event cameras are therefore widely used in visual tasks such as pose recognition, dynamic object recognition, autonomous driving, and visual scene reconstruction.

         With the development of deep learning, research on traditional visual data has made significant progress. Building on this, most existing methods transform event-based data into traditional representations and adopt mature model structures for data processing. However, traditional cameras synchronously generate dense frames with spatial information, while event cameras asynchronously generate sparse event streams containing spatiotemporal visual information. Because they ignore the differences between the two types of data in generation mechanism and information content, these methods usually introduce redundant data representations, inefficient spatiotemporal relationship modeling, and complex network structures, which hinder the full use of the data's advantages in downstream tasks. Consequently, research on feature learning for asynchronous spatiotemporal event-based data that builds on the characteristics of the data and efficiently analyzes the spatiotemporal information it contains is forward-looking and urgent, with practical value and broad application prospects.

        At present, feature learning methods for event-based data still face many challenges, including how to design representations of event-based data with low redundancy and high information content, how to efficiently model the spatiotemporal relationships of event-based data, and how to improve the efficiency of feature extraction by using dynamic spatiotemporal information. This thesis investigates these challenges in depth, and its contributions can be summarized in three aspects:

        1. Efficient spatiotemporal relationship modeling for event-based data: Because event-based data is sparse, existing methods use graph convolutional networks to model the spatiotemporal relationships of the input data. However, these methods have two main issues. First, spatial graph convolution introduces redundant computation when aggregating neighbor features. Second, the inference algorithm they use must recompute all network activations whenever newly triggered data arrives, which prevents a rapid response to asynchronous event streams. To solve the first problem, this thesis proposes a local-shift graph convolutional network equipped with a node-importance-based parallel pooling method. The network uses a local shift operation to aggregate neighbor information along the channel dimension and uses the pooling method to obtain representative nodes that cover the diversity of the event-based data. In this way, it can efficiently model the spatiotemporal relationships of the input data. To solve the second problem, based on the asynchronous nature of event-based data, this thesis designs an asynchronous feature processing procedure. When new data arrives, the procedure uses the connectivity between nodes to restrict the recomputation of activations to the nodes affected by the newly arrived event. In this way, the new spatiotemporal relationships can be modeled efficiently. Experimental results demonstrate that the proposed network significantly reduces computational complexity, and that the asynchronous feature processing procedure enables a rapid response to asynchronous event streams.
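
The idea of restricting recomputation to the nodes affected by a new event can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure from the thesis: it assumes an undirected graph, one-hop neighbor aggregation per layer, and no pooling between layers, in which case the output of layer l can only change for nodes within l hops of the new node. The names `affected_nodes`, `adjacency`, and `num_layers` are hypothetical.

```python
from collections import deque


def affected_nodes(adjacency, new_node, num_layers):
    """Per layer, list the nodes whose activations may change after new_node is added.

    adjacency: dict mapping node id -> iterable of neighbour ids (assumed undirected).
    Assumes each layer aggregates one-hop neighbours, so layer l touches nodes
    within l hops of new_node (an assumption, not the thesis's exact rule).
    """
    dist = {new_node: 0}
    queue = deque([new_node])
    while queue:  # breadth-first search, cut off at num_layers hops
        node = queue.popleft()
        if dist[node] == num_layers:
            continue
        for nb in adjacency.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return [sorted(n for n, d in dist.items() if d <= layer)
            for layer in range(1, num_layers + 1)]
```

All other activations can be reused from the previous state, which is where the saving over full recomputation would come from in this simplified setting.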

        2. Efficient feature extraction based on dynamic spatiotemporal information: Existing methods extract spatiotemporal features from event-based data within sliding windows. However, these methods ignore the dynamic spatiotemporal information between adjacent spatiotemporal windows and their potential motion overlap, resulting in information loss and computational redundancy. To tackle this problem, this thesis proposes a dual-branch network equipped with an efficient inference strategy, which captures dynamic spatiotemporal information between adjacent windows while improving feature extraction efficiency. The dual-branch network contains a base branch and an incremental branch. During inference, multiple consecutive sliding windows are treated as one processing unit. The base branch models the spatiotemporal relationships of the event-based data within the first sliding window. Subsequent sliding windows are processed by the lightweight incremental branch, which uses differential methods to extract dynamic spatiotemporal information and obtain recognition results. In addition, this thesis proposes a lightweight point-wise memory bank that characterizes the spatiotemporal relationships corresponding to the different motion semantics within the dataset. The memory bank uses an attention mechanism to perform adaptive feature enhancement for both the base branch and the incremental branch. Experimental results demonstrate that the proposed network and efficient inference strategy significantly improve feature extraction efficiency and recognition accuracy.
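
The processing-unit schedule described above can be illustrated with a small sketch. It assumes events are `(x, y, t, p)` tuples with non-negative integer timestamps, and it takes "difference" to mean the events that are new in a window relative to the previous one; the abstract does not specify the differential method, so this is a stand-in. `sliding_windows`, `process_unit`, `base_branch`, and `incremental_branch` are hypothetical names, and the two branches are passed in as plain callables.

```python
def sliding_windows(events, window, stride):
    """Group (x, y, t, p) events into overlapping windows of duration `window`.

    Assumes a non-empty event list with integer timestamps starting near 0.
    """
    t_end = max(e[2] for e in events) + 1
    return [[e for e in events if start <= e[2] < start + window]
            for start in range(0, t_end - window + 1, stride)]


def process_unit(windows, base_branch, incremental_branch):
    """Run the base branch on the first window of a unit and the incremental
    branch on only the events that are new in each later window."""
    results = [base_branch(windows[0])]
    for prev, cur in zip(windows, windows[1:]):
        seen = set(prev)
        new_events = [e for e in cur if e not in seen]  # set difference as a stand-in
        results.append(incremental_branch(results[-1], new_events))
    return results
```

With overlapping windows, each later window hands the incremental branch only a fraction of its events, which is the source of the efficiency gain in this simplified picture.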

        3. Efficient representation of event-based data and adaptive feature learning: The representation of event-based data is the foundation of feature learning. Because the number of events is large and variable, current methods usually sample the raw data as input, which may result in information loss. Some methods propose voxel-based representations, which reduce the number of basic processing units while enhancing the representational capacity of the input. However, these methods obtain voxel attributes through simple summation and select voxels based on density, which may miss representative information in the temporal and spatial dimensions. Therefore, to preserve the spatiotemporal information contained in event-based data while maintaining its sparsity, this thesis proposes a voxel selection strategy based on spatiotemporal decoupling. First, the strategy divides the event stream into multiple time slices along the temporal dimension and identifies the slices containing significant motion. Second, each time slice is transformed into voxel form, and representative voxels that cover the motion contours within the slice are selected based on the motion intensity and object edge information contained in the voxels. Finally, all voxels extracted from the identified slices are reassembled into a sparse representation with low redundancy and high information content. Furthermore, since different input nodes exhibit variable spatiotemporal relationships, an adaptive multi-scale shift strategy is introduced, which extracts multi-scale features with an adaptive receptive field for each input. Experimental results demonstrate that the proposed data representation and network structure achieve the highest recognition accuracy.
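
The three steps of the selection strategy (slice in time, voxelize each slice, keep representative voxels) can be sketched as below. This is only a structural sketch: the scoring functions are left as parameters because the abstract does not define how motion intensity or edge information is measured, and all names (`time_slices`, `voxelize`, `select_voxels`, `slice_score`, `voxel_score`) are hypothetical. Events are assumed to be a non-empty list of `(x, y, t, p)` tuples with integer pixel coordinates.

```python
from collections import defaultdict


def time_slices(events, num_slices):
    """Split events into num_slices equal-duration slices along the time axis."""
    t0 = min(e[2] for e in events)
    span = (max(e[2] for e in events) - t0) or 1
    slices = [[] for _ in range(num_slices)]
    for e in events:
        idx = min(int((e[2] - t0) * num_slices / span), num_slices - 1)
        slices[idx].append(e)
    return slices


def voxelize(slice_events, cell):
    """Group the events of one slice into spatial cells of size cell x cell."""
    voxels = defaultdict(list)
    for x, y, t, p in slice_events:
        voxels[(x // cell, y // cell)].append((x, y, t, p))
    return voxels


def select_voxels(events, num_slices, cell, slice_score, voxel_score,
                  keep_slices, keep_voxels):
    """Keep the top-scoring slices, then the top-scoring voxels in each.

    slice_score and voxel_score are caller-supplied placeholders for the
    motion and edge criteria, which this sketch does not implement.
    Returns (slice_index, voxel_x, voxel_y) triples.
    """
    slices = time_slices(events, num_slices)
    top = sorted(range(num_slices), key=lambda i: slice_score(slices[i]),
                 reverse=True)[:keep_slices]
    selected = []
    for i in sorted(top):
        voxels = voxelize(slices[i], cell)
        best = sorted(voxels, key=lambda k: voxel_score(voxels[k]),
                      reverse=True)[:keep_voxels]
        selected.extend((i, vx, vy) for vx, vy in sorted(best))
    return selected
```

Passing `len` for both scores reduces this to density-based selection, which is the baseline the thesis argues against; a faithful version would substitute scores built from motion intensity and edge information.
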

        This thesis investigates three research aspects: efficient spatiotemporal modeling of event-based data, capture of dynamic spatiotemporal information between windows, and compact representation of event-based data. Together, the three methods form a comprehensive and systematic feature learning framework for event-based data, significantly improving both the efficiency of feature extraction and the expressive capacity of the learned features.

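Keyword: event-based data; feature learning; spatiotemporal relationship modeling; graph convolutional neural network; multi-scale features
Language: Chinese
Sub direction classification: Image and video processing and analysis
Planning direction of the State Key Laboratory: Visual information processing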
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/54536
Collection: Graduates_Doctoral Dissertations (毕业生_博士学位论文)
Recommended Citation
GB/T 7714
孙琳晖. 面向异步时空脉冲数据的特征学习[D],2023.
Files in This Item:
File Name: 孙琳晖_面向异步时空脉冲数据的特征学习.
Size: 23773KB
DocType: Thesis (学位论文)
Access: Restricted (限制开放)
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.