Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification
Guo LY(郭凌月)1,2; Zeyu Gao1,2; Jinye Qu1,2; Suiwu Zheng1,2; Runhao Jiang3; Yanfeng Lu1,2; Hong Qiao1,2
Source Publication: IEEE Transactions on Cognitive and Developmental Systems
Year: 2023
DOI: 10.1109/TCDS.2023.3327081
Abstract

Spiking neural networks (SNNs), as brain-inspired neural networks, have received noteworthy attention due to their advantages of low power consumption, high parallelism, and high fault tolerance. While SNNs have shown promising results in unimodal data tasks, their deployment in multimodal audiovisual classification remains limited, and the effectiveness of capturing correlations between the visual and audio modalities in SNNs needs improvement. To address these challenges, we propose a novel model called the Spiking Multimodal Transformer (SMMT) that combines SNNs and Transformers for multimodal audiovisual classification. The SMMT model integrates unimodal sub-networks for the visual and auditory modalities with a novel Spiking Cross-Attention module for fusion, enhancing the correlation between the visual and audio modalities. This approach achieves competitive accuracy in multimodal classification tasks with low energy consumption, making it an effective and energy-efficient solution. Extensive experiments on a public event-based dataset (N-TIDIGIT & MNIST-DVS) and two self-made audiovisual datasets of real-world objects (CIFAR10-AV and UrbanSound8K-AV) demonstrate the effectiveness and energy efficiency of the proposed SMMT model in multimodal audio-visual classification tasks. Our constructed multimodal audiovisual datasets can be accessed at https://github.com/Guo-Lingyue/SMMT.
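The fusion step described above — a Spiking Cross-Attention module in which one modality's spike trains attend to the other's — can be illustrated in miniature. The sketch below is a hypothetical, dependency-free toy, not the paper's implementation: it uses single-head dot-product attention over binary spike vectors, with queries from the visual stream and keys/values from the audio stream, and re-thresholds the attended output back into spikes. The function names, the threshold value, and the toy inputs are all assumptions for illustration.

```python
import math

def heaviside(x, threshold=0.5):
    """Emit a binary spike when the membrane potential crosses threshold."""
    return 1.0 if x >= threshold else 0.0

def spiking_cross_attention(visual_spikes, audio_spikes, threshold=0.5):
    """Toy spiking cross-attention (hypothetical sketch).

    visual_spikes, audio_spikes: lists of binary spike vectors, one per token.
    Queries come from the visual stream; keys and values come from the audio
    stream. The attended output is re-thresholded into binary spikes, keeping
    the signal spike-based end to end.
    """
    d = len(audio_spikes[0])
    outputs = []
    for q in visual_spikes:
        # Scaled dot-product scores between a visual query and all audio keys.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in audio_spikes]
        # Softmax over audio tokens.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of audio values, then re-spike via threshold.
        fused = [sum(w * v[i] for w, v in zip(weights, audio_spikes))
                 for i in range(d)]
        outputs.append([heaviside(x, threshold) for x in fused])
    return outputs
```

In a real SNN this thresholding would be a spiking neuron with state and a surrogate gradient for training; the point here is only the data flow, i.e. how a cross-modal attention map can be computed from, and emitted as, spike vectors.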

Indexed By: SCI
Sub-direction Classification: Brain-inspired Models and Computing
Planning Direction of the National Key Laboratory: Brain-inspired Multimodal Intelligent Models and Algorithms
Document Type: Journal Article
Identifier: http://ir.ia.ac.cn/handle/173211/56541
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Author: Yanfeng Lu
Affiliation:
1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA)
2. University of Chinese Academy of Sciences (UCAS)
3. College of Computer Science and Technology, Zhejiang University
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Guo LY, Zeyu Gao, Jinye Qu, et al. Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification[J]. IEEE Transactions on Cognitive and Developmental Systems, 2023. DOI: 10.1109/TCDS.2023.3327081.
APA: Guo LY, Zeyu Gao, Jinye Qu, Suiwu Zheng, Runhao Jiang, ... & Hong Qiao. (2023). Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification. IEEE Transactions on Cognitive and Developmental Systems. DOI: 10.1109/TCDS.2023.3327081.
MLA: Guo LY, et al. "Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification." IEEE Transactions on Cognitive and Developmental Systems (2023). DOI: 10.1109/TCDS.2023.3327081.
Files in This Item:
File Name/Size: Transformer-based_Sp (3922 KB)
DocType: Journal Article
Version: Author's Accepted Manuscript
Access: Open Access
License: CC BY-NC-SA
File name: Transformer-based_Spiking_Neural_Networks_for_Multimodal_Audio-Visual_Classification.pdf
Format: Adobe PDF
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.