Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification
Guo LY(郭凌月)1,2; Zeyu Gao1,2; Jinye Qu1,2; Suiwu Zheng1,2; Runhao Jiang3; Yanfeng Lu1,2; Hong Qiao1,2
Journal: IEEE Transactions on Cognitive and Developmental Systems
Year: 2023
DOI: 10.1109/TCDS.2023.3327081
Abstract

Spiking neural networks (SNNs), as brain-inspired neural networks, have received noteworthy attention due to their advantages of low power consumption, high parallelism, and high fault tolerance. While SNNs have shown promising results in uni-modal data tasks, their deployment in multi-modal audiovisual classification remains limited, and the effectiveness of capturing correlations between visual and audio modalities in SNNs needs improvement. To address these challenges, we propose a novel model called Spiking Multi-Model Transformer (SMMT) that combines SNNs and Transformers for multi-modal audiovisual classification. The SMMT model integrates uni-modal sub-networks for the visual and auditory modalities with a novel Spiking Cross-Attention module for fusion, enhancing the correlation between the visual and audio modalities. This approach achieves competitive accuracy in multi-modal classification tasks with low energy consumption, making it an effective and energy-efficient solution. Extensive experiments on a public event-based dataset (N-TIDIGIT & MNIST-DVS) and two self-made audiovisual datasets of real-world objects (CIFAR10-AV and UrbanSound8K-AV) demonstrate the effectiveness and energy efficiency of the proposed SMMT model in multi-modal audio-visual classification tasks. Our constructed multi-modal audiovisual datasets can be accessed at https://github.com/Guo-Lingyue/SMMT.
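The abstract does not give the exact formulation of the Spiking Cross-Attention module, but the idea it describes — queries drawn from one modality's spike trains, keys/values from the other, with the attended output re-thresholded into spikes at each time step — can be illustrated with a minimal NumPy sketch. All function names, weight shapes, and thresholds below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def heaviside_spike(x, threshold=0.5):
    """Emit a binary spike wherever the potential crosses the threshold.
    (The paper would use a surrogate gradient for training; inference
    only needs the hard threshold.)"""
    return (x >= threshold).astype(np.float32)

def spiking_cross_attention(vis_spikes, aud_spikes, d_model=16, seed=0):
    """Hypothetical sketch of cross-modal attention over spike trains.

    vis_spikes: (T, Nv, d) binary spikes from the visual sub-network
    aud_spikes: (T, Na, d) binary spikes from the audio sub-network
    Queries come from the visual stream, keys/values from the audio
    stream, so each time step attends across modalities before the
    fused output is converted back into spikes.
    """
    T, Nv, d = vis_spikes.shape
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((d, d_model)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_model)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_model)) / np.sqrt(d)

    fused = []
    for t in range(T):
        Q = vis_spikes[t] @ Wq                  # (Nv, d_model)
        K = aud_spikes[t] @ Wk                  # (Na, d_model)
        V = aud_spikes[t] @ Wv                  # (Na, d_model)
        scores = Q @ K.T / np.sqrt(d_model)     # (Nv, Na)
        # numerically stable softmax over the audio tokens
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        fused.append(heaviside_spike(attn @ V))
    return np.stack(fused)                      # (T, Nv, d_model), binary
```

The output remains a binary spike tensor, so the fused representation can feed further spiking layers without leaving the event-driven regime — the property the abstract credits for the model's low energy consumption.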

Indexed by: SCI
Sub-direction (of the seven major directions): Brain-inspired models and computing
State Key Laboratory planned direction: Brain-inspired multimodal intelligent models and algorithms
Associated dataset requiring deposit:
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/56541
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Yanfeng Lu
Affiliations:
1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA)
2. University of Chinese Academy of Sciences (UCAS)
3. College of Computer Science and Technology, Zhejiang University
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation:
GB/T 7714
Guo LY, Zeyu Gao, Jinye Qu, et al. Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification[J]. IEEE Transactions on Cognitive and Developmental Systems, 2023. DOI: 10.1109/TCDS.2023.3327081.
APA
Guo LY., Zeyu Gao., Jinye Qu., Suiwu Zheng., Runhao Jiang., ... & Hong Qiao. (2023). Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification. IEEE Transactions on Cognitive and Developmental Systems. DOI: 10.1109/TCDS.2023.3327081.
MLA
Guo LY, et al. "Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification." IEEE Transactions on Cognitive and Developmental Systems (2023). DOI: 10.1109/TCDS.2023.3327081.
Files in this item:
Transformer-based_Sp… (3922 KB) — journal article, author accepted manuscript, open access, license: CC BY-NC-SA
File name: Transformer-based_Spiking_Neural_Networks_for_Multimodal_Audio-Visual_Classification.pdf
Format: Adobe PDF

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.