P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification
Wang XY(王溪源)1; Wang FY(王方圆)1; Xu B(徐波)1; Xu L(徐亮)2; Xiao J(肖京)3
2023
Conference Name: INTERSPEECH 2023
Conference Date: 2023.08.24
Conference Place: Dublin, Ireland
Abstract

The Time-Delay Neural Network (TDNN) and the Transformer are both commonly used as backbones for Speaker Verification (SV). Each has advantages and disadvantages from the perspective of global and local feature modeling, and how to integrate these two styles of features effectively is still an open issue. In this paper, we explore a Parallel-coupled TDNN/Transformer Network (p-vectors) to replace serial hybrid networks. P-vectors allows the TDNN and the Transformer to learn complementary information from each other through Soft Feature Alignment Interaction (SFAI) while preserving local and global features. P-vectors also uses Spatial Frequency-channel Attention (SFA) to enhance spatial interdependence modeling of the input features. Finally, the outputs of the two branches of p-vectors are combined by an Embedding Aggregation Layer (EAL). Experiments show that p-vectors outperforms MACCIF-TDNN and MFA-Conformer with relative improvements of 11.5% and 13.9% in EER on VoxCeleb1-O.
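
The abstract reports its results as relative improvements in EER (equal error rate), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such figures are typically computed, not the authors' evaluation code. The function names, the threshold sweep, and all scores and EER values below are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: similarity scores, where higher means "same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is >= threshold.
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        far = false_accepts / n_nontarget  # false acceptance rate
        frr = false_rejects / n_target     # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, expressed as a fraction of the baseline EER."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical trial list, for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)

# Hypothetical EER values (in percent), not results from the paper.
toy_gain = relative_improvement(baseline_eer=1.0, new_eer=0.9)
```

Under this definition, a relative improvement of 11.5% means the new system's EER is 88.5% of the baseline's EER, not that the EER dropped by 11.5 percentage points.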

Sub-direction classification: Speech Recognition and Synthesis
Planning direction of the State Key Laboratory: Speech and Language Processing
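Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/57381
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing
Corresponding Author: Xu B(徐波)
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. OneConnect Financial Technology
3. Ping An Technology
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences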
Recommended Citation
GB/T 7714
Wang XY,Wang FY,Xu B,et al. P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification[C],2023.
Files in This Item:
File Name/Size | DocType | Version | Access | License
wang23i_interspeech_ (1542KB) | Conference paper | | Open access | CC BY-NC-SA
File name: wang23i_interspeech_published.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.