Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Fan ZY(范志赟)1,2; Dong LH(董林昊)3; Cai M(蔡猛)3; Ma ZJ(马泽君)3; Xu B(徐波)1
Journal: Signal Processing Letters
Publication year: 2022
Pages: 1551-1554
Abstract

Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between the encoder outputs frame-by-frame and transfers encoder outputs to segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. The experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels.
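
The integrate-and-fire idea in the abstract can be illustrated with a small toy sketch. This is not the authors' model: the abstract does not specify the difference measure, the firing threshold, or how frames are pooled, so the cosine distance, the threshold of 1.0, the reset-to-zero rule, and mean pooling below are all assumptions chosen for illustration, and the names `cosine_distance` and `integrate_and_fire` are hypothetical. The framework described above is trained under supervision from the speaker identity sequence, whereas this sketch has no learned parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (assumed stand-in for the speaker difference)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # assumption: a zero vector gives no evidence of a change
    return 1.0 - dot / norm


def integrate_and_fire(
    frames: List[Vector], threshold: float = 1.0
) -> Tuple[List[int], List[Vector]]:
    """Accumulate frame-to-frame differences and fire when the sum reaches threshold.

    Returns (change_points, segment_embeddings). Each change point is the index
    of the first frame of a new segment; each embedding is the mean of the
    frames in one segment (mean pooling is an assumption).
    """
    if not frames:
        return [], []
    change_points: List[int] = []
    segments: List[List[Vector]] = [[frames[0]]]
    accumulated = 0.0
    for t in range(1, len(frames)):
        accumulated += cosine_distance(frames[t - 1], frames[t])
        if accumulated >= threshold:
            change_points.append(t)  # a speaker change is "fired" at frame t
            segments.append([])      # start a new segment
            accumulated = 0.0        # simplification: drop any residual
        segments[-1].append(frames[t])
    embeddings = [[sum(col) / len(seg) for col in zip(*seg)] for seg in segments]
    return change_points, embeddings


# Toy input: two frames pointing one way, then two frames pointing another.
toy_frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
changes, segment_embeddings = integrate_and_fire(toy_frames)
print(changes)             # [2]
print(segment_embeddings)  # [[1.0, 0.0], [0.0, 1.0]]
```

On the toy input, the accumulated difference stays at zero while consecutive frames agree and reaches the threshold at the single orthogonal transition, so one change is fired and two segment-level embeddings result.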

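Discipline: Engineering
DOI: 10.1109/LSP.2022.3185955
Indexed by: SCI
Seven major directions, sub-direction classification: Speech recognition and synthesis
State Key Laboratory planned direction classification: Speech and language processing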
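Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/49731
Collection: Laboratory of Cognition and Decision-Making for Complex Systems_Auditory Models and Cognitive Computing
Author affiliations:
1. Institute of Automation, Chinese Academy of Sciences, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
3. Bytedance AI LAB
First author affiliation: Institute of Automation, Chinese Academy of Sciences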
Recommended citation
GB/T 7714: Fan ZY, Dong LH, Cai M, et al. Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire[J]. Signal Processing Letters, 2022: 1551-1554.
APA: Fan ZY, Dong LH, Cai M, Ma ZJ, & Xu B. (2022). Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire. Signal Processing Letters, 1551-1554.
MLA: Fan ZY, et al. "Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire". Signal Processing Letters (2022): 1551-1554.
Files in this item
File name/size: Sequence-Level_Speak (404KB)
Document type: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Sequence-Level_Speaker_Change_Detection_With_Difference-Based_Continuous_Integrate-and-Fire.pdf
Format: Adobe PDF