MixSpeech: Data augmentation for low-resource automatic speech recognition
Meng Linghui (1,2); Xu Jin; Tan Xu; Wang Jindong; Qin Tao; Xu Bo (1,2)
2021-06
Conference Name: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Conference Date: 2021.6.6-2021.6.11
Conference Place: Toronto, Canada
Abstract

In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model by taking a weighted combination of two different speech features (e.g., mel-spectrograms or MFCCs) as the input and recognizing both text sequences, where the two recognition losses use the same combination weight. We apply MixSpeech to two popular end-to-end speech recognition models, LAS (Listen, Attend and Spell) and Transformer, and conduct experiments on several low-resource datasets, including TIMIT, WSJ, and HKUST. Experimental results show that MixSpeech achieves better accuracy than the baseline models without data augmentation and outperforms a strong data augmentation method, SpecAugment, on these recognition tasks. Specifically, MixSpeech outperforms SpecAugment with a relative PER improvement of 10.6% on the TIMIT dataset and achieves a strong WER of 4.7% on the WSJ dataset.
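The mixing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the zero-padding of the shorter utterance, the Beta-distributed weight (a common mixup convention), and the toy loss are all assumptions made for illustration. Only the structure comes from the abstract: one weight `lam` combines the two inputs, and the same weight combines the two recognition losses.

```python
import random


def mix_features(x_i, x_j, lam):
    """Weighted frame-by-frame sum of two feature sequences (lists of frames)."""
    dim = len(x_i[0])
    n_frames = max(len(x_i), len(x_j))

    def pad(x):
        # Assumption: the shorter utterance is zero-padded to the longer length.
        return x + [[0.0] * dim for _ in range(n_frames - len(x))]

    return [
        [lam * u + (1.0 - lam) * v for u, v in zip(frame_i, frame_j)]
        for frame_i, frame_j in zip(pad(x_i), pad(x_j))
    ]


def mixspeech_loss(loss_fn, x_i, y_i, x_j, y_j, lam):
    """Score the mixed input against both transcripts with the same weight."""
    x_mix = mix_features(x_i, x_j, lam)
    return lam * loss_fn(x_mix, y_i) + (1.0 - lam) * loss_fn(x_mix, y_j)


def toy_loss(x, y):
    # Hypothetical stand-in for a real recognition loss (e.g. cross-entropy):
    # squared gap between the mean feature value and the transcript length.
    values = [v for frame in x for v in frame]
    return (sum(values) / len(values) - len(y)) ** 2


# Two toy "utterances": 2 frames and 1 frame, each with 2 feature dimensions.
x_a, y_a = [[1.0, 2.0], [3.0, 4.0]], "ab"
x_b, y_b = [[5.0, 6.0]], "abc"

mixed = mix_features(x_a, x_b, 0.5)
loss = mixspeech_loss(toy_loss, x_a, y_a, x_b, y_b, 0.5)

# In training, the weight would be resampled per pair; Beta(0.5, 0.5) is an
# assumed choice here, not a value taken from this record.
sampled_lam = random.betavariate(0.5, 0.5)
```

With `lam = 0.5`, the mixed features are the element-wise average of the two padded inputs, and the total loss is the average of the two per-transcript losses.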

Sub-direction Classification: Speech Recognition and Synthesis
Planning Direction of the State Key Laboratory: Human-Machine Hybrid Intelligence
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/57334
Collection: Laboratory of Complex System Cognition and Decision-Making_Auditory Models and Cognitive Computing
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Meng Linghui, Xu Jin, Tan Xu, et al. MixSpeech: Data augmentation for low-resource automatic speech recognition[C], 2021.
Files in This Item:
File Name/Size: mixspeech_full_paper (1111KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA
File name: mixspeech_full_paper.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.