Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model; the language model thus plays the role of the teacher. This approach adds no external component to the sequence-to-sequence model during testing, and it can be flexibly combined with the shallow fusion technique for decoding. Experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, an 18.42% relative reduction compared with the vanilla sequence-to-sequence model.
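For concreteness, below is a minimal sketch (written in PyTorch, which the abstract does not specify) of a distillation-style training objective of the kind described above: the sequence-to-sequence decoder is trained against an interpolation of the hard transcription labels and the soft labels produced by the pretrained RNN language model teacher. The names distillation_loss, lambda_kd, and pad_id are illustrative assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, targets, lambda_kd=0.5, pad_id=0):
    # student_logits: (batch, time, vocab) raw decoder outputs of the seq2seq model
    # teacher_probs:  (batch, time, vocab) soft labels from the RNN language model
    # targets:        (batch, time) ground-truth character ids
    log_probs = F.log_softmax(student_logits, dim=-1)

    # Standard cross-entropy against the hard transcription labels.
    hard_ce = F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)

    # Cross-entropy against the teacher's soft label distribution, masking padding.
    mask = (targets != pad_id).unsqueeze(-1).float()
    soft_ce = -(teacher_probs * log_probs * mask).sum() / mask.sum()

    # lambda_kd (assumed name) balances LM knowledge against the hard labels.
    return (1.0 - lambda_kd) * hard_ce + lambda_kd * soft_ce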
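The shallow fusion technique mentioned above can be sketched in the same spirit (again an assumed illustration, not the paper's code): at each beam-search step, the sequence-to-sequence log-probabilities are linearly combined with the external language model's log-probabilities, with an assumed fusion weight beta.

import torch.nn.functional as F

def fused_step_scores(s2s_logits, lm_logits, beta=0.3):
    # s2s_logits, lm_logits: (beam, vocab) logits for the next character.
    # Returns log-domain scores used to rank candidate beam extensions.
    s2s_logp = F.log_softmax(s2s_logits, dim=-1)
    lm_logp = F.log_softmax(lm_logits, dim=-1)
    return s2s_logp + beta * lm_logp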
Ye Bai, Jiangyan Yi, Jianhua Tao, et al. Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition. In Proc. Interspeech, 2019.