Research on Automatic Pronunciation Error Detection and Diagnosis for Spoken English Test

Title	面向英语口语测试的发音错误检测和诊断技术研究
Alternative title	Research on Automatic Pronunciation Error Detection and Diagnosis for Spoken English Test
Author	Li Hongyan (李宏言)
Date	2011-03-14
Degree type	Doctor of Engineering
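Abstract (translated from the Chinese)	Automatic detection and diagnosis of pronunciation errors in spoken language is one of the key technologies in computer-assisted language learning and testing. Building on an in-depth analysis of the state of the art and a summary of prior work, and targeting large-scale spoken-language testing, this thesis presents a systematic study of automatic pronunciation error detection and diagnosis. Its main contributions and innovations are as follows:
1) Data resources are the foundation of research on pronunciation assessment, error detection, and diagnosis. To mine and exploit large-scale corpus data, the thesis builds multiple datasets for pronunciation error detection and stress error detection. It also gives an in-depth comparative analysis of the evaluation metrics for pronunciation error detection and diagnosis systems, providing a unified standard for experimental analysis and algorithm comparison.
2) For substitution errors in mispronunciation detection, after reviewing the traditional HMM-based posterior probability and GOP methods, the thesis approaches the task from the perspectives of hypothesis testing and classification-based detection, introduces several classifier methods into pronunciation error detection, and proposes a series of new detection methods: a Gaussian mixture model based on a universal background model (GMM-UBM), a support vector machine based on the generalized linear discriminant sequence kernel (GLDS-SVM), and a neural network based on TRAP features (TRAP-NN). For GLDS-SVM, whose kernel has the properties of a linear kernel, a pronunciation-model training strategy based on multi-model fusion is proposed; it deals fairly effectively with model training on large amounts of data and with model updating after new data are introduced. The introduction of TRAP time-frequency features improves the precision with which pronunciation quality is characterized. Among general-purpose single-method detection systems, the corresponding TRAP-NN method achieves the best performance, with equal error rates of 8.73%, 14.17%, and 28.44% on the substitution-based, intentional, and natural error sets, respectively.
3) For substitution, deletion, and insertion errors, the thesis proposes the concept of a generalized pronunciation space (GPS), which brings phoneme substitution, deletion, and insertion into a unified detection framework. In addition, by statistically summarizing mispronunciation patterns in large-scale continuous speech corpora, it proposes a detection method based on word-dependent rule networks. This method avoids the shortcomings of traditional universal-rule methods and facilitates the automatic output of diagnostic feedback; the limitations of the word-dependent rule method are also analyzed. Experiments show that, supported by a large-scale corpus from speakers of a specific region, the word-dependent rule network method outperforms both GOP and classifier methods.
4) On the use of duration information in pronunciation assessment and error detection, the thesis proposes an improved context-based duration confidence measure that brings duration modeling for contexts at different levels into a unified framework. Depending on the amount and distribution of the corpus data, a backing-off strategy automatically adjusts the training of the duration models, and a look-up-table discretization strategy models the histogram distribution of durations. Experiments show that the stronger the contextual constraints, the more the duration confidence helps pronunciation assessment and error detection, and that word-dependent duration models are particularly effective.
5) For English word stress error detection, based on an analysis of vowel centralization and the manifestations of stress, the thesis proposes several vowel quality scores built on spectral features and classifier methods, which usefully complement traditional prosodic features. It also proposes a stress-model training method based on a grouping strategy, which alleviates the imbalance between stressed and unstressed samples, and a method based on pronunciation variation networks, which improves the detection of stress placement errors in noun-verb stress pairs. Experiments show that the stress error detection system combining these strategies achieves an equal error rate of 10.19% on a dedicated test set.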
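The backing-off and look-up-table ideas in contribution 4 can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything here is an assumption made for illustration: the three context levels (word-dependent, triphone, phone), the 20 ms bin width, the minimum sample count, and all names such as `BackoffDurationModel`.

```python
from collections import Counter, defaultdict

BIN_MS = 20      # assumed histogram bin width in milliseconds
MIN_COUNT = 5    # assumed minimum sample count before a context is trusted


def duration_bin(duration_ms):
    """Discretize a duration into a look-up-table index."""
    return int(duration_ms // BIN_MS)


def context_keys(word, left, phone, right):
    """Context keys ordered from most to least specific (assumed hierarchy)."""
    return [
        ("word", word, phone),
        ("triphone", left, phone, right),
        ("phone", phone),
    ]


class BackoffDurationModel:
    """Hypothetical duration model: one histogram table per context key."""

    def __init__(self):
        self.tables = defaultdict(Counter)

    def train(self, samples):
        # Each sample is (word, left_phone, phone, right_phone, duration_ms).
        for word, left, phone, right, duration_ms in samples:
            for key in context_keys(word, left, phone, right):
                self.tables[key][duration_bin(duration_ms)] += 1

    def confidence(self, word, left, phone, right, duration_ms):
        """Return (relative frequency of the duration bin, context level used)."""
        for key in context_keys(word, left, phone, right):
            table = self.tables.get(key)
            total = sum(table.values()) if table else 0
            if total >= MIN_COUNT:
                # Enough data at this level: read the score from the table.
                return table[duration_bin(duration_ms)] / total, key[0]
            # Otherwise back off to the next, less specific context.
        return 0.0, "none"


model = BackoffDurationModel()
model.train([("cat", "k", "ae", "t", d) for d in (80, 85, 90, 95, 100, 130)])
model.train([("bat", "b", "ae", "t", 90)])
print(model.confidence("cat", "k", "ae", "t", 88))
print(model.confidence("bat", "b", "ae", "t", 90))
```

In this toy run the first query is answered from the word-dependent table, while the second has too few word-level and triphone-level samples and backs off to the phone-level table.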
English abstract	Automatic mispronunciation detection and diagnosis are key techniques in computer-assisted language learning and testing. Based on an in-depth analysis of the technology and of existing achievements, and in connection with large-scale spoken English testing, this thesis presents systematic research on mispronunciation detection and diagnosis. Its contributions and innovations are summarized as follows:
1) Data resources are the basis for the study. To mine and use massive data, the thesis constructs several special corpora, including a mispronunciation corpus and a stress mispronunciation corpus. To provide a universal evaluation platform, it also analyzes the performance evaluation system in depth.
2) For substitution mispronunciation detection, building on the traditional HMM-based GOP, the thesis takes a new path: from the perspectives of hypothesis testing and classification-based detection, it introduces several classifiers into the mispronunciation detection task and proposes a series of novel detection methods, such as the UBM-based GMM method (GMM-UBM), the GLDS-kernel-based SVM method (GLDS-SVM), and the TRAP-feature-based neural network method (TRAP-NN). For GLDS-SVM, the thesis proposes a new multi-model fusion strategy for model training, in order to make full use of samples and address data imbalance. The introduction of TRAP improves the description of pronunciation quality, and TRAP-NN achieves the best performance among existing universal single mispronunciation detection systems, with EER values of 8.73%, 14.17%, and 28.44% for the simulation set, intended set, and natural set, respectively.
3) For substitution, deletion, and insertion mispronunciation detection, the concept of a generalized pronunciation space is proposed, bringing the various mispronunciation cases into a unified framework. In addition, by summarizing mispronunciation patterns in a high-volume corpus, the thesis uses word-dependent rules in the mispronunciation detection network to avoid the shortcomings of conventional universal rules. The rule-based method favors the automatic generation of feedback, although its limitations are also evident. The experimental results show that, with the support of massive data from special districts, the ...
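The abstract reports its results as equal error rates (EER). As a reminder of what that metric measures, here is a minimal sketch of one common way to estimate it from detector scores. The scoring convention (higher score means more likely correct), the threshold sweep, and the function name `equal_error_rate` are assumptions for illustration; the thesis's own evaluation procedure is not described in this record.

```python
def equal_error_rate(correct_scores, mispron_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes a higher score means the phone is more likely pronounced correctly.
    """
    best_gap, eer = None, None
    for thr in sorted(set(correct_scores) | set(mispron_scores)):
        # False rejection: a correct pronunciation flagged as an error.
        frr = sum(s < thr for s in correct_scores) / len(correct_scores)
        # False acceptance: a mispronunciation accepted as correct.
        far = sum(s >= thr for s in mispron_scores) / len(mispron_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

With finite score lists the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold.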
Keywords	Mispronunciation Detection; Mispronunciation Diagnosis; Large Scale Spoken Language Test; Classifier Model; Duration Model; Mispronunciation Rule Network; Stress Mispronunciation Detection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6320
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	李宏言. 面向英语口语测试的发音错误检测和诊断技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2011.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20071801462804 (2351KB)			Not currently open	CC BY-NC-SA