面向英语口语测试的发音错误检测和诊断技术研究
Alternative Title: Research on Automatic Pronunciation Error Detection and Diagnosis for Spoken English Test
李宏言
Subtype: Doctor of Engineering
Thesis Advisor: 徐波
2011-03-14
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Mispronunciation Detection; Mispronunciation Diagnosis; Large Scale Spoken Language Test; Classifier Model; Duration Model; Mispronunciation Rule Network; Stress Mispronunciation Detection
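Abstract: Automatic detection and diagnosis of pronunciation errors in spoken language is one of the key technologies in computer-assisted language learning and testing. Building on an in-depth analysis of the state of the art and a summary of prior work, and targeting spoken language tests for large populations, this thesis presents a systematic study of automatic mispronunciation detection and diagnosis. Its main contributions and innovations are:
1) Data resources are the foundation of research on pronunciation assessment, error detection, and diagnosis. To mine and exploit large-scale corpus data, this thesis builds several datasets for mispronunciation detection and stress error detection. It also gives an in-depth comparative analysis of the evaluation metrics for mispronunciation detection and diagnosis systems, providing a unified standard for experimental analysis and algorithm comparison.
2) For substitution errors in mispronunciation detection, after reviewing traditional HMM-based methods such as posterior probability and GOP, this thesis approaches the task from the viewpoints of hypothesis testing and classification-based detection. It introduces several classifier methods into mispronunciation detection and proposes a series of new detection methods: the Gaussian mixture model based on a universal background model (GMM-UBM), the support vector machine based on the generalized linear discriminant sequence kernel (GLDS-SVM), and the neural network based on TRAP features (TRAP-NN). For GLDS-SVM, whose kernel has linear properties, a pronunciation model training strategy based on multi-model fusion is proposed; it fairly effectively addresses model training with large amounts of data and model updating after new data are introduced. Introducing TRAP time-frequency features improves the precision with which pronunciation quality is characterized. Among general-purpose single-method detection systems, TRAP-NN achieves the best performance, with equal error rates of 8.73%, 14.17%, and 28.44% on the substitution-based (simulated) error set, the intentional error set, and the natural error set, respectively.
3) For substitution, deletion, and insertion errors in mispronunciation detection, this thesis proposes the concept of a generalized pronunciation space (GPS), bringing phoneme substitution, deletion, and insertion into a unified detection framework. In addition, by statistically summarizing mispronunciation patterns in large-scale continuous speech corpora, it proposes a mispronunciation detection method based on word-dependent rule networks. This method avoids the drawbacks of conventional universal-rule methods and facilitates automatic output of diagnostic feedback; the limitations of the word-dependent rule approach are also analyzed. Experiments show that, given large-scale corpora from speakers of a specific region, the word-dependent rule network method outperforms GOP and classifier-based methods.
4) For the use of duration information in pronunciation assessment and error detection, an improved context-based duration confidence measure is proposed, bringing duration modeling at different levels of context into a unified framework. Depending on the amount and distribution of the corpus data, a backing-off strategy automatically adjusts duration model training, and a discretization strategy based on a look-up table models the histogram distribution of durations. Experiments show that the stronger the contextual constraints, the more the duration confidence helps pronunciation assessment and error detection, with word-dependent duration models being especially effective.
5) For English word stress error detection, based on an analysis of vowel centralization and the manifestations of stress, several vowel quality scores based on spectral features and classifier methods are proposed as a useful complement to traditional prosodic features. A grouping-based training method for stress models is proposed to alleviate the imbalance between stressed and unstressed samples. A method based on pronunciation variation networks is proposed to improve the detection of stress placement errors in noun-verb stress pairs. Experiments show that the stress error detection system combining these strategies achieves an equal error rate of 10.19% on a dedicated test set.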
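The backing-off and look-up table ideas in contribution 4) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `BackoffDurationModel`, the three context levels, the 20 ms bin width, the minimum sample count, and the probability floor are all assumptions chosen for illustration only.

```python
from collections import defaultdict

BIN_MS = 20      # assumed histogram bin width in milliseconds
MIN_COUNT = 5    # assumed minimum samples before a context is trusted
FLOOR = 1e-3     # assumed probability floor for unseen bins


class BackoffDurationModel:
    """Hypothetical duration model: one histogram look-up table per context,
    queried from the most specific context to the least specific one."""

    def __init__(self):
        # tables[level][context_key][bin_index] -> sample count
        self.tables = [defaultdict(lambda: defaultdict(int)) for _ in range(3)]

    @staticmethod
    def _keys(word, left, phone):
        # Level 0: word-dependent; level 1: left-phone context; level 2: phone only.
        return [(word, phone), (left, phone), (phone,)]

    def add(self, word, left, phone, dur_ms):
        """Record one observed phone duration at every context level."""
        b = int(dur_ms // BIN_MS)
        for level, key in enumerate(self._keys(word, left, phone)):
            self.tables[level][key][b] += 1

    def confidence(self, word, left, phone, dur_ms):
        """Return (probability, level used), backing off when data is sparse."""
        b = int(dur_ms // BIN_MS)
        for level, key in enumerate(self._keys(word, left, phone)):
            hist = self.tables[level].get(key)
            total = sum(hist.values()) if hist else 0
            if total >= MIN_COUNT:
                return max(hist.get(b, 0) / total, FLOOR), level
        # No context has enough data: fall back to the floor value.
        return FLOOR, None


model = BackoffDurationModel()
for d in (80, 85, 90, 95, 100, 110):
    model.add("record", "r", "eh", d)   # enough data for a word-level table
for d in (60, 70, 75):
    model.add("second", "s", "eh", d)   # too sparse: query backs off to phone level

print(model.confidence("record", "r", "eh", 90))
print(model.confidence("second", "s", "eh", 70))
```

In this toy setup the first query is answered from the word-dependent table, while the second falls through to the phone-only table because its word and left-context tables hold fewer than `MIN_COUNT` samples.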
Other Abstract: Automatic mispronunciation detection and diagnosis are key techniques in computer-assisted language learning and testing. Building on an in-depth analysis of the existing technology and prior achievements, and targeting spoken English tests for large populations, this thesis presents a systematic study of mispronunciation detection and diagnosis. Its contributions and innovations are summarized as follows:
1) Data resources are the basis of the study. To mine and use massive data, this thesis constructs several dedicated corpora, including a mispronunciation corpus and a stress mispronunciation corpus. To provide a common evaluation platform, it also analyzes the performance evaluation system in depth.
2) For substitution mispronunciation detection, starting from the traditional HMM-based GOP, this thesis takes a new path: from the viewpoints of hypothesis testing and classification-based detection, it introduces several classifiers into the mispronunciation detection task and proposes a series of novel methods, including the UBM-based GMM method (GMM-UBM), the GLDS-kernel-based SVM method (GLDS-SVM), and the TRAP-feature-based neural network method (TRAP-NN). For GLDS-SVM, the thesis proposes a new multi-model fusion strategy for model training, in order to make full use of the samples and address data imbalance. Introducing TRAP features improves the description of pronunciation quality, and TRAP-NN achieves the best performance among existing general-purpose single-method mispronunciation detection systems, with EERs of 8.73%, 14.17%, and 28.44% on the simulation set, intended set, and natural set, respectively.
3) For substitution, deletion, and insertion mispronunciation detection, the concept of a generalized pronunciation space is proposed, bringing the various mispronunciation cases into a unified framework. In addition, by summarizing mispronunciation patterns in a high-volume corpus, the thesis uses word-dependent rules in the mispronunciation detection network to avoid the shortcomings of conventional universal rules. The rule-based method facilitates automatic generation of feedback, although its limitations are also evident. The experimental results show that, with the support of massive data from specific regions, the ...
Shelf Number: XWLW1493
Other Identifier: 200718014628043
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6320
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
李宏言. 面向英语口语测试的发音错误检测和诊断技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2011.
Files in This Item:
File Name/Size: CASIA_20071801462804 (2351KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.