Research on Tone and Intonation Assessment and Diagnosis Techniques for Chinese CALL Systems

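Title	Research on Tone and Intonation Assessment and Diagnosis Techniques for Chinese CALL Systems
Alternative title	Diagnosis and Evaluation of Tone and Intonation in Chinese CALL System
Author	柯登峰 (Ke Dengfeng)
Date	2009-06-30
Degree type	Doctor of Engineering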
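Chinese abstract	In recent years, speech technology has seen a new wave of applications in computer-assisted language learning (CALL) for Chinese. Most researchers, however, have concentrated on pronunciation assessment algorithms, while tone and intonation, which are equally important, are often handled simply by displaying the fundamental frequency (F0) contour. This thesis focuses on these two suprasegmental features, studies both assessment and diagnosis for each, and arrives at a set of effective methods for assessing and diagnosing tone and intonation.
1. Pitch extraction: the thesis proposes an improved algorithm based on two-layer dynamic programming (DP). In practice, hoarse voices were found to degrade F0 extraction considerably, frequently producing large numbers of pitch-halving errors. Analysis shows that these halving errors are caused by pseudo-harmonic components in the spectrum, which are usually higher than the first harmonic. The proposed two-layer DP algorithm effectively solves the F0 extraction problem for speakers with hoarse voices.
2. Tone assessment: the thesis proposes a tone assessment algorithm that fuses multiple features. ① To obtain accurate initial/final boundaries in continuous speech, three strategies are proposed: building a noise model, using a small language model, and forced segmentation. ② Because reading accuracy was found to correlate strongly with tone scores, the thesis proposes weighting the tone score by reading accuracy. String matching is used to compute reading accuracy and can also be used to reject invalid speech segments. ③ Because different tones have different distribution characteristics, the thesis analyzes the skewness of each tone and proposes using skewness to improve tone assessment performance. ④ In addition, the reasonableness of pauses is used to improve assessment performance. Fusing all of these methods yields an effective tone assessment scheme.
3. Tone diagnosis: the thesis proposes a clustering-based tone diagnosis method. Traditional approaches usually substitute four-tone recognition for tone diagnosis. The main problems are that a four-tone model cannot diagnose phenomena outside the four tones, that tone recognition accuracy is low under heavy accents, and that annotators show low consistency when labeling tones outside the four-tone system. To address this, the thesis proposes a clustering-based tone diagnosis method. Results show that the method reaches the level of human labeling on monosyllabic tone diagnosis and far exceeds it on disyllabic tone diagnosis.
4. Intonation assessment: the thesis proposes a new intonation feature, the sorted error vector. The difficulty of intonation assessment lies in the variability of intonation: the same text can take several different intonations depending on the meaning being expressed, and meaning is something computers cannot currently understand. The thesis therefore approaches the problem from the perspective of sentence repetition (reading after a model) and studies a new intonation feature. The feature has a fixed dimensionality, which solves the problem of sentences of differing lengths. Compared with traditional features, it gives a lower error rate and is less sensitive to the performance of F0 extraction.
5. Intonation diagnosis: the thesis studies algorithms for sentence stress detection and sentence mood recognition. Chinese intonation is more complex than English intonation, and Chinese experts have not yet reached full agreement on how intonation should be understood. After analyzing the various expert views, the author concludes that stress and mood are the two most important factors in determining whether intonation is correct: intonation is correct only when both stress and mood are correct. Intonation diagnosis can therefore be divided into two problems, stress detection and mood recognition. By analyzing stress and mood features step by step, the thesis finds effective methods for both and accomplishes the intonation diagnosis task well.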
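As an illustration of item 2 ②, the following minimal sketch shows one way reading accuracy could be computed by string matching and then used to weight a tone score. The abstract does not specify the matching algorithm or the weighting formula, so the use of Levenshtein edit distance, the multiplicative weighting, the `reject_below` threshold, and all function names are assumptions made for this example only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (e.g. syllable lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def reading_accuracy(ref, hyp):
    """Fraction of the reference text read correctly, clipped to [0, 1]."""
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def weighted_tone_score(tone_score, ref, hyp, reject_below=0.5):
    """Weight a raw tone score by reading accuracy (assumed: simple product).

    Returns None when accuracy falls below the hypothetical rejection
    threshold, i.e. the utterance is treated as an invalid speech segment.
    """
    acc = reading_accuracy(ref, hyp)
    if acc < reject_below:
        return None
    return tone_score * acc
```

Here `ref` would be the prompt text and `hyp` the recognized text, both as sequences of characters or syllables; a real system might use a different alignment or a learned weighting.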
English abstract	In recent years, the application of speech technology has set off a new upsurge in computer-assisted language learning (CALL) systems. However, most researchers focus mainly on pronunciation assessment, while tone and intonation, two equally significant factors, are often handled merely by displaying the pitch curve. This thesis discusses the assessment and diagnosis of tone and intonation in Chinese CALL systems.
1. A two-layer dynamic programming (DP) based algorithm is proposed for pitch extraction. Hoarse voices have a strong negative impact on fundamental frequency extraction, causing large numbers of pitch-halving errors. Analysis shows that these halving errors come from pseudo-harmonics in the spectrum, which are normally higher than the first harmonic. The proposed two-layer DP algorithm effectively resolves the problem of extracting the fundamental frequency from hoarse voices.
2. A multi-feature fusion algorithm is established for tone assessment. To obtain accurate initial/final boundaries in continuous speech, three strategies are adopted: building a noise model, using a small language model, and performing forced alignment. Because reading accuracy was found to correlate strongly with the tone assessment score, a string matching technique is used to reject invalid speech segments, and reading accuracy is used to weight the tone score produced by the computer. Because different tones have different distribution characteristics, skewness is also used to improve tone assessment performance. In addition, the reasonableness of pause positions helps improve tone assessment performance. Together, these techniques form an effective tone assessment scheme.
3. A cluster-based method is developed for tone diagnosis. Traditionally, four-tone recognition is used in place of tone diagnosis. The problems with this are that no phenomenon beyond the four tones can be diagnosed by a four-tone model, that tone recognition accuracy under strong accents is relatively low, and that consistency among different people labeling the same corpus is very low. The cluster-based method is therefore used to solve these problems, matching the level of manual labeling on monosyllabic tone diagnosis and far exceeding it on disyllabic tone diagnosis.
4. A special kind of feature called Sorted Error Vector ...
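The skewness mentioned under tone assessment (item 2) is a standard statistic, and a minimal sketch of computing it is given here. The abstract does not say which quantity's distribution is analyzed for each tone, so the input lists in this example (hypothetical per-tone values such as scores or pitch measurements) and the function name are assumptions; only the skewness formula itself, the third central moment divided by the variance raised to the power 3/2, is standard.

```python
def skewness(values):
    """Population skewness: m3 / m2**1.5, using central moments m2 and m3.

    Positive values indicate a longer right tail, negative values a longer
    left tail, and 0.0 a symmetric (or constant) sample.
    """
    n = len(values)
    if n < 2:
        raise ValueError("skewness needs at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: treat as symmetric
    return m3 / m2 ** 1.5


def skewness_by_tone(samples):
    """Map each tone label to the skewness of its (hypothetical) values."""
    return {tone: skewness(vals) for tone, vals in samples.items()}
```

Calling `skewness_by_tone({"T1": [...], "T2": [...]})` would give one skewness value per tone; how such values feed into the final tone score is not described in the abstract and is left open here.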
Keywords	Computer-Assisted Language Learning (CALL); Prosody; Tone; Intonation; Assessment; Diagnosis
Language	Chinese
Document type	Dissertation
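Item identifier	http://ir.ia.ac.cn/handle/173211/6223
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	柯登峰 (Ke Dengfeng). Research on Tone and Intonation Assessment and Diagnosis Techniques for Chinese CALL Systems [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2009.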

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20051801462808 (2325KB)			Not yet open	CC BY-NC-SA