Research on Multilingual Grapheme-to-Phoneme Conversion
Original title: 多语言单词字音转换的研究
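Author: Li Peng
Degree type: Doctor of Engineering
Supervisor: Xu Bo
2008-05-20
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of degree conferral: Institute of Automation, Chinese Academy of Sciences
Discipline: Pattern Recognition and Intelligent Systems
Keywords: grapheme-to-phoneme conversion; G2P; decision trees; random forests; AdaBoost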
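Abstract: In speech recognition and speech synthesis applications, words that are missing from the pronunciation dictionary are frequently encountered, so a module is needed that automatically assigns pronunciations to such words. This task is called grapheme-to-phoneme (G2P) conversion. Over several decades of research, the problem has been approached from two directions: hand-written rules based on expert knowledge, and data-driven methods based on machine learning. Recent practice has shown that the latter outperform the former in conversion accuracy and language independence, but for languages whose pronunciation is highly irregular, such as English, existing methods still do not reach satisfactory performance. This thesis presents a detailed, in-depth study of word-level G2P conversion for alphabetic languages. Its main contributions and innovations are as follows:
1. Improvements to decision-tree-based G2P conversion. Among the many machine learning methods proposed so far, decision-tree-based methods have achieved very good results, but the existing literature lacks discussion of several key implementation factors. This thesis analyzes experimentally how these factors affect overall system performance and shows that careful tuning can raise conversion accuracy substantially. It also proposes two new methods: one for letter-to-phoneme alignment of the dictionary, and one for quickly finding the optimal pruning parameters.
2. G2P conversion based on Bagging and random forests. Although decision trees describe the training data well, their generalization ability is limited: the generalization error can be decomposed into the model's bias and variance, and a single decision tree cannot reduce both at the same time. Bagging and random forests are ensemble classifiers. By introducing randomness into training, they obtain many different decision tree classifiers from the same training data and combine their outputs by voting, which reduces bias and variance together and therefore lowers the generalization error. Experiments show that both methods achieve G2P accuracy clearly better than that of a single decision tree.
3. G2P conversion based on AdaBoost. AdaBoost assigns weights to the training samples, adjusts those weights according to classification errors, trains a series of classifiers iteratively, and finally combines their outputs by weighted voting. Because the weights are adapted, each new classifier concentrates on the training samples that are most often misclassified, and the vote combines so-called “weak classifiers” into a “strong classifier” with good classification ability. The AdaBoost-based G2P method proposed in this thesis also achieves higher conversion accuracy than the decision tree method.
4. The methods above are integrated into a combined system whose conversion accuracy on the NETtalk and CMU English dictionary test sets exceeds the best results previously published in the literature.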
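Bagging, as summarized in the abstract, trains many classifiers on resampled copies of the same lexicon and combines them by voting. The Python sketch below shows only those bootstrap-and-vote mechanics and rests on several assumptions not taken from the thesis. Each letter is described by a hypothetical (previous, current, next) letter window. The lexicon is assumed to be already aligned one-to-one, with `_` marking a silent letter. The base learner is a simple back-off lookup table, not a decision tree. Random forests would additionally randomize the features considered at each tree split, which this sketch omits. All names (`windows`, `to_samples`, `BackoffTable`, `train_bagging`, `predict_bagging`, `transcribe`, `LEXICON`) are illustrative.

```python
import random
from collections import Counter, defaultdict


def windows(word):
    """(previous, current, next) letter context for each position; '#' pads the edges."""
    padded = "#" + word + "#"
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def to_samples(lexicon):
    """Turn a letter-aligned lexicon (assumed one phoneme or '_' per letter) into samples."""
    samples = []
    for word, phones in lexicon.items():
        if len(word) != len(phones):
            raise ValueError(f"{word!r} is not aligned one-to-one with its phonemes")
        samples.extend(zip(windows(word), phones))
    return samples


class BackoffTable:
    """Hypothetical stand-in base learner (NOT a decision tree): majority phoneme for
    the full context window, backing off to the current letter, then the global majority."""

    def __init__(self, samples):
        self.full = defaultdict(Counter)
        self.letter = defaultdict(Counter)
        self.overall = Counter()
        for ctx, phone in samples:
            self.full[ctx][phone] += 1
            self.letter[ctx[1]][phone] += 1
            self.overall[phone] += 1

    def predict(self, ctx):
        if ctx in self.full:
            return self.full[ctx].most_common(1)[0][0]
        if ctx[1] in self.letter:
            return self.letter[ctx[1]].most_common(1)[0][0]
        return self.overall.most_common(1)[0][0]


def train_bagging(samples, n_models=25, seed=0):
    """Bagging: fit each base learner on a bootstrap replicate (sampling with replacement)."""
    rng = random.Random(seed)
    return [BackoffTable(rng.choices(samples, k=len(samples))) for _ in range(n_models)]


def predict_bagging(models, ctx):
    """Plurality vote over the ensemble members."""
    return Counter(m.predict(ctx) for m in models).most_common(1)[0][0]


def transcribe(models, word):
    """Predict one phoneme (or '_') per letter and drop the silent '_' symbols."""
    phones = (predict_bagging(models, ctx) for ctx in windows(word))
    return [p for p in phones if p != "_"]


# Hypothetical toy lexicon, hand-aligned letter by letter.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "bat": ["b", "ae", "t"],
    "city": ["s", "ih", "t", "iy"],
    "cent": ["s", "eh", "n", "t"],
    "cake": ["k", "ey", "k", "_"],
}

if __name__ == "__main__":
    models = train_bagging(to_samples(LEXICON))
    print(transcribe(models, "cab"))
```

Each bootstrap replicate leaves out about 1/e (roughly 37%) of the samples on average, so the members disagree on rare contexts, and the vote smooths out those disagreements.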
Other abstract: In speech recognition and synthesis applications, out-of-vocabulary (OOV) words are often encountered, so a module is needed to perform automatic grapheme-to-phoneme (G2P) conversion. Over the past decades, two categories of solutions have been explored: manually written rules based on expert knowledge, and data-driven machine learning methods. In recent years it has been shown that the latter outperform the former in both conversion accuracy and language independence. For highly irregular languages such as English, however, current methods still cannot achieve satisfactory performance. This thesis studies the problem in depth and obtains significant improvements. The main contributions and novelties are:
1. An improved decision-tree-based G2P conversion system. Decision-tree-based G2P systems have achieved the best performance among machine-learning-based systems, but some key issues have received little discussion in the literature. We analyzed these issues experimentally and concluded that carefully adjusting the settings improves G2P conversion accuracy substantially. We also proposed two new methods, one for lexicon alignment and one for fast tree pruning.
2. Two new G2P conversion methods based on bagging (bootstrap aggregating) and random forests. Although decision trees can model the training data well, their generalization ability is limited: the generalization error can be decomposed into bias and variance, and a single decision tree cannot reduce both at once. Bagging and random forests are ensemble classifiers that build different decision trees from the same training data by introducing randomness into the training procedure. The output of the ensemble is obtained by voting over all the decision trees, so bias and variance are reduced simultaneously and the generalization error decreases. Experiments showed that the new methods significantly outperform the decision-tree-based method.
3. A new G2P conversion method based on AdaBoost. AdaBoost is another ensemble classifier: it adaptively adjusts the weight of each training sample so that each new classifier concentrates on the samples that are hard to classify correctly. The weights are adjusted according to the misclassifications of the previous classifier, and new classifiers are trained iteratively. Through weighted voting over all the classifiers, AdaBoost turns so-called “weak classifiers” into a “strong classifier”, and it has been used successfully in face detection systems. The AdaBoost-based G2P system presented in this thesis also obtained better results than the decision-tree-based method.
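The AdaBoost loop described above (weight the samples, up-weight those the last classifier got wrong, combine all classifiers by weighted vote) can be illustrated with SAMME, a multi-class variant of AdaBoost, over one-level decision stumps. The abstract does not say which multi-class variant or base learner the thesis used, so SAMME, the stump learner, the three-letter context features, and the toy data below are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w):
    """Weighted multiway stump: split on the one context position (previous, current
    or next letter) with the lowest weighted error; predict each branch's weighted majority."""
    overall = defaultdict(float)
    for label, weight in zip(y, w):
        overall[label] += weight
    default = max(overall, key=overall.get)  # fallback for feature values never seen
    best = None
    for j in range(len(X[0])):
        branch = defaultdict(lambda: defaultdict(float))
        for x, label, weight in zip(X, y, w):
            branch[x[j]][label] += weight
        table = {v: max(c, key=c.get) for v, c in branch.items()}
        err = sum(weight for x, label, weight in zip(X, y, w) if table[x[j]] != label)
        if best is None or err < best[0]:
            best = (err, j, table)
    _, j, table = best
    return j, table, default


def stump_predict(stump, x):
    j, table, default = stump
    return table.get(x[j], default)


def adaboost_samme(X, y, rounds=10):
    """Multi-class AdaBoost (SAMME variant, an assumption here) over decision stumps."""
    k = len(set(y))
    if k < 2:
        raise ValueError("need at least two phoneme classes")
    w = [1.0 / len(X)] * len(X)
    ensemble = []  # (alpha, stump) pairs
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        miss = [stump_predict(stump, x) != label for x, label in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        if err >= 1 - 1 / k:  # no better than chance: stop boosting
            break
        alpha = math.log((1 - err) / max(err, 1e-10)) + math.log(k - 1)
        ensemble.append((alpha, stump))
        if err == 0:
            break  # perfect on the training data; the weights would not change
        # Up-weight the misclassified samples, then renormalize to sum to 1.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote: each stump adds its alpha to the phoneme it predicts."""
    score = defaultdict(float)
    for alpha, stump in ensemble:
        score[stump_predict(stump, x)] += alpha
    return max(score, key=score.get)


# Hypothetical toy data: (previous, current, next) letters and the aligned phoneme.
X = [("#", "c", "a"), ("#", "c", "o"), ("#", "c", "i"),
     ("#", "c", "e"), ("#", "s", "a"), ("a", "t", "#")]
y = ["k", "k", "s", "s", "s", "t"]
ensemble = adaboost_samme(X, y, rounds=3)
```

On this toy data the best single stump, which splits on the next letter, misclassifies one sample. Three boosting rounds classify all six correctly. The second stump is pushed to fix the sample the first one got wrong, and the third, which splits on the current letter, corrects the sample that the second one broke.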
Call number: XWLW1194
Other identifier: 200418014628081
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/6058
Collection: Graduates_Doctoral Dissertations
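Recommended citation (GB/T 7714):
Li Peng. Research on Multilingual Grapheme-to-Phoneme Conversion[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.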
Files in this item
File name/size: CASIA_20041801462808 (1444KB); Access: not yet open; License: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.