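Research on Multilingual Speech Recognition Technology (多语言语音识别技术研究)
Alternative title: Research of Multilingual Speech Recognition
Author: 于胜民 (Yu Shengmin)
Degree type: Doctor of Engineering
Supervisor: 徐波 (Xu Bo)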
2005-05-01
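Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Degree-granting location: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: Speech Recognition; Chinese Speech Recognition (CSR); English Speech Recognition (ESR); Japanese Speech Recognition (JSR); Multilingual; Bilingual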
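Abstract: The main contents and contributions of this thesis are as follows:
1. The implementation techniques of Chinese speech recognition are analyzed in depth, including feature extraction, decision-tree modeling, and the search framework of the recognizer. The influence of tone information on a Chinese recognition system is studied in detail from two aspects: context-dependent modeling and acoustic features. In addition, a Chinese recognition system is rebuilt with phonemes as the modeling unit, which confirms, by contrast, the advantage of initial/final (声韵母) modeling.
2. The linguistic characteristics of English are analyzed in depth and mainstream English speech recognition techniques are examined in detail. An English recognition system is developed, covering initial model generation, question-set design, decision-tree-based triphone model training, and the recognition search process. A Bayesian criterion is introduced into the covariance modeling technique to determine the number of covariance transformation classes. A feature compensation algorithm in the log-spectral domain improves the noise robustness of the system without degrading recognition of clean speech. In addition, the data-driven MLLR algorithm is used to study accent adaptation for non-native speech.
3. The pronunciation and linguistic characteristics of Japanese are analyzed in depth, basic acoustic modeling units for Japanese are defined, and a Japanese speech recognition system is developed rapidly using decision-tree-based triphone modeling. A statistics-based end-point detection algorithm is proposed; it estimates the end-point threshold from a statistical point of view and is fairly robust to noise. In addition, cross-language recognition from Chinese, English, and Chinese-English bilingual models to Japanese is examined, and some preliminary experimental results are given.
4. One difficulty of multilingual speech recognition is how to effectively control the sharp growth in the number of modeling units as the set of recognition units expands. Taking Chinese and English as the objects of study, the thesis investigates mixed acoustic modeling for Chinese-English bilingual speech in detail. Starting from direct combination of the Chinese and English modeling units, through IPA mapping, to automatic clustering algorithms based on different distance measures (Bhattacharyya distance, likelihood distance, and maximum mutual information distance), the strengths and weaknesses of each method are examined and an effective approach to bilingual modeling is identified. Language-related questions are introduced to further improve the ordinary decision-tree modeling algorithm, so that node splitting can proceed more easily, which yields some improvement in the accuracy of acoustic modeling.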
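The clustering of bilingual modeling units by Bhattacharyya distance, mentioned in item 4, can be illustrated with a minimal sketch. This record does not describe the thesis's actual procedure, so everything below is an assumption made for illustration: each unit is summarized by a single diagonal-covariance Gaussian, clusters are merged greedily by equal-weight moment matching, and the unit names (`zh_a`, `en_aa`, `en_s`), the function names, and the stopping threshold are hypothetical.

```python
import math
from itertools import combinations

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each Gaussian is a (means, variances) pair of equal-length lists.
    """
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = 0.5 * (s1 + s2)                      # averaged variance
        dist += 0.125 * (a - b) ** 2 / s         # mean-separation term
        dist += 0.5 * math.log(s / math.sqrt(s1 * s2))  # variance-mismatch term
    return dist

def merge(c1, c2):
    """Merge two clusters; the merged Gaussian matches the first two moments,
    weighting each cluster by its number of member units (an assumption)."""
    (names1, (m1, v1)), (names2, (m2, v2)) = c1, c2
    n1, n2 = len(names1), len(names2)
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(m1, m2)]
    var = [
        (n1 * (s1 + a * a) + n2 * (s2 + b * b)) / n - mu * mu
        for a, b, s1, s2, mu in zip(m1, m2, v1, v2, mean)
    ]
    return (names1 | names2, (mean, var))

def cluster_units(units, threshold):
    """Greedy agglomerative clustering: merge the closest pair of clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [({name}, gauss) for name, gauss in units.items()]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: bhattacharyya(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if bhattacharyya(clusters[i][1], clusters[j][1]) > threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [names for names, _ in clusters]

# Hypothetical two-dimensional unit models: one Chinese and two English units.
units = {
    "zh_a": ([0.0, 1.0], [1.0, 1.0]),
    "en_aa": ([0.1, 1.1], [1.0, 1.0]),
    "en_s": ([5.0, -3.0], [1.0, 2.0]),
}
shared = cluster_units(units, threshold=0.5)
```

In this toy setup the two acoustically close vowel units end up in one shared cluster while the fricative stays separate, which is the intended effect of distance-based unit sharing: the combined unit inventory grows more slowly than the sum of the two languages' inventories.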
Alternative abstract: The main work of this thesis is as follows. A Chinese Speech Recognition (CSR) system with the phoneme as the base model was developed, based on a detailed study of our CSR technologies. The importance of Chinese tone information was shown from two aspects of speech recognition: features and models. The characteristics of English were first studied intensively, and then our English Speech Recognition (ESR) system was developed, including initial model training, design of the question set, decision-tree-based triphone model training, and the search process of the recognizer. The semi-tied covariance modeling technique was then improved by using a more robust Bayesian information measure as the criterion for deciding the number of covariance transformation matrices. Compensation in the log-spectral domain was also investigated to obtain a more robust acoustic model. Finally, non-native speaker adaptation was tested with a data-driven maximum likelihood linear regression (MLLR) fast adaptation algorithm. A Japanese Speech Recognition (JSR) system was developed rapidly with the fast bootstrapping method of MSR. An end-point detection algorithm based on statistics is then proposed; it is more robust to noisy speech than other algorithms. Finally, simple tests of cross-language speech recognition from Chinese, English, and a Chinese-English bilingual system to Japanese were carried out. The results showed that the bilingual acoustic model performed better than the language-dependent models. Several Chinese-English bilingual acoustic modeling techniques were explored intensively, such as direct combination of the two sets of base models, IPA mapping, and automatic agglomerative clustering with different distance measures, e.g., Bhattacharyya distance, log-likelihood, and maximum mutual information (MMI). Language-related questions were adopted in the decision-tree training process and achieved higher performance than the traditional method.
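The abstracts state only that the end-point detection threshold is estimated statistically; the algorithm itself is not given in this record. As a generic illustration of the idea, and not the thesis's method, the sketch below assumes that the first few frames of an utterance are background noise, sets the threshold at the noise mean plus `k` standard deviations of frame energy, and requires a minimum run of frames above the threshold. The function name and the parameters `noise_frames`, `k`, and `min_run` are all hypothetical.

```python
import statistics

def detect_endpoints(energies, noise_frames=10, k=3.0, min_run=3):
    """Return (start, end) frame indices of speech, or None if none is found.

    energies: per-frame energy values (for example, log energy).
    The threshold is derived from the statistics of the leading frames,
    which are assumed to contain only background noise.
    """
    noise = energies[:noise_frames]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise)
    threshold = mu + k * sigma

    start = end = None
    run = 0  # length of the current run of frames above the threshold
    for i, e in enumerate(energies):
        run = run + 1 if e > threshold else 0
        if run >= min_run:
            if start is None:
                start = i - run + 1  # first frame of the first long-enough run
            end = i                  # last frame of the latest long-enough run
    return None if start is None else (start, end)

# Synthetic example: 10 noise frames, 5 high-energy frames, 3 noise frames.
energies = [1.0, 1.2, 0.9, 1.1, 1.0] * 2 + [5.0, 6.0, 7.0, 6.5, 5.5] + [1.0, 1.1, 0.9]
segment = detect_endpoints(energies)
```

Because the threshold adapts to the measured noise level instead of being fixed in advance, the same settings can be reused across recordings with different background noise, which is the general motivation for estimating the threshold from statistics.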
Call number: XWLW930
Other identifier: 200118014604898
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/5866
Collection: Graduates_Doctoral Dissertations
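Recommended citation
GB/T 7714
于胜民 (Yu Shengmin). 多语言语音识别技术研究 (Research on Multilingual Speech Recognition Technology) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2005.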