Research on a Chinese Broadcast Speech Recognition System
Alternative title: Research on Chinese Broadcast News Recognition System
Author: 穆向禹 (Mu Xiangyu)
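Degree type: Doctor of Engineering
Supervisors: 徐波 (Xu Bo); 张树武 (Zhang Shuwu)
Date: 2005-05-01
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: broadcast speech recognition system; audio segmentation; speaker adaptation; covariance modeling; Broadcast News Continuous Speech Recognition; Audio Data Segmentation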
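Abstract: Broadcast speech recognition is currently a major research topic in large-vocabulary continuous speech recognition. Broadcast television news programs exhibit a wide range of acoustic complexity, including variation in speakers, dialectal accents, channels, and acoustic environments. This makes them an ideal testbed for research on practical speech technology, and progress on them is of great significance for bringing speech technology into practical use. This thesis addresses key problems in broadcast speech recognition systems and studies the following aspects in depth.

First, for audio segmentation of broadcast speech, the thesis proposes a variable-window method for detecting acoustic change points based on the trend of entropy changes. Within a fixed data window, the method examines the entropy trend at every candidate change point in the window and uses it to determine the true acoustic change points. Unlike conventional change-point detection based on the Bayesian Information Criterion (BIC), it avoids the missed detections caused by a fixed threshold and the cumulative error caused by data accumulation. In the classification stage, a component-grouped Gaussian method replaces the conventional Gaussian mixture model (GMM) classifier and gives more accurate classification results. A fast and efficient multi-codebook clustering algorithm based on vector quantization is also implemented.

Second, for adaptive training, the thesis proposes a multi-layer MLLR adaptation algorithm based on subspace clustering. The algorithm clusters Gaussian models within a subspace framework and, following a goal-driven principle with a feedback mechanism, determines the number of adaptation transform classes dynamically from the increase in the likelihood of the adaptation data. The subspace clustering strategy greatly reduces the number of parameters to be estimated. Experiments show that when adaptation data are scarce, the method achieves a higher recognition rate than the conventional regression-tree-based adaptation algorithm. For unsupervised adaptation, the thesis also investigates confidence measures and shows that introducing them appropriately improves unsupervised adaptation performance.

Finally, for acoustic modeling of broadcast speech, and to address the shortcomings of existing diagonal-covariance modeling, the thesis proposes a correlated-feature modeling method based on tied state-specific rotation transforms (Tying SSR), which builds on the theory of spatial rotation transforms and combines it with semi-tied covariance (STC) model compensation. The state-specific rotation (SSR) transform decorrelates the features in a new feature space, where they are then modeled accurately with diagonal covariances. The transform matrices are merged and shared under the criterion of minimum likelihood loss, the final number of merged classes is chosen with the BIC, and the parameters of the shared transform matrices are then compensated and re-estimated with the semi-tied covariance technique. This avoids the increase in storage and computation during recognition decoding that a large number of transform matrices would otherwise cause.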
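The abstract contrasts the proposed entropy-trend detector with the conventional BIC-based change-point test. For reference, the following is a minimal sketch of that conventional ΔBIC test on a one-dimensional feature stream, assuming each segment is modeled by a single Gaussian. It is not the thesis's entropy-based method, and the function names, the penalty weight `lam`, and the minimum segment length `min_seg` are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def _var(xs: Sequence[float]) -> float:
    """Maximum-likelihood (biased) variance, floored to keep log() finite."""
    n = len(xs)
    mean = sum(xs) / n
    return max(sum((x - mean) ** 2 for x in xs) / n, 1e-12)


def delta_bic(xs: Sequence[float], t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting xs at index t (1-D Gaussian per segment).

    dBIC(t) = 0.5 * [N log v - N1 log v1 - N2 log v2] - lam * P,
    with P = 0.5 * (d + d(d+1)/2) * log N, which is log N for d = 1.
    Positive values favour two Gaussians, i.e. a change point at t.
    """
    n, n1, n2 = len(xs), t, len(xs) - t
    gain = 0.5 * (n * math.log(_var(xs))
                  - n1 * math.log(_var(xs[:t]))
                  - n2 * math.log(_var(xs[t:])))
    penalty = lam * 0.5 * (1 + 1) * math.log(n)
    return gain - penalty


def best_change_point(xs: Sequence[float], min_seg: int = 5,
                      lam: float = 1.0) -> Optional[Tuple[int, float]]:
    """Return (t, score) with the largest positive dBIC, or None."""
    best = None
    for t in range(min_seg, len(xs) - min_seg + 1):
        score = delta_bic(xs, t, lam)
        if score > 0 and (best is None or score > best[1]):
            best = (t, score)
    return best
```

Because the decision is `score > 0` for a fixed, hand-tuned `lam`, the outcome depends on that single threshold-like parameter. This is the sensitivity that the entropy-trend method described above aims to avoid.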
Alternative abstract: Broadcast news continuous speech recognition is an active research topic. Broadcast data are not homogeneous: they vary in speaker style, dialect and accent, channel, and acoustic environment. This makes broadcasts an ideal object for research on practical speech technology. Addressing several key problems in broadcast news speech recognition, this thesis presents recent progress in improving the performance of a Mandarin broadcast news speech recognition system.

First, a novel method for acoustic change-point detection is proposed. The method detects change points by checking the entropy change trend of all candidate points in a sliding, variable-size data window. Unlike the traditional detection method based on the Bayesian Information Criterion (BIC), which relies on a fixed threshold, the entropy-trend method avoids missed detections and error accumulation. For classification, the traditional GMM is replaced by a Component Group GMM (CG_GMM), which gives more accurate classification results.

Second, a multi-layer MLLR adaptation algorithm with subspace regression classes and tying (SRCMLLR) is proposed. The method groups the Gaussians at a finer acoustic subspace level, and the multi-layer structure dynamically generates a regression class for each subspace using the outcome of the previous MLLR transformation. The subspace clustering strategy leaves fewer parameters to estimate for each subsequent MLLR transformation matrix. Large-vocabulary Mandarin speech recognition experiments show the advantage of SRCMLLR over traditional MLLR when adaptation data are scarce. Confidence measures are also discussed for unsupervised adaptation: when they are applied to discard unreliable recognition results, unsupervised adaptation performance improves.

Third, a method based on the state-specific rotation (SSR) transform and the semi-tied covariance (STC) transform, which we call tying SSR, is proposed to model correlations between feature coefficients. In this method, the SSR transform removes the correlations between elements of the feature vector in each state, and a refined acoustic model is built in the new, uncorrelated feature space. Transforms are tied according to the principle of the smallest decrease in the auxiliary function, and the BIC is used to choose the number of tied classes. Finally, the semi-tied covariance transform is used to update the parameters of each newly tied transform class. The method thereby overcomes the heavy computational load of acoustic models built with untied SSR transforms.
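The SSR step, which removes within-state correlation by rotating each state's feature space so that a diagonal covariance fits, can be illustrated in two dimensions. This toy sketch assumes that the SSR transform is an orthogonal rotation onto the eigenvectors of each state's full covariance. In 2-D that reduces to a single Jacobi rotation angle. The helper names and data layout are hypothetical, and the tying, BIC class selection, and STC re-estimation steps are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]


def covariance(xs: Sequence[Vec]) -> Tuple[float, float, float]:
    """Maximum-likelihood covariance entries (c11, c22, c12) of 2-D frames."""
    n = len(xs)
    m1 = sum(x for x, _ in xs) / n
    m2 = sum(y for _, y in xs) / n
    c11 = sum((x - m1) ** 2 for x, _ in xs) / n
    c22 = sum((y - m2) ** 2 for _, y in xs) / n
    c12 = sum((x - m1) * (y - m2) for x, y in xs) / n
    return c11, c22, c12


def rotation_angle(c11: float, c22: float, c12: float) -> float:
    """Angle of the plane rotation that zeroes the off-diagonal term (Jacobi)."""
    return 0.5 * math.atan2(2.0 * c12, c11 - c22)


def rotate(xs: Sequence[Vec], theta: float) -> List[Vec]:
    """Express each frame in the rotated axes, i.e. compute R(theta)^T x."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in xs]


def ssr_per_state(
    frames_by_state: Dict[str, Sequence[Vec]]
) -> Dict[str, Tuple[float, float, float]]:
    """Hypothetical helper: per-state rotation angle plus diagonal variances.

    Each state gets its own rotation; in the rotated space a diagonal
    Gaussian (two variances, no covariance term) models that state's frames.
    """
    models = {}
    for state, xs in frames_by_state.items():
        theta = rotation_angle(*covariance(xs))
        r11, r22, _ = covariance(rotate(xs, theta))
        models[state] = (theta, r11, r22)
    return models
```

In d dimensions the same idea would use the full eigenvector matrix of each state's covariance. Storing one such matrix per state is the storage and decoding cost that the tying and STC steps described in the abstract are meant to reduce by sharing transforms across states.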
Call number: XWLW927
Other identifier: 200118014604879
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/5863
Collection: Graduates_Doctoral Dissertations
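Recommended citation (GB/T 7714):
穆向禹. 汉语广播语音识别系统的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院, 2005.
(Mu Xiangyu. Research on a Chinese broadcast speech recognition system [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2005.)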
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.