Speech Recognition for Chinese Radio and TV Broadcast News (汉语广播电视新闻语音识别)
Alternative title: TRANSCRIPTION OF CHINESE BROADCAST NEWS OVER RADIO AND TV
Jia Lei (贾磊)
2003-07-01
Degree type: Doctor of Engineering
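Chinese abstract (translated): Research on speech recognition for Chinese radio and television broadcast news is currently of great importance for making speech recognition practical. This thesis addresses the technical difficulties of Chinese broadcast news speech recognition and carries out broad, in-depth research in the following areas.

First, for continuous speech segmentation of broadcast news, this thesis proposes an audio feature change-point detection method based on the trend of entropy change, used to find the places in a continuous audio signal where the acoustic features change. Within a data window, each candidate change point divides the signal into two segments, and the method decides on a change point from the trend of the entropy of those two segments. Compared with the widely used change-point detection method based on the BIC criterion, the proposed method is more sensitive and has a more robust detection threshold, so it adapts well to change-point detection under a variety of conditions.

Second, for adaptation in broadcast news speech recognition, this thesis analyzes in detail the strengths and weaknesses of the adaptation algorithms used in broadcast news recognition systems internationally. In particular, MLLR based on an adaptation regression tree has to rely on prior knowledge to decide the adaptation transform classes. To address this weakness, the thesis proposes a target-driven multi-layer adaptation algorithm. The algorithm decides the kinds and number of adaptation transform classes dynamically, according to the increase in the likelihood of the adaptation data. It therefore makes fuller use of limited adaptation data and improves the recognition rate.

Finally, for continuous speech recognition of broadcast news, this thesis starts from the weaknesses of existing systems in modeling correlated features. It combines linear rotation transforms at the feature level with covariance modeling at the model level, and proposes a correlated-feature modeling method in which covariance modeling is used to implement shared state-space rotation matrices. The method uses the decorrelating effect of state-based rotation to build high-precision Gaussian mixture models with diagonal covariance structure in the transformed, uncorrelated feature space. It also uses covariance modeling to share and optimize the parameters of the state-space transform matrices, which overcomes two drawbacks of state-based feature rotation: too many model parameters and a heavy computational load during decoding. In continuous speech tests on Mandarin Chinese and on broadcast speech, the proposed method increases memory use and decoding computation only slightly, and reduces the relative error rate by 20% compared with the conventional modeling method that uses Gaussian mixture models with diagonal covariance structure.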
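To make the comparison baseline concrete, the BIC-based change-point test mentioned in the abstract can be sketched as follows. This is only an illustration of the standard BIC criterion on a one-dimensional feature stream, not the entropy-trend method proposed in the thesis, whose details the abstract does not give. The function names, the `margin` and `lam` parameters, and the synthetic data are all assumptions made for this example.

```python
import math
import random

def log_var(xs):
    """Log of the maximum-likelihood variance of a 1-D sample (floored for stability)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))

def delta_bic(xs, i, lam=1.0):
    """Delta-BIC for splitting xs at index i, modelling each side as one 1-D Gaussian."""
    n = len(xs)
    gain = 0.5 * (n * log_var(xs) - i * log_var(xs[:i]) - (n - i) * log_var(xs[i:]))
    # The two-segment model has 2 extra parameters (one mean, one variance).
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty

def detect_change(xs, margin=5, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no split has Delta-BIC > 0."""
    best_i, best = None, float("-inf")
    for i in range(margin, len(xs) - margin + 1):
        s = delta_bic(xs, i, lam)
        if s > best:
            best_i, best = i, s
    return (best_i, best) if best > 0 else (None, best)

# Synthetic window (hypothetical data): the mean jumps from 0 to 6 at sample 100.
random.seed(0)
window = ([random.gauss(0.0, 1.0) for _ in range(100)]
          + [random.gauss(6.0, 1.0) for _ in range(100)])
idx, score = detect_change(window)
print(idx, score)
```

The penalty weight `lam` plays the role of the detection threshold: raising it suppresses false alarms but makes short segments harder to detect, which is the kind of sensitivity trade-off the abstract refers to.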
English abstract: Over the past ten years, state-of-the-art laboratory speech recognition systems have made great progress. Recently, the focus of speech research has shifted from read speech to speech data found in the real world, such as broadcast news over radio and TV. During the three years of my Ph.D. study, I investigated the key technologies for building a broadcast news recognition system. The main research work focused on the following three aspects.

First, I proposed a novel method for acoustic change-point detection, which is important for improving the performance of a broadcast segmentation system. The method detects acoustic change points by checking the trend of the dividing entropy at every signal point in a sliding window. Compared with the traditional detection method based on the Bayesian Information Criterion (BIC), it detects acoustic change points more accurately, especially those between two short signals.

Second, the MLLR adaptation method has been widely used in speech recognition systems. The traditional MLLR adaptation method defines its regression classes on the assumption that all output distributions that are close in the original acoustic feature space should be tied and transformed together, which may not be valid in some cases. To overcome this drawback, I proposed a target-driven MLLR adaptation algorithm with a multi-layer structure, in which the regression classes are defined so as to maximize the increase in the likelihood of the adaptation data. Compared with the traditional MLLR adaptation method, the new algorithm gives about 10% relative error reduction and incurs a lower computational load.

Third, continuous speech recognition is the most important technology in a broadcast news recognition system. A method based on a feature-space transform is proposed to model correlations between feature coefficients. In this method, a state-specific rotation (SSR) transform first generates refined multiple-mixture diagonal Gaussian models by rotating the feature vectors in each state into a new, uncorrelated feature space. Because the acoustic model generated by the SSR method carries a heavy computational load during decoding, a tying method that uses the optimization strategy of the semi-tied covariance (STC) transform is proposed to tie the feature-space transform matrices among different states. Experiments on an LVCSR test showed that the method achieves nearly 20% relative error reduction compared with the traditional diagonal Gaussian modeling method and incurs less computational cost during decoding.
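The decorrelation idea behind a per-state rotation can be illustrated with a small sketch. It assumes a single state with two-dimensional features and uses the closed-form rotation that diagonalizes a 2x2 covariance matrix. It does not reproduce the thesis's SSR estimation or its STC-based tying across states, and the function names and synthetic data are hypothetical.

```python
import math
import random

def covariance_2d(points):
    """Return (cxx, cxy, cyy), the maximum-likelihood covariance entries of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return cxx, cxy, cyy

def decorrelating_angle(cxx, cxy, cyy):
    """Angle of the principal axes, i.e. the rotation that diagonalizes the covariance."""
    return 0.5 * math.atan2(2.0 * cxy, cxx - cyy)

def rotate(points, theta):
    """Express every point in the coordinate frame rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]

# Synthetic, strongly correlated features for one state (hypothetical data).
random.seed(1)
frames = []
for _ in range(500):
    x = random.gauss(0.0, 1.0)
    frames.append((x, 0.8 * x + 0.3 * random.gauss(0.0, 1.0)))

before = covariance_2d(frames)
theta = decorrelating_angle(*before)
after = covariance_2d(rotate(frames, theta))
print("off-diagonal covariance before:", before[1])
print("off-diagonal covariance after: ", after[1])
```

After the rotation the off-diagonal covariance is zero up to floating-point error, so a diagonal-covariance Gaussian fits the rotated features without ignoring any correlation. Storing and applying a separate rotation for every state is what makes decoding expensive, which is the cost the tying step in the abstract is meant to reduce.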
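Keywords: broadcast news speech recognition; continuous audio segmentation; speaker adaptation; feature-space rotation transform; covariance modeling; Broadcasting Speech Recognition; Broadcasting Speech Segmentation; Adaptation; State-specific Rotation; Semi-tied Covariance Modeling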
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/5778
Collection: Graduates_Doctoral Dissertations
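Recommended citation
GB/T 7714
Jia Lei (贾磊). 汉语广播电视新闻语音识别 [Transcription of Chinese Broadcast News over Radio and TV] [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2003.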