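Research on Knowledge-Guided Segment Model Decoding and Its Keyword Spotting (知识引导的段模型解码及其关键词检测研究)
Alternative title: Knowledge Guided Segment Model Decoding and Keyword Spotting System
Zhang Hua (张华)
Degree type: Doctor of Engineering
Advisor: Liu Wenju (刘文举)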
2008-05-24
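Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Major: Pattern Recognition and Intelligent Systems
Keywords: Speech Recognition, Stochastic Segment Model, Keyword Spotting, Fast Decoding Algorithm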
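Abstract: Acoustic modeling is one of the core research directions in speech recognition. Segment models relax the hidden Markov model (HMM) assumption that speech observation vectors are mutually independent given the state, and thereby yield more accurate acoustic models. However, although large-vocabulary continuous speech recognition systems based on stochastic segment models achieve better recognition performance than HMM systems, they struggle to reach fast recognition speeds, which has become the key obstacle to their application. This thesis addresses segment model decoding algorithms and the speech recognition and keyword spotting systems built on them. The main contributions are:
(1) A segment model speed-up algorithm based on pre-classification of initials and finals was implemented. This method is a first application of speech knowledge to decoding. A fast decision is first made on the initial/final class to which the speech signal belongs, which compresses the decoding space of the corresponding acoustic models over each speech segment and thus effectively increases decoding speed. Recognition time was reduced by 51.8% without affecting recognition accuracy.
(2) Unvoiced landmarks were detected and located in speech and introduced into the segment model decoding algorithm as heuristic points for the search. The unvoiced landmark detection algorithm finds the time points in the speech signal at which the vocal folds stop or begin vibrating freely. Experiments show that 87.4% of stops and 98.9% of fricatives in Chinese can be separated from speech by detecting unvoiced speech segments.
(3) Building on the analysis and detection of stable articulation segments in speech, a variable-step segment model decoding method was implemented. A stable articulation segment is a stretch of the speech signal corresponding to a period in which the articulatory movement is steady. During acoustic model decoding, candidate segments whose start or end point falls within a stable articulation segment are skipped, which increases decoding speed. In the recognition system, recognition time was shortened by 23.4% while the initial/final recognition error rate fell by 2.3% relative. In the keyword spotting system, the construction time of the initial/final network was shortened by 32.5%, while the keyword miss rate fell by 15.4% relative and the false alarm rate by 15.7% relative.
(4) A keyword spotting system using speech pre-classification and verification on specific regions was implemented. A keyword pre-detection method first quickly selects the speech segments that may contain keywords. The segment model then computes confidence scores for the corresponding keywords on these segments to produce the keyword spotting results.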
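As a rough illustration of item (3), the sketch below enumerates candidate segments for decoding and skips any whose start or end boundary falls strictly inside a stable region. The function names, the `(start, end)` frame-interval representation, and the `max_len` limit are assumptions made for illustration only; this record does not describe the thesis's actual stable-segment detector or decoder.

```python
from typing import Iterator, List, Tuple

# Hypothetical representation: a half-open frame interval [start, end).
Interval = Tuple[int, int]


def inside_stable(boundary: int, stable: List[Interval]) -> bool:
    """Return True if a boundary frame lies strictly inside a stable region."""
    return any(s < boundary < e for s, e in stable)


def candidate_segments(n_frames: int, max_len: int,
                       stable: List[Interval]) -> Iterator[Interval]:
    """Yield (start, end) segments whose boundaries avoid stable regions."""
    for start in range(n_frames):
        if inside_stable(start, stable):
            continue  # a segment may not begin part-way through a stable region
        last_end = min(start + max_len, n_frames)
        for end in range(start + 1, last_end + 1):
            if inside_stable(end, stable):
                continue  # nor may it end part-way through one
            yield (start, end)
```

With 6 frames, a maximum segment length of 3, and a single assumed stable region `(1, 4)`, only 5 of the 15 possible segments are passed on for scoring, which is where the time saving in such a scheme would come from.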
Other abstract: Acoustic decoding is one of the key problems in automatic speech recognition research. Segment models (SM) use segmental distributions, rather than the frame-based features used in HMMs, to represent the underlying trajectory of the observation sequence. An SM-based speech recognition system can therefore relax some limitations of HMM systems and obtain higher accuracy, but at the cost of higher complexity and computation. Speeding up SM decoding is thus crucial for its application. During my Ph.D. study, I investigated the key technologies of fast decoding algorithms and their application to a segment-model-based keyword spotting (KWS) system. The research focused on the following four aspects:
(1) A fast decoding algorithm based on phone classification is proposed. First, phone-class information for each frame of the acoustic signal is obtained with a GMM classifier. The acoustic model decoding space is then reduced according to the classification results. The run time of the SM-based speech recognition system is reduced by 51.8% without affecting its error rate.
(2) Unvoiced landmarks are introduced into the decoding of the SM-based recognizer as indicators of where the search begins. The unvoiced landmark detection algorithm locates the points in speech where the vocal folds stop or begin vibrating freely. In experiments, 87.47% of stops and 98.94% of fricatives were segmented from speech after unvoiced landmark detection.
(3) An adaptive-step decoding method using steady-energy pieces (SEP) is developed. An SEP is defined as a piece of speech without notable spectral change. During acoustic model decoding, frame-by-frame decoding of segments that start or end in an SEP is skipped. The run time of the SM-based recognizer is reduced by 23.4%, with a 2.3% relative reduction in phone error rate. The run time of the SM-based KWS system is reduced by 32.5%, with a 15.4% relative reduction in miss rate and a 15.7% relative reduction in false alarms.
(4) A novel keyword spotting system using phone classification and keyword verification on focus regions is proposed. First, a region-focusing algorithm detects the focus regions in speech, which are more likely to contain keywords than other regions. Then, the confidence of each keyword on a focus region is measured by the segment model.
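The pruning idea in aspect (1) can be sketched as follows, assuming per-frame class labels are already available (the abstract obtains them from a GMM classifier, which is not reproduced here). The class names, the `min_share` threshold, and both function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def allowed_classes(frame_classes: Sequence[str], start: int, end: int,
                    min_share: float = 0.3) -> Set[str]:
    """Classes that label at least `min_share` of the frames in [start, end)."""
    counts = Counter(frame_classes[start:end])
    total = end - start
    return {cls for cls, n in counts.items() if n / total >= min_share}


def prune_models(models: Dict[str, str], frame_classes: Sequence[str],
                 start: int, end: int, min_share: float = 0.3) -> List[str]:
    """Keep only acoustic models whose class is plausible for this segment."""
    keep = allowed_classes(frame_classes, start, end, min_share)
    return sorted(name for name, cls in models.items() if cls in keep)


# Toy data (assumed): 4 frames labelled "initial", then 6 labelled "final".
frame_classes = ["initial"] * 4 + ["final"] * 6
models = {"b": "initial", "sh": "initial", "a": "final", "ang": "final"}
```

Here `prune_models(models, frame_classes, 0, 4)` keeps only the two "initial" models, so a segment model decoder would score half as many candidates on that segment. A share threshold rather than a strict majority vote is one possible design choice for keeping both classes alive near class boundaries.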
Call number: XWLW1205
Other identifier: 200518014628062
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6071
Collection: Graduates_Doctoral Dissertations
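Recommended citation
GB/T 7714
Zhang Hua (张华). 知识引导的段模型解码及其关键词检测研究 [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.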
Files in this item
File name/size: CASIA_20051801462806 (1499 KB)
Access type: Not yet open
License: CC BY-NC-SA