Research on Audio Content Retrieval Technology

Title	Research on Audio Content Retrieval Technology (音频内容检索技术研究)
Alternative title	Research of Audio Content Retrieval
Author	Gao Peng (高鹏)
	2008-05-31
Degree type	Doctor of Engineering
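Abstract (translated from Chinese)	This thesis studies the core algorithms of speech retrieval. Starting from fast vocabulary-independent keyword spotting, it examines basic speech search methods under the GMM and TRAP-NN frameworks as well as word-lattice-based speech search algorithms, and proposes corresponding indexing and retrieval methods. On this basis, a practical speech retrieval system is implemented. The main work and contributions are as follows: (1) The keyword spotting algorithm based on acoustic GMM models is improved, and a new phoneme-matrix-based, vocabulary-independent fast keyword spotting method is proposed that greatly increases detection speed at the cost of a small loss in detection accuracy. (2) A phoneme recognizer based on TRAP features and an NN acoustic model is studied, and on this basis a fast keyword search algorithm under the TRAP-NN framework is proposed; it matches the accuracy of the fast GMM detection method, uses one fifth as much acoustic training data as the GMM, and generates indexes three times as fast as the GMM method. (3) Word-lattice-based keyword search algorithms are studied, together with search methods based on confusion networks and on an improved word-lattice-to-syllable-lattice conversion, which partially solve the out-of-vocabulary (OOV) word problem of Chinese word lattices; search accuracy is clearly higher than with purely acoustic methods. A bi-syllable indexing method is also proposed for indexing pinyin lattices, with storage consumption that meets the requirements of speech retrieval. (4) Based on the core speech retrieval algorithms proposed above, a complete speech retrieval system is designed and implemented, solving system-level problems such as massive speech processing, storage and management of massive indexes, and retrieval interfaces, and meeting the requirements of practical use.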
English abstract	This thesis investigates fundamental methods of audio content retrieval. Starting from fast vocabulary-independent keyword spotting, several audio content search algorithms based on the acoustic GMM and TRAP-NN frameworks are explored. Word-lattice-based methods are also discussed, and related indexing and retrieval schemes are provided. Based on these techniques, an integrated audio content retrieval system is implemented. The main contributions are as follows: (1) Building on the acoustic GMM method, a novel phoneme-matrix-based, vocabulary-independent keyword spotting approach is proposed. Very fast detection speed is achieved at the cost of an acceptable loss in precision. (2) The TRAP feature and NN model are explored, and related phoneme recognition results are discussed. A keyword search method based on the TRAP-NN model is provided. Compared with the GMM approach, the new method requires less training data and indexes faster while achieving the same detection rate. (3) Lattice-based keyword search methods are studied. Confusion-network and revised word-to-syllable lattice algorithms are proposed to address the OOV problem, with better precision than previous methods. A novel bi-syllable indexing and search scheme is proposed that reduces storage consumption with no performance degradation. (4) Using the techniques above, an integrated audio content retrieval system is designed and implemented. Problems of mass audio processing, index storage, and retrieval interfaces are carefully considered and solved.
Keywords	Speech Recognition; Information Retrieval; Keyword Spotting; Mass Audio Data Processing
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6105
Collection	Graduates_Doctoral Dissertations
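Recommended citation (GB/T 7714)	Gao Peng. Research on Audio Content Retrieval Technology [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.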

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20041801462808 (1151 KB)			Not yet open	CC BY-NC-SA