Research on Content-Based Audio Classification and Retrieval Technology (基于内容的音频分类与检索技术研究)

	Research on Content-Based Audio Classification and Retrieval Technology
Alternative title	Content-Based Audio Classification and Retrieval Technology Research
	Liang Wei (梁伟)
	2006-06-08
Degree type	Doctor of Engineering
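Abstract (translated from Chinese)	As computer performance keeps improving, Internet bandwidth keeps growing, and multimedia compression and video/audio streaming technologies keep advancing, people have access to more and more media information, and quickly retrieving the needed information from massive multimedia collections has become essential. Current retrieval methods, however, rely mainly on text or numeric information and can no longer meet the needs of multimedia retrieval applications. Driven by practical requirements, and building on an in-depth analysis of the characteristics of audio retrieval systems and the main problems they face, this dissertation studies audio stream segmentation, audio classification, fast audio retrieval, and query by humming. The work covers the following aspects. (1) For continuous audio stream segmentation, the dissertation proposes a segmentation algorithm based on background sound. The algorithm has two advantages: first, segmentation points are detected from the background sound, which avoids the large number of false segmentations caused by small changes in audio content; second, it uses a histogram discrimination criterion, which makes segmentation faster. For classifying audio segments that each contain a single scene, the dissertation proposes a hierarchical classification algorithm based on an improved Gaussian model. Its advantage is that it removes the drawback of traditional algorithms, in which every feature dimension carries the same weight: statistics on the discriminative power of each feature dimension allow the weights to be adjusted adaptively. (2) Fast audio search is studied in depth, and a fast audio search algorithm based on sub-band energy is designed and implemented. The algorithm builds a template of the target audio by histogram modeling and uses a critical-band partitioning strategy to extract multi-sub-band energy ratios as the basic features. Unstable components in the audio are smoothed with a low-pass filter. Experiments show that the algorithm has practical value for monitoring sensitive audio at broadcast-television level. (3) To address the robustness of fast audio retrieval in complex environments, an audio search method based on dominant-frequency components is proposed. The method introduces a new audio distortion elimination technique that greatly reduces distortions such as noise and warping in the audio and thereby improves the robustness of the system. In addition, to maintain retrieval speed, a two-pass search strategy based on pre-estimating the suspected positions of the target audio is proposed. (4) Query-by-humming music retrieval, a difficult and active topic in audio retrieval, is studied, and the representation and extraction of the main melody are discussed in detail. To extract the main melody from polyphonic music, the algorithm uses pitch contour extraction based on spectral autocorrelation. Finally, a DTW matching method with matching-path constraints is used to compute the similarity between main melodies. Experimental results show that the algorithm achieves a certain degree of retrieval effectiveness in a query-by-humming system.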
English abstract	With the development of computers, the Internet, and multimedia information technology, multimedia information retrieval has become a pressing requirement for web applications. Focusing on audio/video (AV) stream processing, this dissertation presents research on content-based audio classification and retrieval. Its main contributions are as follows. First, to address AV stream segmentation and classification, a background-sound-based audio stream segmentation algorithm is proposed to partially avoid the detection errors caused by the complexity of audio content. Because the histogram intersection method has a low computational load, the segmentation algorithm is faster and produces fewer errors than the traditional algorithm. A hierarchical classification algorithm based on a modified Gaussian model (MGM) is then employed for audio classification. Compared with the traditional Gaussian model (GM), MGM removes the drawback that all features carry the same weight. Second, a histogram audio search algorithm based on multiple sub-band energy features is presented for detecting and locating a target audio clip in a continuous AV stream. The algorithm is currently in use in a TV/radio program monitoring system. Third, an improved audio search method based on multiple critical-band modules is introduced to enhance the robustness of audio search in complex environments. The method improves performance by using an audio distortion elimination algorithm and a second-pass search algorithm. Fourth, a query-by-humming (QBH) music retrieval method based on the pitch envelope is studied. Pitch envelope extraction is based on spectrum autocorrelation, and similarity is computed by DTW with a constrained search path. Experimental results show that the algorithm may contribute to the robustness of QBH to some extent.
Keywords	Audio Segmentation; Audio Classification; Audio Fast Search; Query By Humming; Multimedia Retrieval (音频流分割; 音频分类; 音频快速搜索; 哼唱检索; 多媒体检索)
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/5944
Collection	Graduates_Doctoral Dissertations
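Recommended citation (GB/T 7714)	Liang Wei. Research on Content-Based Audio Classification and Retrieval Technology [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2006.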
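
The abstract describes locating a target clip in a continuous stream by comparing histograms of features. A minimal sketch of that general idea follows, assuming the features have already been quantized into integer codes; the function names, the sliding step of one frame, and the 0.9 threshold are illustrative assumptions, not details taken from the dissertation.

```python
from collections import Counter


def histogram(codes, n_bins):
    """Normalized histogram of quantized feature codes (integers in [0, n_bins))."""
    counts = Counter(codes)
    total = len(codes)
    return [counts.get(b, 0) / total for b in range(n_bins)]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search(stream, template, n_bins, threshold=0.9):
    """Slide a template-length window over the stream and return (offset, score) hits."""
    width = len(template)
    target = histogram(template, n_bins)
    hits = []
    for start in range(len(stream) - width + 1):
        window = histogram(stream[start:start + width], n_bins)
        score = intersection(target, window)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy data: each integer stands for one quantized feature frame (hypothetical values).
stream = [0, 1, 0, 1, 2, 3, 3, 2, 0, 0, 1, 1]
template = [2, 3, 3, 2]
hits = search(stream, template, n_bins=4)
```

Because a histogram discards the order of frames inside the window, this kind of matching is cheap but tolerant of reordering; a real system would typically confirm candidate offsets with a stricter comparison.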

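The abstract also mentions DTW with a constrained matching path for comparing melodies. The sketch below shows one common form of such a constraint, a Sakoe-Chiba band; the abstract does not say which constraint the dissertation uses, so the band, its width, and the representation of melodies as lists of MIDI note numbers are all assumptions made for illustration.

```python
import math


def dtw_distance(query, reference, band=2):
    """DTW distance where the warping path must stay within `band` cells of the diagonal."""
    n, m = len(query), len(reference)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    band = max(band, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical pitch sequences (MIDI note numbers).
melody = [60, 62, 64, 65, 67]
hummed = [60, 60, 62, 64, 65, 67]   # same tune, first note held longer
other = [67, 65, 64, 62, 60]        # the tune reversed

same = dtw_distance(hummed, melody)
different = dtw_distance(other, melody)
```

Restricting the path to a band limits how far the hummed query may stretch or compress in time relative to the reference, and it also reduces the number of cells that need to be filled.
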
Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20031801460301 (719KB)			Not currently open	CC BY-NC-SA