Mid-level Representations and Grouping Strategies for Computational Auditory Scene Analysis (计算听觉场景分析中层表达与组织策略研究)

Title	计算听觉场景分析中层表达与组织策略研究
Alternative title	Mid-level Representations and Grouping Strategies for Computational Auditory Scene Analysis
Author	Zhang Xueliang (张学良)
Date	2010-06-01
Degree type	Doctor of Engineering
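Abstract (from the Chinese)	This dissertation studies monaural noisy-speech segregation based on computational auditory scene analysis (CASA), with an in-depth exploration of CASA's mid-level representations and grouping strategies. The main work and contributions are as follows: 1) We study how time-frequency (T-F) units respond differently to speech harmonics, and use the relationship between carrier and envelope to divide them into two classes, resolved and unresolved T-F units. Resolved units are grouped using the harmonicity principle and the minimum-amplitude principle of auditory scene analysis. Unresolved T-F units are grouped by amplitude modulation rate, measured with an improved envelope autocorrelation. With the improved classification criterion and grouping strategy, results on the 100-sentence test set provided by Cooke of the University of Sheffield, UK, show clearly better segregation performance: the average SNR is 0.96 dB higher than that of previous algorithms. 2) We study the relationship between the peaks of the auditory correlogram and harmonics, and propose a dynamic harmonic function for pitch extraction and source segregation. Compared with the correlogram, the dynamic harmonic function has adjustable resolution, which facilitates pitch extraction. During grouping, the height of each peak is adjusted dynamically according to spectral changes, which suppresses peaks at positions other than the pitch period. Experiments show that the algorithm effectively improves noisy-speech segregation: on the Cooke test set, the average SNR is 1.48 dB higher than that of previous algorithms. 3) We study multi-pitch tracking, modeling pitch-period positions with a mixture Laplacian distribution. By studying the harmonic relationships among frequencies, a heuristic iterative algorithm makes the peaks of the mixture Laplacian distribution converge to the pitch-period positions. A tracking algorithm then forms continuous multi-pitch tracks. Experiments show that the algorithm tracks pitch more accurately. 4) We study an accelerated model for CASA-based noisy-speech segregation. The correlogram is very important to segregation systems, but its high computational complexity makes the algorithms slow. By analyzing the auditory filter outputs directly, we propose a grouping strategy based on pitch and a comb filter. The model avoids computing the correlogram, which greatly improves efficiency and saves about 83% of the computing time. Experiments also show that the new grouping strategy further improves noisy-speech segregation: the average SNR is 0.8 dB higher than that of previous algorithms.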
English abstract	This dissertation studies monaural noisy-speech segregation based on computational auditory scene analysis (CASA) by exploring new mid-level representations and grouping strategies for CASA. The main contributions are: 1) Based on how their responses to harmonics differ, we classify time-frequency (T-F) units as resolved or unresolved using the carrier-to-envelope energy ratio. Resolved T-F units are grouped with the "harmonicity" and "minimum amplitude" principles. Unresolved T-F units are grouped by amplitude modulation, measured with an "enhanced" autocorrelation function. With the improved T-F unit classification rule and grouping strategies, the proposed system performs substantially better: its average SNR on Cooke's dataset is 0.96 dB higher than that of the previous method. 2) By studying the relationship between correlogram peaks and harmonics, we propose a dynamic harmonic function for pitch determination and source segregation. Compared with the correlogram, the dynamic harmonic function has adjustable resolution, which reduces the mutual overlap between multiple voices and facilitates pitch determination. Its peak heights change with the spectrum, which tends to suppress peaks at non-pitch periods and so reduces segregation error. Experiments show that the algorithm improves SNR by about 1.48 dB. 3) We also propose multi-pitch tracking with a mixture Laplacian distribution (MLD). By analyzing the harmonic relationship between frequency components, we use a heuristic method to make the peaks of the MLD converge to the pitch period. Continuous pitch tracks are then formed by post-processing. Experiments show that the multi-pitch tracking algorithm estimates pitch more accurately.
4) The correlogram is an important mid-level representation for CASA: it models the "harmonicity" principle in a simple way and also provides useful information for segmentation. However, it is time-consuming to compute. By processing the filter responses directly, we propose an efficient sound source separation scheme based on pitch and a comb filter. The model preserves the framework of CASA-based segregation methods and runs efficiently by omitting the correlogram. The experiment results showed that the proposed model saved the computing time a...
Keywords	Speech Separation; Computational Auditory Scene Analysis; Multi-pitch Tracking; Dynamic Harmonic Function
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6283
Collection	Graduates_Doctoral dissertations
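Recommended citation (GB/T 7714)	Zhang Xueliang. Mid-level Representations and Grouping Strategies for Computational Auditory Scene Analysis[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.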
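
The fourth contribution in the abstract replaces the correlogram with a grouping strategy based on pitch and a comb filter. This record does not describe that filter, so the following is only a minimal Python sketch of one simple way a comb filter can test whether a filter-channel response is consistent with a candidate pitch period. The function names, the energy-ratio criterion, and the 0.5 threshold are illustrative assumptions, not the method of the thesis.

```python
import math


def comb_filter_ratio(x, period):
    """Ratio of harmonic-cancelling to harmonic-passing comb output energy.

    `period` is a candidate pitch period in samples (assumed integer).
    x[n] - x[n - period] cancels components that repeat every `period`
    samples, while x[n] + x[n - period] reinforces them, so a small ratio
    suggests the response is harmonically related to that pitch.
    """
    passed = 0.0
    cancelled = 0.0
    for n in range(period, len(x)):
        passed += (x[n] + x[n - period]) ** 2
        cancelled += (x[n] - x[n - period]) ** 2
    return cancelled / passed if passed > 0 else float("inf")


def belongs_to_pitch(x, period, threshold=0.5):
    """Hypothetical grouping rule: accept the unit if the ratio is small."""
    return comb_filter_ratio(x, period) < threshold


# Toy signals: a 100 Hz pitch at an 8 kHz sampling rate has an 80-sample period.
fs = 8000
period = 80
harmonic = [math.sin(2 * math.pi * 300 * n / fs) for n in range(800)]    # 3rd harmonic
inharmonic = [math.sin(2 * math.pi * 350 * n / fs) for n in range(800)]  # between harmonics

print(belongs_to_pitch(harmonic, period))
print(belongs_to_pitch(inharmonic, period))
```

In this toy case the 300 Hz tone repeats exactly every 80 samples and is accepted, while the 350 Hz tone is inverted after 80 samples and is rejected. Each sample costs only a few additions and multiplications, which illustrates why working on the filter responses directly can be cheaper than computing a full autocorrelation over many lags.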

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20061801462803 (2943KB)			Restricted access	CC BY-NC-SA
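
The gains quoted in the abstract (0.96 dB, 1.48 dB, and 0.8 dB) are differences between signal-to-noise ratios expressed in decibels. This record does not say which reference signal the thesis uses, so the sketch below assumes a generic definition: the energy of a reference signal divided by the energy of the difference between the reference and the segregated estimate. The function name `snr_db` is hypothetical.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB of `estimate` against `reference` (equal-length sequences).

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0:
        return float("inf")  # estimate equals the reference exactly
    return 10 * math.log10(signal / noise)


# An estimate that is uniformly 10% too small has an error energy equal to
# 1% of the signal energy, which is about 20 dB.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(reference, estimate), 2))
```

Under this definition, an improvement of about 1 dB means the residual error energy drops by roughly 20%, since 10^(-0.1) is approximately 0.79.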