In this dissertation, we studied monaural noisy speech segregation based on computational auditory scene analysis (CASA) by exploring new mid-level representations and grouping strategies for CASA. The main contributions are:
1) According to the different characteristics of the responses to harmonics, we classified time-frequency (T-F) units as resolved or unresolved using the carrier-to-envelope energy ratio. For grouping resolved T-F units, we employed the “harmonicity” principle and the “minimum amplitude” principle. For grouping unresolved T-F units, we employed amplitude modulation, measured by an “enhanced” autocorrelation function. With the improved T-F unit classification rule and grouping strategies, the proposed system performed substantially better: compared with the previous method, its average SNR improved by 0.96 dB on Cooke’s dataset.
2) By studying the relationship between the peaks of the correlogram and the harmonics, we proposed a dynamic harmonic function for pitch determination and source segregation. Compared with the correlogram, the dynamic harmonic function had adjustable resolution, which reduced the mutual overlap between multiple voices and facilitated pitch determination. The peak heights of the dynamic harmonic function changed according to the spectrum, and the function tended to suppress peaks at non-pitch periods, which reduced the segregation error. The experimental results showed that the algorithm improved the SNR by about 1.48 dB.
3) We also proposed multi-pitch tracking based on a mixture Laplacian distribution (MLD). By analyzing the harmonic relationship between frequency components, we employed a heuristic method to make the peak of the MLD converge to the pitch period. Continuous pitch tracks were then formed by post-processing. The experimental results showed that the multi-pitch tracking algorithm obtained better pitch estimates.
4) The correlogram is an important mid-level representation for CASA: it models the “harmonicity” principle in a simple way and also provides useful information for segmentation. However, it is time-consuming to compute. By operating directly on the filter responses, we proposed an efficient sound source separation scheme based on pitch and a comb filter. The model preserved the framework of CASA-based segregation methods and ran efficiently by omitting the correlogram. The experimental results showed that the proposed model saved the computing time a...
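As background for the correlogram mentioned in contributions 2) and 4), the following is a minimal sketch of autocorrelation-based pitch period estimation on a single frame, which is the basic operation a correlogram repeats in every filter channel. It is not the dissertation's algorithm: the function name `estimate_pitch_period`, the 80 to 400 Hz search range, the 16 kHz sampling rate, and the synthetic three-harmonic test signal are all assumptions made for illustration.

```python
import numpy as np

def estimate_pitch_period(frame, fs, f_min=80.0, f_max=400.0):
    """Return the lag (in samples) of the largest autocorrelation value
    inside the assumed pitch range [f_min, f_max] Hz."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()          # remove any DC offset
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)             # shortest period considered
    lag_max = min(int(fs / f_min), len(acf) - 1)  # longest period considered
    return lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))

# Hypothetical test signal: a 40 ms frame with three harmonics of 200 Hz.
fs = 16000
f0 = 200.0
t = np.arange(int(0.04 * fs)) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))

period = estimate_pitch_period(frame, fs)
print(period, fs / period)  # estimated period in samples and pitch in Hz
```

The cost of evaluating such an autocorrelation for every channel and every frame is one reason a correlogram-based front end can be slow, which is the motivation stated in contribution 4) for working directly on the filter responses.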