Along with the great progress in speech recognition technology, automatic speech recognition (ASR) systems have recently been deployed in commercial applications. One of the most important remaining problems is compensating for the degradation that noise causes in ASR performance in noisy environments. Building on a survey and analysis of existing algorithms for noise-robust ASR, and on how noise affects the speech signal, feature, and model spaces, I investigated endpoint detection, speech enhancement, speech feature enhancement, and the combination of model compensation with feature compensation.

Endpoint detection is important for ASR. Based on an in-depth study of speech spectral entropy, I proposed modifying the spectral probability density function used to compute the entropy by adding a positive constant, and built an endpoint detection strategy on the result. The modified spectral entropy greatly improves the discriminability between speech and noise. Moreover, because a different constant is used at each signal-to-noise ratio (SNR), the enhanced spectral entropy is almost insensitive to changes in SNR, which makes the detection thresholds easy to set and tune. Many experiments showed its superiority over the basic spectral entropy and over the conventional energy-based approach.

Speech enhancement algorithms can improve the perceived quality and the intelligibility of speech. After analyzing the shortcomings of the maximum a posteriori (MAP) estimator based on autoregressive hidden Markov models (AR-HMMs) at low SNR, I proposed incorporating a codebook-constrained Wiener filter into the MAP framework. This imposes spectral constraints on the harmonics of the Wiener filters so that they satisfy the vocal tract constraints described by a codebook. Two objective measures, the output SNR and the Itakura-Saito distortion measure, confirmed the quality improvement of the proposed method. Used as a preprocessor for ASR, it also improves the recognition accuracy.
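The effect of the positive constant on spectral entropy can be seen directly from the formula. With frame power spectrum S_i over N bins and total power T, the modified PDF (S_i + K)/(T + NK) equals (T/(T+NK)) p_i + (NK/(T+NK)) (1/N), a mixture of the basic spectral PDF p_i with a uniform distribution. Low-energy noise frames (T small relative to NK) are pushed toward the maximum entropy log N, while high-energy speech frames barely change. The minimal sketch below assumes a single fixed K and a fixed threshold chosen only for illustration (the abstract instead adapts K to the SNR); the function names, frame length, and synthetic tone-in-noise signal are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """|DFT|^2 of a real frame for bins 0..N/2 (naive O(N^2) DFT; numpy.fft.rfft is the usual choice)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(spec, k_const=0.0):
    """Entropy of p_i = (S_i + K) / sum_j (S_j + K); K = 0 gives the basic spectral entropy."""
    shifted = [s + k_const for s in spec]
    total = sum(shifted)
    if total <= 0.0:
        return math.log(len(spec))  # all-zero frame: treat the PDF as uniform
    return -sum((s / total) * math.log(s / total) for s in shifted if s > 0.0)


def detect_endpoints(frames, k_const, threshold):
    """Return (first, last) indices of frames whose entropy is below threshold, or None."""
    speech = [i for i, f in enumerate(frames)
              if spectral_entropy(power_spectrum(f), k_const) < threshold]
    return (speech[0], speech[-1]) if speech else None


# Hypothetical test signal: 3 noise frames, 3 tone-plus-noise frames, 3 noise frames.
rng = random.Random(0)
N = 128


def noise_frame():
    return [rng.gauss(0.0, 0.01) for _ in range(N)]


def tone_frame():
    return [math.sin(2 * math.pi * 10 * t / N) + rng.gauss(0.0, 0.01) for t in range(N)]


frames = ([noise_frame() for _ in range(3)] + [tone_frame() for _ in range(3)]
          + [noise_frame() for _ in range(3)])
print(detect_endpoints(frames, k_const=1.0, threshold=2.0))  # (3, 5)
```

Because adding K mixes the PDF with a uniform distribution, the entropy of a noise frame can only increase with K, which widens the gap to speech frames and is consistent with the improved discriminability reported above.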
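The codebook-constrained Wiener filtering step can be illustrated for a single frame. The sketch below does not reproduce the AR-HMM MAP framework; it only shows the idea of restricting the speech spectrum to shapes from an AR (LPC) codebook before forming the Wiener gain H = P_s / (P_s + P_n). It assumes the noise PSD is known, estimates the excitation gain by least squares, and selects the codebook entry with the smallest Itakura-Saito distortion; the codebook, function names, and these estimation choices are hypothetical.

```python
import cmath
import math


def ar_spectrum(a, gain, n_bins):
    """AR PSD gain / |A(e^{jw})|^2 on n_bins points of [0, pi], A(z) = 1 + sum_k a[k] z^-(k+1)."""
    psd = []
    for i in range(n_bins):
        w = math.pi * i / (n_bins - 1)
        a_w = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        psd.append(gain / abs(a_w) ** 2)
    return psd


def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion between two positive PSDs."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, p_hat)) / len(p)


def codebook_wiener_gain(noisy_psd, noise_psd, codebook):
    """Pick the codebook AR shape that best explains noisy_psd, then form the Wiener gain."""
    n = len(noisy_psd)
    best = None
    for idx, a in enumerate(codebook):
        shape = ar_spectrum(a, 1.0, n)
        # Least-squares gain for noisy ~= g * shape + noise (an assumption of this sketch).
        num = sum(s * (y - v) for s, y, v in zip(shape, noisy_psd, noise_psd))
        den = sum(s * s for s in shape)
        g = max(num / den, 1e-12)
        speech_psd = [g * s for s in shape]
        model = [s + v for s, v in zip(speech_psd, noise_psd)]
        d = itakura_saito(noisy_psd, model)
        if best is None or d < best[0]:
            best = (d, idx, speech_psd)
    _, idx, speech_psd = best
    # Wiener gain with the speech PSD constrained to the selected codebook shape.
    return [s / (s + v) for s, v in zip(speech_psd, noise_psd)], idx


codebook = [[-0.9], [0.9]]  # hypothetical AR(1) entries: low-pass and high-pass shapes
n_bins = 65
noise = [1.0] * n_bins
noisy = [5.0 * s + v for s, v in zip(ar_spectrum([-0.9], 1.0, n_bins), noise)]
gains, chosen = codebook_wiener_gain(noisy, noise, codebook)
print(chosen, round(gains[0], 3), round(gains[-1], 3))  # 0 0.998 0.581
```

Constraining P_s to a codebook shape keeps the gain from following spurious noise peaks, which is the motivation the abstract gives for adding vocal tract constraints at low SNR.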
Speech feature enhancement and compensation aim to clean the speech features so that they match the training environment. The environment mismatch caused by additive noise is modeled as an additive bias in the power spectral domain. Because the bias is intrinsically related to the noise power spectrum, it is worthwhile to introduce prior knowledge of the noise into the bias estimation through the MAP criterion. Moreover, the mismatch is usually nonstationary in real applications, so the time-varying additive bias must be tracked and its statistics updated adaptively. I therefore proposed sequential techniques based on the Kullback-Leibler information measure to estimate the additive bias and to update the parameters of the noise prior in the MAP framework sequentially. Speech recognition experiments demonstrated that the proposed algorithm outperformed the sequential maximum likelihood estimation method and was clearly better than the ba
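Sequential MAP tracking of an additive bias with a noise-derived prior can be sketched in its simplest form. This is not the Kullback-Leibler-information-based recursion described above; it is a simpler stand-in that assumes, per frequency bin, a Gaussian clean-speech power x_t ~ N(clean_mean, clean_var), an observation y_t = x_t + b, and a Gaussian prior on b (for example taken from an initial noise power estimate). Each frame's posterior becomes the next frame's prior, and a forgetting factor inflates the prior variance so a drifting bias can be followed. The function name, parameters, and forgetting factor are hypothetical.

```python
def sequential_map_bias(observations, clean_mean, clean_var, prior_mean, prior_var, forget=0.98):
    """Sequentially track an additive bias b in y_t = x_t + b for each frequency bin.

    Assumes Gaussian clean-speech power and a Gaussian prior on b (hypothetical model).
    Returns the MAP estimate of b after each frame.
    """
    mean, var = list(prior_mean), list(prior_var)
    history = []
    for y in observations:
        for i, y_i in enumerate(y):
            # Inflating the prior variance discounts old frames so a changing bias can be tracked.
            v = var[i] / forget
            # Gaussian prior x Gaussian likelihood -> Gaussian posterior; its mode is the MAP estimate.
            k = v / (v + clean_var[i])
            mean[i] += k * (y_i - clean_mean[i] - mean[i])
            var[i] = (1.0 - k) * v
        history.append(list(mean))
    return history


obs = [[3.0]] * 200 + [[6.0]] * 400  # clean mean 1.0, so the bias steps from 2.0 to 5.0
est = sequential_map_bias(obs, clean_mean=[1.0], clean_var=[0.1],
                          prior_mean=[0.0], prior_var=[1.0])
print(round(est[199][0], 2), round(est[-1][0], 2))  # 2.0 5.0
```

With forgetting factor f, the steady-state update weight is 1 - f, so f trades tracking speed against sensitivity to frame-to-frame fluctuations of the clean-speech power.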