Speaker recognition in telephony environments imposes demanding requirements, including robustness to channel effects, handling of speaker variability, and reliable decision-making. In recent years, a number of novel techniques and algorithms based on statistical or discriminative frameworks have been proposed for speaker recognition, which is significant for practical applications. To improve the performance of speaker recognition systems over the telephone, this dissertation investigates the use of Gaussian component information, feature and score normalization, quality measure-based score computation, speaker segmentation, and multi-speaker recognition.
1. We investigate Gaussian mixture model (GMM) based speaker recognition. Based on an error analysis of error-prone and confusable frames, a frame-level nonlinear score normalization is proposed for the speaker identification task. It restrains the likelihood difference between adjacent frames while enlarging the score difference between speakers for the same speech frame. For speaker verification based on the GMM with a universal background model (GMM-UBM), we present an experimental study that exploits detailed component-specific Gaussian information in generative likelihood ratio estimation.
2. To address the significant performance deterioration caused by mismatch between training and testing acoustic conditions, two compensation approaches are presented, based on feature normalization and score normalization, respectively. First, segment-based cepstral mean and variance normalization is modified so that the normalized cepstral coefficients follow a similar segmental Gaussian distribution, improving the match across different environmental conditions.
Second, to cope with score variability across speakers and test utterances, two-stage score normalization techniques are presented that transform the output scores and make the speaker-independent decision threshold more robust under adverse conditions. Finally, we study the application of score normalization to speaker identification based on MFCC and prosodic features, where it improves identification accuracy.
3. A quality measure algorithm using Gaussian mixture densities is presented for the traditional GMM-UBM scoring mechanism. Using GMM-based quality models, the proposed method explores applying soft estimates of quality measures as weighting factors in score computation. Because quality is estimated by GMM modeling, the method can potentially exploit broad phonetic-specific speaker characteristics. A Jensen divergence measure is incorporated for quality estimation, and clustering-based vector pre-quantization is performed to reduce redundancy in the speech signal and the computational load. Comparison experiments show the effectiveness of the proposed method.
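As a rough illustration of two building blocks named above, the following sketch shows plain per-segment cepstral mean and variance normalization and the conventional GMM-UBM average log-likelihood ratio score. It is not the modified normalization or the quality-weighted scoring proposed in the dissertation. The function names, the `segment_len` default, and the diagonal-covariance assumption are all illustrative choices rather than details taken from the source.

```python
import numpy as np


def segment_cmvn(features, segment_len=300, eps=1e-8):
    """Normalize each segment to zero mean and unit variance per coefficient.

    features: array of shape (num_frames, num_coeffs), e.g. MFCC vectors.
    segment_len: hypothetical segment size in frames (an assumption).
    """
    features = np.asarray(features, dtype=float)
    out = np.empty_like(features)
    for start in range(0, len(features), segment_len):
        seg = features[start:start + segment_len]
        out[start:start + segment_len] = (seg - seg.mean(axis=0)) / (seg.std(axis=0) + eps)
    return out


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, K)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over components for numerical stability.
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True))).ravel()


def gmm_ubm_score(frames, speaker_gmm, ubm):
    """Average frame-level log-likelihood ratio: speaker model versus UBM.

    speaker_gmm and ubm are (weights, means, variances) tuples.
    """
    return float(np.mean(gmm_frame_loglik(frames, *speaker_gmm)
                         - gmm_frame_loglik(frames, *ubm)))
```

In a verification setting, the score returned by `gmm_ubm_score` would be compared against a decision threshold. Score normalization of the kind described in item 2 would be applied to that score before thresholding.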