Speaker recognition over a large-scale population raises several urgent problems, including channel robustness, language independence, and recognition speed. To improve the performance of speaker recognition systems with large-scale populations, this dissertation focuses on factor analysis, score normalization, a linear scoring method, and a fast speaker search algorithm.

1. In a speaker recognition system with a large-scale population, mismatch between the training utterance and the test utterance leads to a dramatic decline in performance. We investigate the factor analysis and joint factor analysis algorithms for Gaussian mixture model (GMM) based speaker recognition and propose some equivalent strategies that make the system more stable. We then introduce residual factor analysis, which improves system performance.

2. Although GMM-based speaker recognition is text-independent, language still affects it, and the effect appears to be serious in recent NIST speaker recognition evaluations. To compensate for the language effect, this dissertation proposes two algorithms. The first treats the language effect as a kind of channel mismatch between the training utterance and the test utterance: we add bilingual utterances to the corpus used to train the channel subspace, which removes the language effect at the model level. The second compensates for the language effect at the scoring stage using language-based normalization; we also discuss semi-supervised and unsupervised language normalization.

3. Although the GMM-based system is the state of the art in speaker recognition, the heavy cost of computing the log-likelihood ratio (LLR) score becomes a new bottleneck for systems with a large-scale population. This dissertation gives an approximation of the log-likelihood ratio that leads to a significant speedup without any loss in performance. We then define a new speaker metric space and introduce the distance and angle between models in that space, which can be used as the test algorithm in a text-independent speaker verification system.

4. To improve the speed of the GMM-based speaker recognition system with a large-scale population, we make use of the speaker metric space and then introduce the high dimensional ...
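
As a rough illustration of the distance and angle between models mentioned in point 3, the sketch below assumes each speaker model can be flattened into a plain numeric vector (for example, stacked GMM mean offsets from a background model). That representation, the function names, and the angle threshold are all assumptions made for illustration; the abstract does not give the dissertation's actual definition of the speaker metric space.

```python
import math


def model_distance(a, b):
    """Euclidean distance between two model vectors (assumed representation)."""
    if len(a) != len(b):
        raise ValueError("model vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def model_angle(a, b):
    """Angle in radians between two model vectors (assumed representation)."""
    if len(a) != len(b):
        raise ValueError("model vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def accept(enrolled, test, max_angle):
    """Hypothetical verification rule: accept when the angle is small enough."""
    return model_angle(enrolled, test) <= max_angle
```

Comparing two fixed-length vectors costs time linear in their dimension and does not depend on the number of test frames, which is one plausible reason a vector-space test can be cheaper than frame-by-frame LLR scoring.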