The most challenging part of speaker recognition in telephone conversations is the intra-session variability in the summed channel. This thesis focuses on robust speaker diarization and recognition for two-speaker scenarios. Its contributions are as follows:

1. We compare several confidence measures in the framework of joint factor analysis and derive a symmetric scoring method based on the first-order Taylor series approximation of the full likelihood calculation. This completely symmetrizes the problem, so that it no longer matters which utterance in a trial is used for enrollment and which for test.

2. Based on the symmetric scoring, we investigate various normalization methods and extend the implicit normalization formula to any confidence measure defined in the form of an inner product. Following the general form of symmetric normalization, we also modify the KL kernel to incorporate some forms of normalization in the kernel space.

3. GMMs dominate speaker-related fields, and sufficient statistics extraction becomes a bottleneck, especially when the number of components grows to thousands. We therefore propose a data-driven Gaussian component selection algorithm based on multi-layer acoustic space partitioning, which makes Baum-Welch statistic extraction 10 times faster without any performance loss.

4. We apply variational Bayesian inference in the context of the iVector representation for fuzzy clustering in speaker diarization, which proves more effective than traditional hierarchical agglomerative clustering. We decrease the diarization error rate from 13.8% to 6.88%, and further to 5.34% after Viterbi re-segmentation.

5. Finally, we introduce the PLDA model into target speaker selection for enrollment with multiple summed-channel excerpts. We also propose and evaluate several objective functions for measuring the purity of the selected segments. The resulting system obtains an equal error rate of 4.05%, much better than that of the best NIST SRE 2008 system on the 3summed-summed test condition (∼8%).
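As a rough illustration of what a symmetric, inner-product-based score (contributions 1 and 2) means in practice, the sketch below scores two utterance vectors with a bilinear form and applies a cosine-style normalization. It is not the scoring or normalization formula derived in the thesis: the matrix `W`, the vectors `x` and `y`, and both function names are hypothetical and chosen only to show that swapping the enrollment and test utterances leaves the score unchanged.

```python
import numpy as np


def inner_product_score(x, y, W):
    """Score two utterance vectors with the bilinear form x^T W y."""
    return float(x @ W @ y)


def normalized_score(x, y, W):
    """Divide the score by the geometric mean of the two self-scores.

    This cosine-style normalization is an assumption made for
    illustration; it is not the normalization proposed in the thesis.
    """
    return inner_product_score(x, y, W) / np.sqrt(
        inner_product_score(x, x, W) * inner_product_score(y, y, W)
    )


# Hypothetical data: a symmetric positive definite weighting matrix
# and two random utterance-level vectors.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
W = A @ A.T + np.eye(4)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

score_xy = normalized_score(x, y, W)  # x as enrollment, y as test
score_yx = normalized_score(y, x, W)  # roles swapped
```

Because `W` is symmetric, `score_xy` and `score_yx` agree up to floating-point rounding, which is the property the abstract describes: the trial no longer has a distinguished enrollment side.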