As the global economic community expands, multilingual language identification (LID) plays an increasingly important role in speech information services for non-native speakers. LID can also serve as the front end of multilingual speech recognition and multilingual speech translation systems. Aimed at real-time monitoring of multilingual broadcast programs, this paper presents recent progress in multilingual LID technology, including audio classification and language recognition confidence measures. First, we investigate separating non-speech signals from real audio streams to improve LID performance, and propose a novel SVM-based method for classifying broadcast audio. The audio stream is first divided into silence and non-silence segments using an energy threshold. SVM classifiers then assign each non-silence segment to one of four audio types: pure speech, non-pure speech, environmental sound, and music. Our experimental results show that, compared with traditional methods such as GMM and KNN, this method achieves better classification performance and greater robustness.

Second, we study the application of the GMM-UBM method to LID. We extract shifted delta cepstrum (SDC) coefficients instead of MFCCs as features, and replace the pick-max score decision with Gaussian back-end (GBE) classifiers. Because SDC coefficients incorporate delta cepstra from several shifted frames, and GBE classifiers include an LDA module that further separates the model scores of different languages, this method achieves better identification accuracy.
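SDC features stack delta cepstra computed at several shifted frames, so each feature vector covers a much longer time span than a single MFCC delta. The NumPy sketch below assumes the common N-d-P-k parameterization, with defaults d=1, P=3, k=7 taken from the widely used 7-1-3-7 configuration; the abstract does not state the values used in this work. The function name `sdc` and the edge-replication boundary handling are illustrative assumptions, not the author's implementation.

```python
import numpy as np


def sdc(cepstra: np.ndarray, d: int = 1, p: int = 3, k: int = 7) -> np.ndarray:
    """Shifted delta cepstra (hypothetical helper, N-d-P-k parameterization).

    cepstra: (T, N) array of per-frame cepstral coefficients, e.g. the first N MFCCs.
    Returns a (T, N * k) array whose row t concatenates the k delta vectors
        c[t + i*p + d] - c[t + i*p - d],  i = 0, ..., k - 1.
    Frames outside [0, T) are replaced by the nearest edge frame (an assumption;
    other boundary handling is equally possible).
    """
    if cepstra.ndim != 2:
        raise ValueError("cepstra must be a (T, N) array")
    num_frames = cepstra.shape[0]
    span = num_frames + (k - 1) * p  # number of delta vectors needed
    # Pad d frames in front and d + (k - 1) * p frames at the back.
    padded = np.pad(cepstra, ((d, d + (k - 1) * p), (0, 0)), mode="edge")
    # delta[t] = c[t + d] - c[t - d]; original frame t sits at padded[t + d].
    delta = padded[2 * d : 2 * d + span] - padded[:span]
    # Block i of frame t is delta[t + i * p]; place the k shifted blocks side by side.
    blocks = [delta[i * p : i * p + num_frames] for i in range(k)]
    return np.hstack(blocks)
```

With 7 cepstral coefficients per frame, `sdc(mfcc[:, :7])` (where `mfcc` is a hypothetical (T, N) MFCC matrix with N >= 7) yields 49-dimensional vectors that could then be modeled by the GMM-UBM.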
Third, we introduce support vector machines (SVMs) into LID and examine the problem of discriminative SVM training on large training sets. This approach uses polynomial expansion, a minimum mean squared error (MSE) discriminative training algorithm, and the generalized linear discriminant sequence (GLDS) kernel to train SVM classifiers for language classification. In addition, we study recognition confidence for discriminative models and propose a confidence measure that applies a sigmoid transformation to SVM model scores. By using GBE classifiers to fuse the scores of the GMM-UBM, SVM, and PPRLM methods, we achieve superior performance on the OGI-TS corpus and NIST LRE data. Finally, we integrate language identification with fast audio retrieval and design an LID system for real-time monitoring of multilingual broadcast programs. The system combines audio classification, signal quality evaluation, audio retrieval, noise reduction, language identification, and recognition confidence measures to process large volumes of broadcast data.
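The GLDS kernel and the sigmoid confidence measure described above can be sketched as follows, assuming the standard GLDS formulation: each frame is expanded into all monomials up to a given degree, the expansions are averaged over the utterance to give b(X), and K(X, Y) = b(X)^T R^{-1} b(Y), where R is the correlation matrix of expansions over background data. All function names, the default degree of 2 (degree 3 is also common), the ridge term, and the sigmoid parameters `a` and `b` are hypothetical. The MSE training algorithm itself is not shown, and this is not presented as the author's implementation.

```python
import itertools

import numpy as np


def poly_expand(x: np.ndarray, degree: int = 2) -> np.ndarray:
    """All monomials of the entries of x up to `degree`, starting with the constant 1."""
    terms = [1.0]
    for deg in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(len(x)), deg):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)


def expand_frames(frames: np.ndarray, degree: int = 2) -> np.ndarray:
    """(T, n) frames -> (T, M) matrix of per-frame polynomial expansions."""
    return np.vstack([poly_expand(f, degree) for f in frames])


def glds_vector(frames: np.ndarray, degree: int = 2) -> np.ndarray:
    """Utterance-level vector b(X): the mean expansion over all frames."""
    return expand_frames(frames, degree).mean(axis=0)


def inverse_correlation(background: np.ndarray, degree: int = 2,
                        ridge: float = 1e-6) -> np.ndarray:
    """R^{-1}, where R is the correlation matrix of background-frame expansions.

    The small ridge term keeps the inverse well conditioned (an assumption).
    """
    expanded = expand_frames(background, degree)
    r = expanded.T @ expanded / len(expanded)
    return np.linalg.inv(r + ridge * np.eye(r.shape[0]))


def glds_kernel(bx: np.ndarray, by: np.ndarray, r_inv: np.ndarray) -> float:
    """GLDS kernel value K(X, Y) = b(X)^T R^{-1} b(Y)."""
    return float(bx @ r_inv @ by)


def sigmoid_confidence(score: float, a: float = 1.0, b: float = 0.0) -> float:
    """Map a raw SVM score to (0, 1) with 1 / (1 + exp(-(a * score + b))).

    a and b are hypothetical parameters, e.g. fitted on held-out data.
    """
    return float(1.0 / (1.0 + np.exp(-(a * score + b))))
```

Because the kernel is an inner product of averaged expansions, a trained linear SVM in this space can be collapsed into a single model vector, so scoring an utterance costs one dot product with b(X) regardless of the number of support vectors.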