The main contributions of this paper are as follows.
1. A Chinese Speech Recognition (CSR) system with the phoneme as the base model was developed, building on a detailed study of our CSR technologies. The importance of Chinese tone information was shown from two aspects of speech recognition: the features and the model.
2. The characteristics of English were first studied intensively, and then our English Speech Recognition (ESR) system was developed, including initial model training, question set design, decision-tree-based triphone model training, and the search process of the recognizer. Semi-tied covariance modeling was then improved by using the more robust Bayesian information criterion to decide the number of covariance transformation matrices. Compensation in the log-spectral domain was also investigated to obtain a more robust acoustic model. Finally, nonnative speaker adaptation was tested with a data-driven maximum likelihood linear regression (MLLR) fast adaptation algorithm.
3. A Japanese Speech Recognition (JSR) system was developed rapidly using the fast bootstrapping method of MSR. A statistics-based end-point detection algorithm was then proposed; it is more robust to noisy speech than other algorithms. Finally, simple cross-language speech recognition tests were carried out, applying the Chinese, English, and Chinese-English bilingual systems to Japanese. The results showed that the bilingual acoustic model performed better than the language-dependent models.
4. Several Chinese-English bilingual acoustic modeling techniques were explored in depth: direct combination of the two sets of base models, IPA mapping, and automatic agglomerative clustering with different distance measures, e.g., Bhattacharyya distance, log-likelihood, and maximum mutual information (MMI). Language-related questions were adopted in decision tree training and achieved higher performance than the traditional method.
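As an illustration of the distance-based clustering mentioned in the last item, the following is a minimal sketch of the Bhattacharyya distance between two Gaussians and of picking the closest pair of models as a merge candidate. It is not the paper's implementation: it assumes each phone model is summarized by a single diagonal-covariance Gaussian, and the model names, the helper `closest_pair`, and the toy values are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        dist += 0.125 * (m1 - m2) ** 2 / v               # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))   # variance-mismatch term
    return dist


def closest_pair(models):
    """Return the pair of model names with the smallest distance.

    `models` maps a name to a (mean vector, variance vector) tuple.
    One agglomerative clustering step would merge this pair.
    """
    return min(
        combinations(sorted(models), 2),
        key=lambda pair: bhattacharyya_diag(*models[pair[0]], *models[pair[1]]),
    )


# Hypothetical toy models (assumed values, not taken from the paper).
toy_models = {
    "zh_a": ([0.0, 1.0], [1.0, 1.0]),
    "en_aa": ([0.1, 1.1], [1.0, 1.2]),
    "en_iy": ([3.0, -2.0], [0.5, 0.5]),
}
```

Calling `closest_pair(toy_models)` selects the two models that one agglomerative step would merge first. A real system would likely compare mixture or state-level distributions rather than single Gaussians.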