In the past ten years, great progress has been made in state-of-the-art laboratory speech recognition systems. Recently the focus of speech research has shifted from read speech to speech data found in the real world, such as broadcast news on radio and TV. During the three years of my Ph.D. study, I investigated the key technologies for building a broadcast recognition system. The research focused on the following three aspects.

First, I proposed a novel method for acoustic change point detection, which is important for improving the performance of a broadcast segmentation system. The method detects acoustic change points by checking the trend of the dividing entropy of each signal point in a sliding window. Compared with the traditional detection method based on the Bayesian Information Criterion (BIC), it detects acoustic change points more accurately, especially those between two short signals.

Second, the MLLR adaptation method is widely used in speech recognition systems. The traditional MLLR adaptation method defines the regression classes on the assumption that all output distributions that are close in the original acoustic feature space should be tied and transformed together, which may not be valid in some cases. To overcome this drawback, I proposed a target-driven MLLR adaptation algorithm with a multi-layer structure, in which the regression classes are defined so as to maximize the increase in the likelihood of the adaptation data. In comparison with the traditional MLLR adaptation method, the new algorithm gives about 10% relative error reduction and incurs a lower computational load.

Third, continuous speech recognition is the most important technology in a broadcast recognition system. A method based on a feature-space transform is proposed to model the correlations between feature coefficients.
In this method, the state-specified rotation (SSR) transform first generates refined multiple-mixture diagonal Gaussian models by rotating the feature vectors in each state into a new, uncorrelated feature space. Because the acoustic model generated by the SSR method imposes a heavy computational load during decoding, a tying method that uses the optimization strategy of the semi-tied covariance (STC) transform is proposed to tie the feature-space transform matrices across different states. Experiments on an LVCSR test showed that the method achieves nearly 20% relative error reduction compared with the traditional diagonal Gaussian modeling method and incurs a lower computational cost during decoding.
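
The idea of rotating each state's feature vectors into an uncorrelated space can be illustrated with a minimal sketch. It is not the SSR implementation from this work: it assumes two-dimensional features, assumes the rotation is taken from the eigenvectors of each state's sample covariance, and uses hypothetical names (`covariance_2d`, `decorrelating_angle`, `rotate`, `fit_state_rotations`) and synthetic data. Mixture modeling and STC-style tying are not shown.

```python
import math
import random


def covariance_2d(points):
    """Return (c_xx, c_xy, c_yy), the sample covariance of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    c_xx = sum((p[0] - mean_x) ** 2 for p in points) / n
    c_yy = sum((p[1] - mean_y) ** 2 for p in points) / n
    c_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return c_xx, c_xy, c_yy


def decorrelating_angle(points):
    """Angle of the covariance eigenvector basis (closed form for 2x2)."""
    c_xx, c_xy, c_yy = covariance_2d(points)
    return 0.5 * math.atan2(2.0 * c_xy, c_xx - c_yy)


def rotate(points, theta):
    """Express every point in the basis rotated by theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cos_t * x + sin_t * y, -sin_t * x + cos_t * y) for x, y in points]


def fit_state_rotations(frames_by_state):
    """Estimate one rotation angle per state (assumed: one rotation per state)."""
    return {s: decorrelating_angle(f) for s, f in frames_by_state.items()}


# Synthetic, strongly correlated frames for two hypothetical states.
rng = random.Random(0)
frames_by_state = {}
for state, slope in (("s1", 0.8), ("s2", -0.5)):
    xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
    frames_by_state[state] = [(x, slope * x + 0.3 * rng.gauss(0.0, 1.0)) for x in xs]

angles = fit_state_rotations(frames_by_state)
for state, frames in frames_by_state.items():
    before = covariance_2d(frames)[1]
    after = covariance_2d(rotate(frames, angles[state]))[1]
    print(state, "cross-covariance before:", before, "after:", after)
```

After the rotation, the cross-covariance within each state is numerically zero, so a diagonal-covariance Gaussian fits the rotated features without ignoring correlation. Each state also carries its own transform, which illustrates why tying transforms across states reduces decoding cost.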