In recent years, the application of speech technology has driven a new wave of computer-assisted language learning (CALL) systems. However, most researchers focus on pronunciation assessment and leave two significant factors, tone and intonation, unassessed, conveying them only by displaying the pitch curve. This paper discusses the assessment and diagnosis of tone and intonation in a Chinese CALL system.
1. A two-layer dynamic programming (DP) based algorithm is proposed for pitch extraction. A hoarse voice severely degrades fundamental frequency extraction by causing a large number of frequency-halving errors. Analysis shows that these halving errors come from pseudo-harmonics in the spectrogram that are normally higher than the first harmonic. The proposed algorithm effectively resolves the problem of extracting the fundamental frequency from hoarse voices.
2. A multi-feature fusion algorithm is established for tone assessment. To extract accurate initial/final boundaries from continuous speech, three strategies are adopted: building a noise model, using a small language model, and performing forced alignment. Because reading accuracy correlates strongly with the tone assessment score, a string-matching technique is used to reject invalid speech segments, and the reading accuracy is used to weight the tonal score produced by the computer. Because different tones have distinct distribution characteristics, skewness is also used to improve tone assessment performance. In addition, the appropriateness of pause positions helps improve tone assessment. Together, these techniques form an effective scheme for tone assessment.
3. A cluster-based method is developed for tone diagnosis. Traditionally, researchers have mainly adopted four-tone recognition rather than tone diagnosis. This approach has several problems: a four-tone model cannot diagnose any phenomenon beyond the four tones, tone recognition accuracy under strong accents is relatively low, and the consistency among different people labeling the same corpus is also very low. The cluster-based method is therefore used to address these problems; it performs on par with manual tagging when diagnosing monosyllable tones and considerably better when diagnosing disyllable tones.
4. A special kind of feature called Sorted Error Vector ...
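As a rough illustration of the reading-accuracy weighting mentioned in item 2, the following Python sketch compares a reference syllable sequence with a recognized one, rejects the utterance if the match is too poor, and otherwise scales the tonal score by the reading accuracy. The function names, the edit-distance definition of accuracy, the rejection threshold of 0.5, and the multiplicative weighting are all assumptions made for illustration; the source does not specify how the string matching or the weighting is carried out.

```python
from typing import List, Optional


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # syllable missing from hyp
                            curr[j - 1] + 1,      # extra syllable in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def reading_accuracy(ref: List[str], hyp: List[str]) -> float:
    """Assumed accuracy measure: 1 - distance / reference length, floored at 0."""
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def weighted_tone_score(raw_score: float,
                        ref: List[str],
                        hyp: List[str],
                        reject_below: float = 0.5) -> Optional[float]:
    """Return None for a rejected (invalid) utterance, else the weighted score.

    The threshold and the multiplicative weighting are hypothetical choices.
    """
    acc = reading_accuracy(ref, hyp)
    if acc < reject_below:
        return None
    return raw_score * acc


if __name__ == "__main__":
    reference = ["ni", "hao", "shi", "jie"]
    recognized = ["ni", "hao", "shi", "ge"]
    print(weighted_tone_score(80.0, reference, recognized))
```

In this sketch a learner who misreads one of four syllables keeps three quarters of the raw tonal score, while an utterance that barely matches the prompt is rejected outright instead of being scored.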