Prosody is a suprasegmental feature of speech. It supports and complements the expression of semantics and pragmatics, and thus plays an important role in spoken communication, which has made it a research focus in speech and language science and technology. Traditional prosody studies emphasize rhythm, whereas this dissertation focuses on Mandarin stress. First, we built a large-scale stress-annotated corpus and carried out a comprehensive analysis and modeling of Mandarin word stress and sentence stress. Second, we constructed several text-based stress prediction models and proposed a novel hierarchical approach to Mandarin stress modeling and prediction. Finally, the stress study was applied to Text-to-Speech as an example of how stress research can benefit speech and language engineering. In detail, the main contributions of this dissertation include:

1. A large-scale stress-annotated corpus was built and a detailed analysis of Mandarin stress was carried out. Six thousand utterances were annotated with word stress and sentence stress. Statistical analysis indicates that pitch is the primary cue in the perception of Mandarin stress in continuous speech. For word stress, perceptual differences across rhythm levels and tone patterns show clear regularities. Stress patterns in disyllabic words are not significantly stable. This dissertation also uses pitch, duration, and their statistical parameters to detect stress automatically in continuous speech, using Multiple Linear Regression Analysis and Decision Trees.

2. The prosody-syntax interface was investigated, and a set of syntax-to-stress mapping rules was derived. Classification and Regression Tree (CART) and Maximum Entropy (ME) models were also employed to predict word stress and sentence stress using only textual features. The ME model was then optimized through feature selection.
Experiments show that the optimized model outperforms the baseline while using fewer features, which would also reduce training time, running time, and model size.

3. To strengthen the study of unstressed syllables in Mandarin, this dissertation proposes a novel hierarchical method for Mandarin stress modeling and prediction. The top level emphasizes stressed syllables, while the bottom level focuses, for the first time, on unstressed syllables, because of their importance to both the naturalness and the expressiveness of synthetic speech. Prediction experiments confirmed the modeling method could capture t...