Spoken language generation has made great progress in the last few years, and the intelligibility of generated speech has improved greatly. However, its naturalness and expressiveness (or variability) still fall far short of expectations. The main limitations are:
· limitations of the speech synthesis methods themselves;
· the lack of well-annotated, large-scale expressive speech corpora;
· the lack of research on the collective effects of high-level affective factors on the expressiveness of speech: most work has been restricted to overly narrow domains (e.g. considering only emotions);
· the lack of connection with natural language processing.

The research described in this thesis focuses on natural language generation, but its ultimate purpose is to help eliminate the latter two limitations in speech synthesis and to improve the expressiveness of the generated text and speech. This thesis presents our work on natural utterance generation, integrating pragmatic information and the control of prosody. The main work can be summarized as follows:

(1) A speech corpus is built through segmentation, alignment, removal of noisy speech, and annotation. It includes over 2500 segmented spontaneous sentences (speech and text, extracted from the TV series “Photo Album USA”). The annotation covers pragmatic information, semantic information, and speech attributes. Based on the literature and a study of the corpus, qualitative (or quantitative) analyses and summaries are made of how pragmatic (or semantic) information is expressed in speech, as a useful step toward establishing connections among pragmatic information, semantic information, and speech attributes.

(2) A hybrid method, integrating a rule-based planning method with a heuristic algorithm, is used in the micro-planning phase. Introducing the heuristic algorithm has effectively restricted the size of the micro-planning rule base. Meanwhile, compared with using the rule-based planning method alone, the hybrid planning algorithm covers more of the input even with a smaller planning rule base, which has notably improved the recall rate and robustness of the system.

(3) It is difficult to control the generation procedure directly with pragmatic information. To bridge high-level pragmatic information and NLG decisions, a rhetorical-goal level has been introduced between the pragmatic level and the semantic/syntactic...