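Research on Natural Language Generation Modelling Integrating Pragmatic, Syntactic, and Prosodic Information, with a Prototype System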

Alternative title	Modelling and prototyping of a generation method integrating pragmatic, syntactic, and prosodic information
Author	曹文洁 (Cao Wenjie)
Date	2009-09-25
Degree type	Doctor of Engineering
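Chinese abstract	In recent years, spoken language generation research has made great progress, and the intelligibility of synthetic speech has improved greatly. However, the expressiveness (or variability) of synthetic speech still falls far short of what naturalness requires. Its limitations lie mainly in the following: · limitations of the speech synthesis methods themselves; · lack of large-scale annotated affective speech corpora; · lack of comprehensive research combining the various high-level factors (emotion, etc.) that affect the expressiveness of the speech signal, with most work still confined to narrow domains (e.g. studying emotion alone); · insufficiently close integration with natural language processing research. The research in this thesis centres on natural language generation; its ultimate goal is to alleviate the last three of these limitations and thereby improve the expressiveness of the generated language and speech. The thesis focuses on English generation methods that integrate pragmatic information. The main work is summarized as follows: (1) Part of the speech of the American TV sitcom series "Photo Album USA" (走遍美国) was extracted, segmented and aligned sentence by sentence, cleaned of noisy speech, and richly annotated, yielding a spoken corpus of over 2,500 sentences carrying rich pragmatic, semantic, and phonetic information. The corpus was analysed in detail and, drawing on the relevant literature, the way different pragmatic and semantic information is realized at the speech level was analysed and summarized qualitatively or quantitatively, as a useful attempt at linking pragmatics, semantics, and acoustic attributes. (2) A micro-planning strategy for sentence generation that combines rules with heuristic planning is proposed. It effectively controls the size of the micro-planning rule base and improves the maintainability of the system; compared with a purely rule-based planning method, it shrinks the rule base while making the system more robust to its input, which raises both recall and robustness. (3) Because pragmatic information is hard to apply directly to the generation process, the thesis proposes introducing a rhetorical-goal level between the pragmatic level and the semantic, syntactic, and prosodic levels, as a bridge between deep pragmatic information and natural language generation decisions. A rhetorical control mechanism is built on this idea: the pragmatic information in the system input determines the values of the rhetorical goals, the rhetorical goals are used to compute the system's decision variables, and these variables in turn control the generation decisions (sentence planning, lexical choice, and surface realization), thereby meeting the requirements of expressive generation to some extent. (4) Compared with structure-based grammars, systemic functional grammar is feature-based and function-driven, and it takes good account of the social nature of language. These properties suit generation that integrates pragmatic information and make it easy to incorporate new features, so systemic functional grammar was chosen as the grammar of the generation system. The thesis builds a fairly complete systemic functional grammar for English sentence generation and extends the mood part of the system network, thereby establishing, at an operational level, a link between pragmatic information and semantic and phonetic patterns. Whereas the rhetorical control mechanism provides the means of using pragmatic information to control the generation process, the extension of the mood part of the grammar links pragmatic information to semantic and phonetic patterns in terms of content. (5) Building on this work, a prototype English sentence generation system was built for the domain of tourist information enquiries and hotel reservations. The prototype implements the above ideas in its generation strategy, micro-planning strategy, surface realization, and rhetorical control. Experiments show that the prototype achieves good results both in generation performance and in the expressiveness of the generated sentences.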
English abstract	Spoken language generation has made great progress in the last few years, and the intelligibility of generated speech has improved greatly. However, its naturalness and expressiveness (or variability) still fall far short of expectations. The limitations lie mainly in: · the limitations of speech synthesis methods themselves; · the lack of well-annotated, large-scale expressive speech corpora; · the lack of research on the combined effects of high-level affective factors on the expressiveness of speech, most work having been restricted to overly narrow domains (e.g. considering only emotions); · the lack of connection with natural language processing. The research described in this thesis focuses on natural language generation, but its ultimate purpose is to help overcome the latter two limitations in speech synthesis and to improve the expressiveness of the generated text and speech. This thesis presents our work on natural utterance generation that integrates pragmatic information and prosody control. The main work can be summarized as follows: (1) A speech corpus of over 2,500 segmented spontaneous sentences (speech and text) extracted from the TV series "Photo Album USA" was built through segmentation, alignment, removal of noisy speech, and annotation. The annotation covers pragmatic, semantic, and speech-attribute information. Based on the literature and a study of the corpus, the expression of pragmatic (or semantic) information in speech is analysed and summarized qualitatively (or quantitatively), as a first step toward establishing connections among pragmatic information, semantic information, and speech attributes. (2) A hybrid method integrating a rule-based planning method with a heuristic algorithm is used in the micro-planning phase. Introducing the heuristic algorithm effectively limits the size of the micro-planning rule base. Meanwhile, compared with the rule-based planning method alone, the hybrid planning algorithm improves coverage of the input even with a smaller planning rule base, which notably improves the recall and robustness of the system. (3) It is difficult to control the generation procedure directly with pragmatic information. To bridge high-level pragmatic information and NLG decisions, a rhetorical-goal level has been introduced between the pragmatic level and the semantic/syntactic...
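Keywords	Natural Language Generation; Semantic Representation; Affective Speech Synthesis; Pragmatics; Affect; Emotion
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6227
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Cao Wenjie. Research on Natural Language Generation Modelling Integrating Pragmatic, Syntactic, and Prosodic Information, with a Prototype System[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2009.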

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20031801460299 (1193 KB)			Not yet open access	CC BY-NC-SA
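The rhetorical control mechanism described in the abstract works in three steps: pragmatic input determines rhetorical-goal values, the goals determine decision variables, and the variables drive sentence planning, lexical choice, and surface realization. A minimal Python sketch of that layering follows. It is not the thesis implementation: the pragmatic features (`politeness`, `urgency`, `emotion`), the goal names, the thresholds, and the toy hotel-booking realizer are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PragmaticInput:
    """Hypothetical pragmatic features of the system input (illustrative only)."""
    politeness: float  # 0.0 (blunt) to 1.0 (very polite)
    urgency: float     # 0.0 (relaxed) to 1.0 (urgent)
    emotion: str       # e.g. "neutral", "happy", "annoyed"


def rhetorical_goals(p: PragmaticInput) -> dict[str, float]:
    """Step 1: derive hypothetical rhetorical-goal values in [0, 1] from pragmatics."""
    return {
        "courtesy": p.politeness,
        "conciseness": p.urgency,
        "emphasis": 0.8 if p.emotion in ("happy", "annoyed") else 0.2,
    }


def decision_variables(goals: dict[str, float]) -> dict[str, object]:
    """Step 2: compute generation decision variables from the rhetorical goals."""
    return {
        # Sentence planning: merge clauses into one sentence when brevity matters.
        "aggregate": goals["conciseness"] > 0.5,
        # Lexical choice: formal modal verb when courtesy is high.
        "register": "formal" if goals["courtesy"] > 0.5 else "informal",
        # Surface realization: indirect question instead of a bare imperative.
        "mood": "interrogative" if goals["courtesy"] > 0.3 else "imperative",
        # Emphasis cue, used only by imperative sentences in this toy realizer.
        "emphatic": goals["emphasis"] > 0.5,
    }


def realize(actions: list[str], d: dict[str, object]) -> str:
    """Step 3: toy realizer applying the decisions to a list of requested actions."""
    clauses = [" and ".join(actions)] if d["aggregate"] else actions
    sentences = []
    for clause in clauses:
        if d["mood"] == "interrogative":
            modal = "Could" if d["register"] == "formal" else "Can"
            sentences.append(f"{modal} you {clause}?")
        else:
            end = "!" if d["emphatic"] else "."
            sentences.append(clause[0].upper() + clause[1:] + end)
    return " ".join(sentences)


def generate(p: PragmaticInput, actions: list[str]) -> str:
    """Full pipeline: pragmatics -> rhetorical goals -> decision variables -> text."""
    return realize(actions, decision_variables(rhetorical_goals(p)))


if __name__ == "__main__":
    acts = ["confirm the booking", "send the invoice"]
    # prints "Could you confirm the booking and send the invoice?"
    print(generate(PragmaticInput(0.9, 0.8, "neutral"), acts))
    # prints "Confirm the booking! Send the invoice!"
    print(generate(PragmaticInput(0.1, 0.2, "annoyed"), acts))
```

In this sketch the realizer never reads pragmatic features directly; it only sees decision variables, which mirrors the bridging role the abstract assigns to the rhetorical-goal level.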