Research on Dialog-Act Understanding and Spoken Language Translation
Original title: 对话行为理解与口语翻译方法研究
Author: Zhou Keyan (周可艳)
Degree type: Doctor of Engineering
Supervisor: Zong Chengqing (宗成庆)
2010-05-27
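Degree-granting institution: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree discipline: Pattern Recognition and Intelligent Systems
Keywords: Dialog-act; Spoken Language Translation; Dialog Corpus; Ill-formedness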
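Abstract: Spoken dialog understanding is the automatic analysis of spoken dialogs by computer. It is a key problem for spoken language processing systems, both for improving human-machine dialog systems and for raising the quality of spoken language machine translation. A dialog act is a pragmatic feature that describes spoken dialog. It combines communicative intention with semantic information and belongs to the category of shallow discourse structure. Dialog acts have already been applied in speech recognition systems, human-machine dialog systems, automatic summarization systems, and spoken language translation systems.
This thesis covers three topics: the construction and annotation of a large-scale corpus of real spoken dialogs; the modeling and automatic recognition of dialog acts; and spoken language translation methods that incorporate dialog-act understanding.
For corpus construction, the thesis starts from the characteristics of Chinese spoken dialog and focuses on describing spoken-language phenomena. It proposes an improved dialog-act annotation specification and establishes a dedicated method for describing spoken-language phenomena. On this basis, an annotated corpus of Chinese spoken dialogs is built from real telephone recordings. Besides rich multi-layer annotation (phonetic, semantic, and pragmatic), the corpus describes spoken-language phenomena such as insertion, repetition, and word-order inversion. It therefore supports both research on spoken dialog understanding and the analysis and processing of spoken-language phenomena. This corpus work is significant for moving spoken language systems toward practical application.
For dialog-act modeling and automatic recognition, the thesis proposes a dialog-act prediction model based on the Markov Decision Process (MDP). The predictions of this model are integrated into utterance-based dialog-act recognition and yield good recognition results. Beyond improving the recognition model, the author also works on feature selection, proposing effective new features such as base noun phrases and adjacency pairs, together with several feature-combination methods. These further improve the accuracy of automatic dialog-act recognition significantly.
For improving spoken language translation, the thesis proposes a translation method that incorporates dialog acts as pragmatic information. The method targets a phrase-based statistical machine translation system and uses automatic dialog-act classification to improve the consistency between training and test corpora, between development and test sets, and between source and target languages. This improves the performance of the translation system, so that the final translation more accurately reflects the dialog intention expressed in the source language.
In addition, the thesis proposes a method for handling out-of-vocabulary words based on a semantic dictionary. The method uses knowledge of Chinese synonyms to interpret the meaning of out-of-vocabulary words in the source language, which to some extent solves the problem of translating such words in spoken language translation.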
Alternative abstract: Spoken language understanding is the automatic analysis of dialogs, a key technology for improving the performance of spoken language processing systems such as dialog systems and spoken language translation systems. A dialog act, which combines a communicative function with semantic content, belongs to shallow discourse structure. Dialog acts have been applied in several kinds of systems, including speech recognition, spoken dialog systems, summarization, and spoken language translation. Our work covers building a large-scale annotated corpus of naturally occurring Chinese human-human dialogs, dialog-act modeling and automatic recognition, and spoken language translation based on dialog-act understanding. In building and annotating the corpus, we improve the dialog-act annotation guidelines and give a description of ill-formedness based on dialog analysis. We build an annotated Chinese dialog corpus from telephone recordings. The corpus not only includes phonetic, linguistic, and paralinguistic annotation but also describes several ill-formedness phenomena. The corpus is being extended into a large annotated corpus base of Chinese dialogs for the study of spoken Chinese. In dialog-act modeling and recognition, we introduce a novel model to predict and tag dialog acts: a Markov Decision Process (MDP) predicts the dialog-act sequence in place of the traditional dialog-act n-gram, and a Support Vector Machine (SVM) classifies the dialog act of each utterance. Moreover, we investigate feature selection and combination for dialog-act recognition, which improves recognition accuracy significantly; in particular, we experiment with several novel features and feature-combination strategies. Based on the annotated corpus and automatic dialog-act recognition, we propose three kinds of applications of dialog acts in phrase-based translation. The spoken translation system benefits from the pragmatic information provided by dialog acts. Dialog-act classification improves the consistency between training data and test data, between development set and test set, and between source language and target language, so that the translation process is more effective and the translation result more accurately reflects the intention of the source language. In the phrase-based translation system, we also propose an approach of applying semantics knowledge into phra...
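Collection number: XWLW1514
Other identifier: 200718014628084
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/6246
Collection: Graduates_Doctoral Dissertations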
Recommended citation
GB/T 7714
周可艳. 对话行为理解与口语翻译方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
Files in this item
File name/size: CASIA_20071801462808 (849KB)
Access type: Not yet open
License: CC BY-NC-SA