对话行为理解与口语翻译方法研究
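Alternative Title: Research on Dialog-Act Understanding and Spoken Language Translation
Author: 周可艳 (Zhou Keyan)
Subtype: Doctor of Engineering
Thesis Advisor: 宗成庆 (Zong Chengqing)
Date: 2010-05-27
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Dialog-act; Spoken Language Translation; Dialog Corpus; Ill-formedness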
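Abstract: Spoken dialog understanding is the use of computers to parse spoken dialog. It is a key problem for improving spoken language processing systems such as human-machine dialog systems and spoken language machine translation. A dialog act is a pragmatic feature describing spoken dialog that combines communicative intention with semantic information, and it belongs to shallow discourse structure. Dialog acts have been applied in speech recognition, human-machine dialog, automatic summarization, and spoken language translation systems.

This thesis covers three topics: the construction and annotation of a large-scale corpus of real spoken dialogs, the modeling and automatic recognition of dialog acts, and spoken language translation methods that incorporate dialog-act understanding.

For corpus construction, based on the characteristics of Chinese spoken dialog and with an emphasis on describing spoken-language phenomena, the thesis proposes an improved dialog-act annotation scheme and establishes a dedicated method for describing spoken-language phenomena. On this basis, it builds an annotated corpus of Chinese spoken dialogs from real telephone recordings. Besides rich multi-layer annotation (phonetic, semantic, and pragmatic), the corpus describes spoken-language phenomena such as insertion, repetition, and word-order inversion. It supports research on spoken dialog understanding as well as the analysis and processing of spoken-language phenomena, and it is significant for bringing spoken language systems into practical use.

For dialog-act modeling and automatic recognition, the thesis proposes a dialog-act prediction model based on the Markov Decision Process (MDP). Integrating this model's predictions into utterance-based dialog-act recognition yields good recognition results. In addition to improving the recognition model, the author proposes effective new features, such as base noun phrases and adjacency pairs, together with several feature combination methods, which further and significantly improve the accuracy of automatic dialog-act recognition.

For improving spoken language translation, the thesis proposes a translation method that incorporates dialog acts as pragmatic information. Applied to a phrase-based statistical machine translation system, the method uses automatic dialog-act classification to improve the consistency between training and test data, between development and test sets, and between source and target languages. This improves translation performance, so that the final translation reflects more accurately the dialog intention expressed in the source language.

The thesis also proposes a method for handling out-of-vocabulary words based on a semantic lexicon. The method uses Chinese synonym knowledge to interpret the meaning of out-of-vocabulary words in the source language and, to some extent, resolves the problem of translating them in spoken language translation.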
Other Abstract: Spoken language understanding is a technology for analysing dialogs automatically, and it is key to improving the performance of spoken language processing systems such as dialog systems and spoken language translation systems. A dialog act, which combines a communicative function with semantic content, belongs to shallow discourse structure. Dialog acts have been applied in several kinds of systems, including speech recognition, spoken dialog systems, summarization, and spoken language translation. Our work includes building a large-scale annotated corpus of naturally occurring Chinese human-human dialogs, dialog-act modeling and automatic recognition, and spoken language translation based on dialog-act understanding.

In building and annotating the corpus, we improve the dialog-act annotation guidelines and give a description of ill-formedness based on dialog analysis. We build an annotated Chinese dialog corpus based on telephone recordings. The corpus not only includes phonetic, linguistic, and paralinguistic annotation, but also describes several ill-formedness phenomena. The corpus is being extended into a large annotated corpus of Chinese dialogs for the study of spoken Chinese.

In dialog-act modeling and recognition, we introduce a novel model to predict and tag dialog acts, in which a Markov Decision Process (MDP) predicts the dialog-act sequence in place of the traditional dialog-act n-gram, and a Support Vector Machine (SVM) classifies the dialog act of each utterance. Moreover, we investigate feature selection and combination for dialog-act recognition, which improves recognition accuracy significantly. In particular, we experiment with several novel features and feature combination strategies.

Based on the annotated corpus and automatic dialog-act recognition, we propose three kinds of applications of dialog acts in phrase-based translation. Spoken language translation benefits from the pragmatic information provided by dialog acts. Dialog-act classification improves the consistency between training data and test data, between development set and test set, and between source language and target language, so that the translation process is more effective and the translation result reflects the intention of the source language more accurately. In the phrase-based translation system, we also propose an approach of applying semantics knowledge into phra...
Shelf Number: XWLW1514
Other Identifier: 200718014628084
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6246
Collection: Graduates, Doctoral Dissertations (毕业生_博士学位论文)
Recommended Citation
GB/T 7714
周可艳. 对话行为理解与口语翻译方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
Files in This Item:
File Name/Size: CASIA_20071801462808 (849KB)
Access: Not currently open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.