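Research on Methods for Spoken Language Parsing and Automatic Extraction of Phrase Translation Pairs (口语解析与短语翻译对自动抽取方法研究)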

Title	口语解析与短语翻译对自动抽取方法研究
Alternative title	Approach to Spoken Language Parsing and Automatic Extraction of Phrase Translation
Author	左云存 (Zuo Yuncun)
Date	2005-05-01
Degree type	Master of Engineering
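Chinese abstract	This thesis presents research and implementation work on three topics: spoken language parsing for interlingua-based spoken language translation, automatic extraction of phrase translation pairs for statistical spoken language translation, and the construction of an experimental platform for spoken language translation systems. The main contributions are as follows:
1. For spoken language parsing targeting the Interchange Format (IF) ["NESPOLE", 2002], the thesis proposes a shallow semantic parsing method for spoken Chinese based on semantic classification trees, used to obtain the shallow semantic domain action (a part of the IF) of a spoken Chinese sentence. The method uses a statistical model to acquire semantic rules automatically from the training corpus and build semantic classification trees, uses the trees to parse the words in a sentence that are closely related to the domain action, and then uses a statistical parsing model to select and combine the parsing results of multiple words, producing the domain action representation of the sentence. Automatic rule acquisition avoids the tedium and subjectivity of hand-written rules and keeps parsing robust; combining the parts of the domain action with a statistical model avoids limiting the expressive power of the IF; and compared with an HMM, the semantic classification tree enlarges the parsing window and makes better use of context. Experimental results show that the method achieves high accuracy and robustness for shallow semantic parsing of spoken Chinese in a restricted domain.
2. Statistical methods are currently a very important approach in spoken language translation research. Compared with word-based statistical translation, phrase-based statistical translation handles the relationships between the words inside a phrase better and thereby effectively improves the performance of a machine translation system. One kind of phrase-based statistical translation adds phrase translation pairs to the system as a knowledge source, so overall system performance depends heavily on the quality of the phrase translation pairs used. To address this, the thesis proposes an improved HMM-based method for extracting phrase translation pairs, which automatically extracts high-quality phrase translation pairs from a large training corpus as a knowledge source for statistical spoken language translation. The method first uses an HMM to align bilingual sentences in both directions, then extracts phrase translation pairs from the alignment results, applying different post-processing based on word translation probabilities to different alignment cases, which improves the quality of the extracted pairs. Experimental results show that the phrase translation pairs extracted by this method are of high quality.
3. Building on the above work and existing technology, we built an experimental platform for an English-Chinese spoken language translation system. It integrates speech recognition, speech synthesis, and several spoken language translation methods, combines the translation methods effectively, and provides a good experimental environment for further research on spoken language translation.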
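Point 1 of the abstract above describes classifying domain-action-related keywords with a semantic classification tree that looks at the surrounding context. The record gives no implementation details, so the following is only a minimal sketch of the general idea, under several assumptions: the tree is written by hand rather than learned from a corpus, each node asks a simple "does this word occur within the window?" question (a simplified stand-in for the pattern questions a real semantic classification tree would learn), and the names `Leaf`, `Node`, `classify`, `TREE` and the domain-action labels are all hypothetical illustrations, not taken from the thesis. The statistical step that combines the results of several keywords is omitted.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: str  # hypothetical domain-action label, for illustration only


@dataclass
class Node:
    context_word: str            # question: does this word occur in the window?
    yes: Union["Node", Leaf]     # branch taken when the word is present
    no: Union["Node", Leaf]      # branch taken when the word is absent


def classify(tree: Union[Node, Leaf], words: List[str],
             key_index: int, window: int = 3) -> str:
    """Walk the tree using the words within `window` positions of the keyword."""
    lo = max(0, key_index - window)
    left = words[lo:key_index]
    right = words[key_index + 1:key_index + 1 + window]
    context = set(left + right)
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.context_word in context else node.no
    return node.label


# Hand-written toy tree for the keyword "房间" (room); a real tree would be learned.
TREE = Node(
    context_word="多少钱",  # "how much"
    yes=Leaf("request-information+price+room"),
    no=Node(
        context_word="预订",  # "reserve"
        yes=Leaf("request-action+reservation+room"),
        no=Leaf("give-information+room"),
    ),
)

# "I want to reserve a room": the keyword "房间" is at index 5.
print(classify(TREE, ["我", "想", "预订", "一", "个", "房间"], key_index=5))
```

Widening `window` is the toy counterpart of the larger parsing window that the abstract contrasts with an HMM: a context word only influences the decision if it falls inside the window around the keyword.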
English abstract	Spoken language translation (SLT) is an important application of speech and language technology. It draws on linguistics, computer science, speech recognition, communication, and other technologies, and research on SLT is of great significance. This thesis presents research on spoken language parsing, automatic extraction of phrase translations, and the building of an SLT experiment platform, all of which are important parts of SLT research. The main work is summarized as follows:
1. We propose a new approach to spoken Chinese parsing based on semantic classification trees (SCT). In this approach, the semantic classification trees, which are built from semantic rules learned automatically from the training data, are used to disambiguate key words related to the sentence's shallow semantic meaning, and a statistical model is used to extract the shallow semantic meaning of the whole sentence, namely its domain action. The approach has the following strengths: (1) it is robust and easy to implement, because the rules are learned automatically from the training corpus; (2) it uses more context information than an HMM-based approach, which improves parsing; (3) the different parts of a domain action can be combined with each other freely. Experimental results show that the approach performs well and is feasible for shallow semantic understanding of spoken Chinese in a restricted domain.
2. Statistical translation is a very important approach to spoken language translation. Phrase-based statistical translation models are effective in improving translation quality because they handle the relationships between words in a sentence better than word-based translation models. One phrase-based approach integrates phrase translations into the system as a knowledge source, so the system's performance depends greatly on the quality of the phrase translations. In this thesis, we propose a new approach to phrase translation extraction based on HMM word alignment. First, the bilingual sentences are word-aligned with an HMM; then the phrase translations are extracted from the alignment result and post-processed. Experimental results show that the phrase translations extracted by this approach are of high quality.
3. An SLT experiment platform is built on the above work and previous technologies. The platform integrates a speech recognition module, a text-to-speech module, and three different translation approaches, and provides a good environment for SLT research.
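The extraction step in point 2 (align in both directions, then read phrase translations off the alignment) can be illustrated with a short sketch. This is not the thesis's method: it uses the widely known alignment-consistency criterion on the intersection of two alignments, the alignment links are hard-coded hypothetical values rather than HMM output, and the post-processing with word translation probabilities is left out. The names `symmetrize` and `extract_phrase_pairs` and the `max_len` parameter are assumptions introduced for this example.

```python
from typing import List, Set, Tuple

# A link (s, t) says source word s is aligned to target word t.
# Both directions are stored in (source index, target index) order.
Alignment = Set[Tuple[int, int]]


def symmetrize(src2tgt: Alignment, tgt2src: Alignment) -> Alignment:
    """Keep only the links proposed by both alignment directions."""
    return src2tgt & tgt2src


def extract_phrase_pairs(src: List[str], tgt: List[str], links: Alignment,
                         max_len: int = 4) -> List[Tuple[str, str]]:
    """Return phrase pairs whose words are aligned only to each other."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [t for (s, t) in links if s_start <= s <= s_end]
            if not tgt_pos:
                continue
            t_start, t_end = min(tgt_pos), max(tgt_pos)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no word in the target span links outside the source span.
            consistent = all(s_start <= s <= s_end
                             for (s, t) in links if t_start <= t <= t_end)
            if consistent:
                pairs.append((" ".join(src[s_start:s_end + 1]),
                              " ".join(tgt[t_start:t_end + 1])))
    return pairs


src = ["我", "想", "预订", "房间"]
tgt = ["I", "want", "to", "book", "a", "room"]
# Hypothetical alignments for the two directions (not produced by a real HMM).
forward = {(0, 0), (1, 1), (2, 3), (3, 5)}
backward = {(0, 0), (1, 1), (2, 3), (3, 5), (1, 2)}

for pair in extract_phrase_pairs(src, tgt, symmetrize(forward, backward)):
    print(pair)
```

Taking the intersection favors precision over recall: the link (1, 2), proposed by only one direction, is discarded, and unaligned target words such as "to" and "a" appear in a phrase only when they fall between aligned words. A post-processing step like the one the abstract mentions would be one place to revisit such cases.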
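Keywords	spoken language parsing; phrase translation; automatic extraction
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6898
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	左云存. 口语解析与短语翻译对自动抽取方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2005.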