Named entity translation and bilingual named entity extraction are important in many natural language processing tasks, such as machine translation and cross-lingual information retrieval, and they are attracting growing attention from researchers. As relatively new research areas, the techniques for named entity translation and extraction are not yet mature, and many problems remain open. This dissertation designs a framework for obtaining translations of named entities that combines translation with extraction, and focuses on two methods: translating organization names and extracting named entity pairs from bilingual comparable corpora. (1) We design a framework for Chinese-English named entity translation and extraction. In this framework, a Chinese named entity is either translated directly into the corresponding English named entity by the translation module, or a set of translation candidates is generated and then evaluated by the network module to select the correct English translation. In parallel, we extract Chinese-English named entity pairs from bilingual corpora on the Internet, including bilingual comparable corpora and Web corpora. The extracted pairs are used to build named entity translation lists that assist translation. (2) We design and implement a rule-constrained method for Chinese-English organization name translation. A series of keyword-triggered translation rules is derived from the characteristics of Chinese-English organization name translation and applied in both the training and decoding stages of statistical organization name translation. Specifically, we integrate the translation rules with several statistical models under the maximum entropy framework for statistical machine translation. These statistical models comprise four types of phrase translation models, a word penalty, a lexical mapping model, and a permutation model.
Experimental results show that the translation rules contribute positively in both training and decoding, and that the rule-constrained Chinese-English organization name translation method outperforms the two baseline systems. (3) We design and implement a multi-feature method for extracting Chinese-English named entity pairs from bilingual comparable corpora. The method integrates features internal and external to the named entities: a transliteration feature, a contextual feature, a word translation feature, and a length feature. When computing the feature scores, we take into account the characteristics of the three types of named entities. In particular, we account for changes in word order during translation when computing the translation feature score. Experimental results show that every feature is useful, and that the translation score that accounts for word order performs better than one that ignores it.
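As an illustration of contribution (3), the following is a minimal sketch of how four feature scores might be combined to rank a candidate named entity pair, and of why a word translation score that tolerates reordering can help. Everything in it is an assumption made for illustration: the abstract does not state how the features are combined, so the weighted sum, the uniform weights, the toy lexicon, and all function names are hypothetical and are not the dissertation's method.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (Chinese word -> English translations).
# The resources actually used in the dissertation are not given in the abstract.
TOY_LEXICON = {
    "北京": {"beijing"},
    "大学": {"university"},
}


@dataclass
class FeatureScores:
    """The four feature types named in the abstract, each assumed to lie in [0, 1]."""
    transliteration: float
    context: float
    word_translation: float
    length: float


def ordered_translation_score(zh_words, en_words, lexicon=TOY_LEXICON):
    """Order-sensitive variant: a Chinese word only matches the English
    word at the same position."""
    if not zh_words:
        return 0.0
    matched = sum(
        1
        for zh, en in zip(zh_words, en_words)
        if en.lower() in lexicon.get(zh, set())
    )
    return matched / len(zh_words)


def reorder_tolerant_translation_score(zh_words, en_words, lexicon=TOY_LEXICON):
    """Order-insensitive variant: a Chinese word matches if any of its
    translations appears anywhere in the English candidate."""
    if not zh_words:
        return 0.0
    en_set = {w.lower() for w in en_words}
    matched = sum(1 for zh in zh_words if lexicon.get(zh, set()) & en_set)
    return matched / len(zh_words)


def length_score(zh_words, en_words):
    """Ratio of the shorter to the longer word count (an assumed definition)."""
    if not zh_words or not en_words:
        return 0.0
    shorter, longer = sorted((len(zh_words), len(en_words)))
    return shorter / longer


def combine(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four features; the weights are placeholders."""
    values = (
        scores.transliteration,
        scores.context,
        scores.word_translation,
        scores.length,
    )
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    zh = ["北京", "大学"]
    en = ["University", "of", "Beijing"]
    print(ordered_translation_score(zh, en))
    print(reorder_tolerant_translation_score(zh, en))
```

In this toy case the two words of the Chinese name appear in the opposite order in the English name, so the position-aligned score finds no match while the reorder-tolerant score matches both words.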