Named entities (NEs), especially the names of persons, locations, and organizations, convey essential meaning in human languages. NE translation and bilingual NE alignment are therefore very important in multilingual language processing tasks such as machine translation and cross-lingual information retrieval. In a statistical machine translation (SMT) system in particular, NE translation is an important factor in system performance. Moreover, bilingual NE alignment, which extracts NE pairs from a bilingual corpus, not only yields a bilingual NE dictionary that assists machine translation but also affects the quality of phrase pair extraction during SMT training. Research on NE translation and bilingual NE alignment is thus crucial for improving the performance of machine translation. The main contributions and novelties are summarized as follows:

(1) Study on the NE translation properties of different NE types and two approaches to NE internal word alignment, as well as an NE translation framework

(2) Study on a structure-based model for Chinese organization name translation
First, the inherent structures of organization names are analyzed using an appropriate chunk unit, which reveals that the components of organization names follow a definite formula and allows three types of chunks to be defined. On this basis, a hierarchical synchronous context-free grammar (CFG) derivation is proposed to implement organization name translation. The experimental results show that the proposed model translates Chinese organization names into English with good performance and significantly improves translation quality when it is integrated into a statistical machine translation system.

(3) Study on a theoretical framework of bilingual NE alignment and different alignment strategies
After constructing a general theoretical framework of bilingual NE alignment, we propose three alignment strategies and implement each of them. In the experiments, we find that NE recognition errors compound in the NE alignment stage and have a strongly negative effect on the final output. Therefore, a refinement alignment approach, which identifies and aligns bilingual NEs jointly, is introduced to recover from this error propagation.

(4) Study on a novel bilingual NE alignment model with translation ratio and NE type constraints
Observations on a bilingual NE corpus show that whether a given NE is translated semantically or phonetically depends greatly on its entity type, and that the entities within an aligned pair should share the same type. Accordingly, we propose a novel bilingual NE alignment model that combines basis alignment and refinement alignment. The experimental results show that the novel alignment model significantly improves both the quality of Chinese-English NE alignment and the performance of NE recognition.
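
The two observations behind contribution (4) can be illustrated with a minimal sketch. It is not the model proposed in this work: the per-type weights, the two score fields, and the candidate pairs are all hypothetical values chosen for illustration. It only shows how a type constraint might discard mismatched pairs and how a type-dependent ratio might balance semantic translation against phonetic transliteration.

```python
from dataclasses import dataclass

# Hypothetical per-type weights: the share of the score given to semantic
# translation (the remainder goes to phonetic transliteration).
# These numbers are illustrative assumptions, not values from the source.
SEMANTIC_WEIGHT = {"PER": 0.1, "LOC": 0.4, "ORG": 0.9}


@dataclass
class Candidate:
    zh: str
    zh_type: str
    en: str
    en_type: str
    translation_score: float      # assumed semantic score in [0, 1]
    transliteration_score: float  # assumed phonetic score in [0, 1]


def pair_score(c: Candidate) -> float:
    """Return 0.0 for mismatched types, else a type-weighted mix of scores."""
    if c.zh_type != c.en_type:
        return 0.0  # type constraint: aligned entities must share a type
    w = SEMANTIC_WEIGHT[c.zh_type]
    return w * c.translation_score + (1 - w) * c.transliteration_score


def best_alignments(candidates):
    """Keep, for each Chinese NE, the highest-scoring type-consistent pair."""
    best = {}
    for c in candidates:
        s = pair_score(c)
        if s > 0.0 and s > best.get(c.zh, (None, 0.0))[1]:
            best[c.zh] = (c.en, s)
    return best


# Made-up candidate pairs with made-up scores.
candidates = [
    Candidate("北京大学", "ORG", "Peking University", "ORG", 0.9, 0.2),
    Candidate("北京大学", "ORG", "Beijing", "LOC", 0.5, 0.6),
    Candidate("李明", "PER", "Li Ming", "PER", 0.1, 0.95),
]
alignments = best_alignments(candidates)
```

In this sketch the organization name is scored mostly on its semantic translation and the person name mostly on its transliteration, while the pair whose types disagree is dropped before scoring.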