A Multi-Source Knowledge Base Entity Alignment Algorithm Based on Web Semantic Tags (基于网络语义标签的多源知识库实体对齐算法)
王雪鹏 (Wang Xuepeng); 刘康 (Liu Kang); 何世柱 (He Shizhu); 刘树林 (Liu Shulin); 张元哲 (Zhang Yuanzhe); 赵军 (Zhao Jun)
Source Publication: 计算机学报 (Chinese Journal of Computers)
Publication Year: 2017
Volume: 40; Issue: 3; Pages: 701-711
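Abstract: Knowledge bases are an important data resource for many natural language processing tasks. However, any single knowledge base has low coverage, and different knowledge bases are highly heterogeneous, which hinders data sharing and integration. Research on multi-source knowledge base fusion is therefore of considerable importance, and entity alignment across multi-source knowledge bases is a key component of that fusion. Driven by the development of the Semantic Web, much related work has been carried out outside China, but most of it targets English knowledge bases, and Chinese knowledge bases have received comparatively little study. With the aim of fusing Chinese knowledge bases, this paper proposes an entity alignment algorithm for multi-source knowledge bases based on web semantic tags. The algorithm jointly uses attribute tags, category tags, and keywords from unstructured text to align entities across Chinese encyclopedias. In experiments, the algorithm handled multi-source knowledge base entity alignment well, maintaining a recall of nearly 55% at a precision of nearly 95%. Deployed in a real system, it met the practical requirements of multi-source knowledge base entity alignment.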
Other Abstract: Knowledge bases are an essential data source for many natural language processing tasks. However, the coverage of any single knowledge base is narrow, and different knowledge bases are organized in heterogeneous ways, which makes sharing and integrating data across them difficult. Research on multi-source knowledge base alignment is therefore of considerable significance, and entity alignment is an important component of multi-source knowledge base alignment techniques. Driven by the development of the Semantic Web, researchers outside China have produced numerous studies on knowledge base alignment, but most of them focus on English knowledge bases, and comparatively little similar work exists for Chinese knowledge bases. To address Chinese knowledge bases, we propose a multi-source knowledge base entity alignment method that leverages semantic tags. The method jointly uses attribute triples, category tags, and keywords extracted from unstructured text to align entities drawn from Chinese encyclopedias. Experiments show that the method is effective for knowledge base entity alignment, achieving nearly 95% precision together with nearly 55% recall. The method also works well in a deployed system and satisfies the practical requirements of entity alignment across multi-source knowledge bases.
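The abstract states that the method combines attribute tags, category tags, and keywords from unstructured text, but it does not give the scoring function. The Python sketch below shows one simple way such signals could be combined: a Jaccard overlap per signal, a weighted sum, and a threshold, with comparisons restricted to entries that share a name. The `Entity` class, `WEIGHTS`, `THRESHOLD`, and the same-name blocking step are hypothetical choices for illustration, not the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical encyclopedia entry carrying the three signals the abstract names."""
    entity_id: str
    name: str
    attributes: dict[str, str] = field(default_factory=dict)  # attribute tags: key -> value
    categories: set[str] = field(default_factory=set)         # category tags
    keywords: set[str] = field(default_factory=set)           # keywords from unstructured text


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed weights and threshold for illustration only; not values from the paper.
WEIGHTS = {"attributes": 0.5, "categories": 0.3, "keywords": 0.2}
THRESHOLD = 0.4


def similarity(e1: Entity, e2: Entity) -> float:
    """Weighted sum of per-signal Jaccard overlaps."""
    attr_sim = jaccard(set(e1.attributes.items()), set(e2.attributes.items()))
    cat_sim = jaccard(e1.categories, e2.categories)
    kw_sim = jaccard(e1.keywords, e2.keywords)
    return (WEIGHTS["attributes"] * attr_sim
            + WEIGHTS["categories"] * cat_sim
            + WEIGHTS["keywords"] * kw_sim)


def align(source: list[Entity], target: list[Entity]) -> list[tuple[str, str, float]]:
    """For each source entry, score same-name target entries; keep the best one at or above THRESHOLD."""
    by_name: dict[str, list[Entity]] = {}
    for t in target:
        by_name.setdefault(t.name, []).append(t)
    pairs = []
    for s in source:
        scored = [(similarity(s, t), t) for t in by_name.get(s.name, [])]
        if not scored:
            continue  # no same-name candidate in the target knowledge base
        score, best = max(scored, key=lambda p: p[0])
        if score >= THRESHOLD:
            pairs.append((s.entity_id, best.entity_id, score))
    return pairs


# Usage: one "Apple" entry in knowledge base A, two ambiguous "Apple" entries in knowledge base B.
a = Entity("A1", "Apple", {"family": "Rosaceae"}, {"fruit", "plant"}, {"tree", "sweet"})
b_fruit = Entity("B1", "Apple", {"family": "Rosaceae", "genus": "Malus"}, {"fruit"}, {"tree", "orchard"})
b_company = Entity("B2", "Apple", {"founded": "1976"}, {"company"}, {"iphone", "mac"})
print(align([a], [b_fruit, b_company]))
```

Under these assumed weights, B1 scores about 0.47 (attribute overlap 0.5, category overlap 0.5, keyword overlap 1/3) and B2 scores 0, so only A1 and B1 are aligned. Restricting comparisons to same-name candidates keeps the number of comparisons small, but it cannot match entries listed under different names.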
Keywords: semantic tags; multi-source knowledge bases; entity alignment; heterogeneity; entity ambiguity
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/14490
Collection: National Laboratory of Pattern Recognition, Natural Language Processing (模式识别国家重点实验室_自然语言处理)
Corresponding Author: 刘康 (Liu Kang)
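Affiliation: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所模式识别国家重点实验室)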
Recommended Citation
GB/T 7714: 王雪鹏, 刘康, 何世柱, 等. 基于网络语义标签的多源知识库实体对齐算法[J]. 计算机学报, 2017, 40(3): 701-711.
APA: 王雪鹏, 刘康, 何世柱, 刘树林, 张元哲, & 赵军. (2017). 基于网络语义标签的多源知识库实体对齐算法. 计算机学报, 40(3), 701-711.
MLA: 王雪鹏, et al. "基于网络语义标签的多源知识库实体对齐算法". 计算机学报 40.3 (2017): 701-711.
Files in This Item: 基于网络语义标签的多源知识库实体对齐算法 (747 KB); document type: journal article; version: author's accepted manuscript; access: open access; license: CC BY-NC-SA
File name: 基于网络语义标签的多源知识库实体对齐算法.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.