CASIA OpenIR > Graduates > Master's Theses
Alternative Title: Related Topic Network
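Thesis Advisor: Wang Feiyue (王飞跃); Zeng Dajun (曾大军)
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Computer Application Technology
Keywords: Topic Detection; Topic Representation Extraction; Related Topic Extraction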
Abstract: The development of Internet technology has made life more convenient, but it has also caused information overload. Researchers have done extensive work to reduce the cost of information analysis. Building on research in topic detection and tracking (TDT) and relation extraction, we propose a new method to support information analysis: the related topic network. It organizes large volumes of information by topic and identifies topics that are related to one another.

TDT has been studied for many years, but that research has not addressed extracting relationships between topics. Relation extraction, in turn, has focused mainly on entities and has not been extended to the topic level. We aim to combine the two lines of research so that relationships between topics can be extracted. Our experiments show that the method improves the precision and recall of topic detection and can identify some related topics.

Building on existing research, this thesis makes three contributions:
1. For topic detection, we improve the term weighting model and propose TF-WF/DF, which outperformed previous models in our experiments.
2. For topic representation extraction, we propose a new method that extracts topic representation terms from document titles.
3. For related topic extraction, we propose a sentence-level method based on a same-sentence co-occurrence assumption.

In summary, this thesis is a preliminary exploration of combining TDT with relation extraction.
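The record does not describe the sentence-level method itself. The sketch below is not the thesis's algorithm. It only illustrates the same-sentence co-occurrence assumption: two topics count as related when their representation terms appear in the same sentence. The names `split_sentences` and `related_topic_pairs`, the punctuation-based sentence splitter, the case-sensitive substring matching, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Hypothetical naive splitter on Chinese and English sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[。！？.!?]+", text) if s.strip()]


def related_topic_pairs(documents, topic_terms, min_count=1):
    """Count sentences in which terms from two different topics co-occur.

    topic_terms maps a hypothetical topic id to a set of representation terms.
    Matching is case-sensitive substring matching (an assumption), which avoids
    word segmentation for Chinese but can over-match in English.
    """
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            # Topics with at least one representation term in this sentence.
            present = sorted(
                topic for topic, terms in topic_terms.items()
                if any(term in sentence for term in terms)
            )
            # Each unordered topic pair found in the same sentence counts once.
            for a, b in combinations(present, 2):
                counts[(a, b)] += 1
    # Keep only pairs that co-occur at least min_count times.
    return {pair: n for pair, n in counts.items() if n >= min_count}


topics = {"earthquake": {"earthquake", "地震"}, "relief": {"relief", "donation"}}
docs = ["After the earthquake, relief teams arrived. Markets were calm."]
print(related_topic_pairs(docs, topics))  # {('earthquake', 'relief'): 1}
```

Raising `min_count` trades recall for precision: pairs that co-occur only once are more likely to be incidental.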
Other Identifier: 200728017029249
Document Type: Thesis
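Recommended Citation (GB/T 7714):
Chang Chao (常超). Related Topic Network (关联话题网络) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.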
Files in This Item:
CASIA_20072801702924 (1684 KB); Access: not yet open; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.