A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge
Yuhao Zhang1; Wenji Mao1,2; Daniel Zeng1,2
2016
Conference name: The 2016 ACM International Conference on Information and Knowledge Management
Conference date: October 23-28, 2016
Conference location: Indianapolis, USA
Abstract: Mining topics in short texts (e.g. tweets, instant messages) helps people grasp essential information and understand key content, and is widely used in applications related to social media and text analysis. The sparsity and noise of short texts often limit the performance of traditional topic models such as LDA. The recently proposed Biterm Topic Model (BTM), which models word co-occurrence patterns directly, has proven effective for topic detection in short texts. However, BTM has two main drawbacks. First, the number of topics must be specified manually, and it is difficult to determine accurately for a new corpus. Second, BTM assumes that the two words in the same biterm belong to the same topic, which is often too strong an assumption because it does not differentiate between two types of words (i.e. general words and topical words). To tackle these problems, we propose npCTM, a nonparametric topic model that makes this distinction. Our model incorporates the Chinese restaurant process (CRP) into BTM to determine the number of topics automatically. It also distinguishes general words from topical words by jointly considering, for each word, the distribution over these two word types and word coherence information supplied as prior knowledge. We carry out experimental studies on a real-world Twitter dataset. The results demonstrate that our method discovers more coherent topics than the baseline methods.
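The two building blocks the abstract names, biterms and the Chinese restaurant process, can be illustrated with a minimal Python sketch. This is not the authors' npCTM: it omits the general/topical word distinction, the word coherence prior, and any likelihood term, and it only draws topic assignments from the CRP prior. The function names `extract_biterms` and `crp_assign` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def crp_assign(num_items, alpha, rng):
    """Assign items to topics one by one under a CRP prior.

    An item joins existing topic k with probability proportional to the
    number of items already in k, or opens a new topic with probability
    proportional to alpha (assumed > 0). No data likelihood is used here.
    """
    counts = []        # counts[k] = number of items currently in topic k
    assignments = []
    for _ in range(num_items):
        weights = counts + [alpha]  # last slot stands for "new topic"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)        # open a new topic
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    biterms = extract_biterms(["apple", "iphone", "launch"])
    topics, sizes = crp_assign(len(biterms), alpha=1.0, rng=random.Random(0))
    print(biterms)
    print(topics, sizes)
```

Because the number of topics grows with the data rather than being fixed in advance, a prior of this kind is what lets a model avoid a manually specified topic count; a full sampler would additionally weight each choice by how well the biterm fits the topic.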
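Document type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/14510
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Internet Big Data and Information Security
Author affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences
First author affiliation: Institute of Automation, Chinese Academy of Sciences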
Recommended citation (GB/T 7714):
Yuhao Zhang, Wenji Mao, Daniel Zeng. A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge[C], 2016.
Files in this item:
File name/size: A Non-Parametric Top (890KB); Document type: Conference paper; Access: Open access; License: CC BY-NC-SA