A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge
Yuhao Zhang1; Wenji Mao1,2; Daniel Zeng1,2
2016
Conference Name: The 2016 ACM International Conference on Information and Knowledge Management
Conference Date: October 23-28, 2016
Conference Place: Indianapolis, USA
Abstract: Mining topics in short texts (e.g., tweets and instant messages) helps people grasp essential information and understand key content, and is widely used in applications related to social media and text analysis. The sparsity and noise of short texts often limit the performance of traditional topic models such as LDA. The recently proposed Biterm Topic Model (BTM), which models word co-occurrence patterns directly, has proven effective for topic detection in short texts. However, BTM has two main drawbacks. First, the number of topics must be specified manually, which is difficult to determine accurately for a new corpus. Second, BTM assumes that the two words in the same biterm belong to the same topic, an assumption that is often too strong because it does not differentiate between two types of words: general words and topical words. To tackle these problems, we propose npCTM, a nonparametric topic model that makes this distinction. Our model incorporates the Chinese restaurant process (CRP) into BTM to determine the number of topics automatically. It also distinguishes general words from topical words by jointly considering, for each word, the distribution over these two word types and word coherence information as prior knowledge. We carry out experimental studies on a real-world Twitter dataset. The results demonstrate the effectiveness of our method in discovering coherent topics compared with the baseline methods.
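The two building blocks named in the abstract, biterms and the Chinese restaurant process, can be illustrated with a minimal sketch. This is not the authors' npCTM inference procedure: it shows only how unordered word pairs might be extracted from one short text and how a plain CRP prior lets the number of topics grow with the data. The function names, the concentration parameter `alpha`, and the seeded sampler are assumptions made for illustration; a full model would also weight each topic by the likelihood of the words and handle the general/topical word distinction.

```python
import random
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def crp_probabilities(topic_counts, alpha):
    """CRP prior over topics for the next item.

    An existing topic k is chosen in proportion to its current count;
    a brand-new topic is chosen in proportion to alpha (assumed > 0).
    The last entry of the returned list is the new-topic probability.
    """
    total = sum(topic_counts) + alpha
    probs = [count / total for count in topic_counts]
    probs.append(alpha / total)
    return probs


def crp_assign(num_items, alpha, seed=0):
    """Sequentially assign items to topics using only the CRP prior."""
    rng = random.Random(seed)
    topic_counts = []
    assignments = []
    for _ in range(num_items):
        probs = crp_probabilities(topic_counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(topic_counts):
            topic_counts.append(1)  # open a new topic
        else:
            topic_counts[k] += 1    # join an existing topic
        assignments.append(k)
    return assignments, topic_counts
```

For example, `extract_biterms(["topic", "model", "short"])` yields three biterms, and `crp_assign(len(biterms), alpha=1.0)` returns one topic index per biterm together with the topic sizes. Larger values of `alpha` tend to open more topics, which is the sense in which the topic number is not fixed in advance.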
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/14510
Collection: State Key Laboratory of Management and Control for Complex Systems_Internet Big Data and Information Security
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Yuhao Zhang, Wenji Mao, Daniel Zeng. A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge[C], 2016.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.