Institutional Repository of the Institute of Automation, Chinese Academy of Sciences
Title: Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification
Authors: Wang, Peng [1]; Xu Bo (徐波) [1]; Xu Jiaming (许家铭) [1]; Tian Guanhua (田冠华) [1]; Liu Chenglin (刘成林) [1, 2]; Hao Hongwei (郝红卫) [1]
Journal: NEUROCOMPUTING
Publication date: 2016-01-22
Volume: 174, Pages: 806-814
Keywords: Short text; Classification; Clustering; Convolutional neural network; Semantic units; Word embeddings
DOI: 10.1016/j.neucom.2015.09.096
Article type: Article
Abstract: Text classification can help users to effectively handle and exploit useful information hidden in large-scale documents. However, the sparsity of data and the semantic sensitivity to context often hinder the classification performance of short texts. In order to overcome these weaknesses, we propose a unified framework to expand short texts based on word embedding clustering and convolutional neural network (CNN). Empirically, semantically related words are usually close to each other in embedding spaces. Thus, we first discover semantic cliques via fast clustering. Then, by using additive composition over word embeddings from context with variable window width, the representations of multi-scale semantic units in short texts are computed. In embedding spaces, the restricted nearest word embeddings (NWEs) of the semantic units are chosen to constitute expanded matrices, where the semantic cliques are used as supervision information. Finally, for a short text, the projected matrix and expanded matrices are combined and fed into CNN in parallel. Experimental results on two open benchmarks validate the effectiveness of the proposed method. (C) 2015 Elsevier B.V. All rights reserved.
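The expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D embeddings, the function names `semantic_units`, `nearest_word`, and `expand`, and the plain distance threshold are all assumptions made for illustration. In particular, the threshold only stands in for the paper's restriction step, which uses semantic cliques from clustering as supervision, and the CNN stage is omitted.

```python
import math

# Toy 2-D embeddings, invented for illustration only (not from the paper).
EMBEDDINGS = {
    "cheap": [1.0, 0.0],
    "flight": [0.0, 1.0],
    "ticket": [0.2, 0.9],
    "airfare": [1.0, 1.0],
    "banana": [-1.0, -1.0],
}


def semantic_units(tokens, width):
    """Additive composition: sum the embeddings in every window of `width` tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens]
    units = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        units.append([sum(component) for component in zip(*window)])
    return units


def nearest_word(unit, max_distance):
    """Return the vocabulary word closest to `unit`, or None if it is too far.

    The distance threshold is an assumed stand-in for the paper's
    restriction step, which relies on semantic cliques as supervision.
    """
    best_word, best_dist = None, float("inf")
    for word, vec in EMBEDDINGS.items():
        dist = math.dist(unit, vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None


def expand(tokens, widths=(1, 2), max_distance=0.5):
    """Map each window width to the nearest words of its semantic units."""
    return {
        w: [nearest_word(u, max_distance) for u in semantic_units(tokens, w)]
        for w in widths
    }
```

With these toy vectors, `expand(["cheap", "flight"])` returns `{1: ["cheap", "flight"], 2: ["airfare"]}`: the width-2 unit is the sum of the two word vectors, and its nearest vocabulary entry is a word that does not appear in the input text. A unit that lies farther than `max_distance` from every vocabulary word yields `None`, which mimics discarding unreliable expansions.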
WOS headings: Science & Technology; Technology
WOS category: Computer Science, Artificial Intelligence
WOS research area: Computer Science
Indexed by: SCI
Funding: National Natural Science Foundation of China (61203281, 61303172, 61403385); Hundred Talents Program of Chinese Academy of Sciences (Y3S4011D31)
Language: English
WOS accession number: WOS:000367276900025
Content type: Journal article
URI: http://ir.ia.ac.cn/handle/173211/10583
Appears in Collections: Research Center for Digital Content Technology and Services_Journal Articles

Files in This Item:
File name: Neurocomputing-2015.pdf
File size: 627KB
Content type: Journal article
Version: Author's accepted manuscript
Access: Open access

Author affiliations:
1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
2. Natl Lab Pattern Recognit, Beijing 100190, Peoples R China

Recommended Citation:
Peng Wang, Bo Xu, Jiaming Xu, et al. Semantic Expansion using Word Embedding Clustering and Convolutional Neural Network for Improving Short Text Classification[J]. Neurocomputing, 2016, 174: 806-814.
File name: Neurocomputing-2015.pdf
Format: Adobe PDF

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.