汉语微博情感分析方法研究与实现
Alternative Title: Research and Implementation of Sentiment Analysis on Chinese Micro Blog
张志琳
Subtype: Master of Engineering
Thesis Advisor: 宗成庆
2011-05-20
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Computer Technology
Keyword: Micro Blog; Word Segmentation; Sentiment Classification; Machine Learning; Feature Selection; Semi-supervised Learning
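Abstract: A micro blog is a platform for sharing, spreading and obtaining information based on user relationships. Users can access their personal community through the Web, WAP and various clients, and post status updates of about 140 characters to share information instantly. The Internet is flooded with information every day, and micro blogs, as a platform for personal information sharing in the Web 2.0 era, play an increasingly important role. Micro blog users can freely publish information on the platform, follow users and trending social events they are interested in, and express their own opinions and attitudes. By analyzing micro blog sentiment, we can mine users' opinions about a particular event or product, and this information has significant commercial value; micro blog sentiment analysis techniques also help advance other research in natural language processing. As a result, micro blog sentiment analysis is receiving growing attention from researchers and is gradually becoming a research hotspot. This thesis focuses on the key techniques of micro blog sentiment analysis, aiming to use information retrieval, data mining and machine learning, combined with the characteristics of micro blogs, to analyze micro blog sentiment efficiently and accurately. The main work and contributions of the thesis are summarized as follows: 1) Micro blog word segmentation: existing Chinese word segmentation tools already reach fairly high accuracy on traditional text, but because of the characteristics of micro blog text they perform poorly on it. This thesis studies and implements a micro blog word segmentation method that combines rules with a maximum entropy model. First, URL links, topic words, emoticons and special symbols in micro blogs are pre-processed; then a maximum entropy model is applied with traditional feature selection methods, together with additional dictionaries that strengthen segmentation; finally, post-processing further improves the segmentation result. Experiments show that the method effectively improves micro blog word segmentation. 2) Micro blog sentiment classification based on diverse features: micro blog sentiment analysis has three main task stages: subjective/objective classification, positive/negative polarity classification, and evaluation object extraction. For the first two tasks, this thesis proposes a Chinese micro blog sentiment classification method based on diverse features and compares it with existing methods. In addition, because sentiment lexicons are especially important for micro blog sentiment analysis, the thesis also proposes a method for expanding a micro blog sentiment lexicon. To address shortcomings in how current micro blog sentiment classification selects and uses features, the method introduces three simple and effective ways of selecting and adding features: lexicalized topic features, sentiment word content features, and probabilistic sentiment word polarity features. Experiments show that the method effectively improves micro blog sentiment classification. 3) Micro blog sentiment classification based on co-training: micro blog sentiment analysis faces a major problem. Supervised learning achieves high accuracy but needs large amounts of manually labeled data, and manually labeled data can hardly keep up with the rapid change of micro blogs. This thesis therefore adopts semi-supervised learning and uses co-training to classify the sentiment of micro blog text, which both adapts to the continually evolving nature of micro blogs and achieves good experimental results.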
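The co-training idea in contribution 3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: this record does not say which two views or which base classifier were used, so the word view, the emoticon view, the tiny `NaiveBayes` class, the helpers `fit_views` and `co_train`, and the parameters `per_round` and `threshold` are all assumptions made for illustration. Each round, one classifier per view labels the unlabeled posts, and the most confident predictions are moved into the labeled set.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict_proba(self, tokens):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            # The extra +1 in the denominator reserves mass for unseen tokens.
            denom = sum(counts.values()) + len(self.vocab) + 1
            log_scores[label] = math.log(n / total) + sum(
                math.log((counts[t] + 1) / denom) for t in tokens
            )
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    labels = [y for _, y in labeled]
    return tuple(
        NaiveBayes().fit([x[view] for x, _ in labeled], labels) for view in (0, 1)
    )


def co_train(labeled, unlabeled, rounds=5, per_round=1, threshold=0.6):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for view, clf in enumerate(fit_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                probs = clf.predict_proba(x[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            # Keep this view's most confident predictions above the threshold.
            for conf, i, label in sorted(scored, reverse=True)[:per_round]:
                if conf >= threshold:
                    picked.setdefault(i, label)  # on conflict, first view wins
        if not picked:
            break
        labeled += [(pool[i], label) for i, label in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    clf_a, clf_b = fit_views(labeled)  # retrain on the final labeled set
    return clf_a, clf_b, labeled


# Toy data (hypothetical): view 0 is words, view 1 is emoticons.
seed = [((["good", "movie"], [":)"]), "pos"), ((["bad", "service"], [":("]), "neg")]
pool = [(["good", "phone"], [":)"]), (["bad", "phone"], [":("])]
clf_words, clf_emoticons, grown = co_train(seed, pool, per_round=2)
```

In this toy run both unlabeled posts are absorbed into the labeled set in the first round. In practice the threshold and the number of examples added per round would need tuning, since confident but wrong labels propagate into later rounds.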
Other Abstract: With the rapid development of the Internet, micro blogs are playing an increasingly important role in people's daily lives. A micro blog is a platform for sharing, communicating and acquiring information based on relationships between users, in which users can access their personal network via the Web, WAP and various clients and post updates of at most 140 characters to share information in real time. On the platform, people can freely publish information, follow other users and hot social events, and express their opinions and attitudes. By analyzing micro blog sentiment, we can effectively mine the public's attitude towards an event or a product, which has significant commercial value. Moreover, micro blog sentiment analysis technology can also contribute to other studies in natural language processing. Therefore, micro blog sentiment analysis has attracted increasing attention from researchers and is gradually becoming a research hotspot. This thesis focuses on the key techniques of micro blog sentiment analysis, aiming to achieve efficient and accurate sentiment analysis of micro blogs by using information retrieval, data mining and machine learning, combined with the characteristics of micro blogs. The main contributions of this thesis are summarized as follows: 1) Micro blog segmentation. Existing Chinese word segmentation tools reach high accuracy on traditional Chinese texts. However, they perform poorly on micro blog text because of its particular characteristics, and segmentation quality matters for the subsequent analysis. We therefore propose a micro blog word segmentation method that combines rules with a maximum entropy model. First, we pre-process URLs, hashtags, emoticons and special symbols. Second, we apply the maximum entropy model with traditional features plus an external dictionary that strengthens the segmentation.
Finally, we adopt a series of post-processing operations to obtain better results. Experiments show that the proposed method significantly outperforms the state-of-the-art method. 2) Sentiment analysis of micro blog based on rich features. Micro blog sentiment analysis has three main tasks: subjective and objective classification, positive and negative classification, and evaluation object extraction. This thesis mainly focuses on the first two tasks and proposes a Chinese Micro blog sentiment analysis method based on the feature dive...
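The rule-based pre-processing of URLs, hashtags and emoticons named in contribution 1) might look roughly like the Python sketch below. The thesis's actual rules are not given in this record, so everything here is an assumption: the `#topic#` markup with paired hash signs, bracketed emoticon codes such as `[哈哈]`, the regular expressions themselves, and the function name `split_protected`. The idea shown is only that such spans are cut out and kept whole before the remaining text is handed to a statistical segmenter.

```python
import re

# Hypothetical patterns; the thesis does not list its actual rules.
PROTECTED = re.compile(
    r"(?P<url>https?://[A-Za-z0-9./?=&_%:-]+)"   # plain http(s) links
    r"|(?P<topic>#[^#\s]+#)"                      # assumed #topic# markup
    r"|(?P<emoticon>\[[^\[\]\s]{1,8}\])"          # assumed [emoticon] codes
)


def split_protected(text):
    """Return (kind, span) pairs; kind is 'text' for spans left to the segmenter."""
    pieces = []
    pos = 0
    for m in PROTECTED.finditer(text):
        if m.start() > pos:
            pieces.append(("text", text[pos:m.start()]))
        # lastgroup is the name of the alternative that matched.
        pieces.append((m.lastgroup, m.group()))
        pos = m.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces


post = "好开心[哈哈] #世界杯# http://t.cn/abc 看球"
pieces = split_protected(post)
```

Only the `text` pieces would then go through word segmentation, while the protected spans are treated as single tokens. Joining all spans in order reproduces the original post, so no characters are lost at this stage.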
Shelf Number: XWLW2069
Other Identifier: 2011E8014661091
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7701
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
张志琳. 汉语微博情感分析方法研究与实现[D]. 中国科学院自动化研究所. 中国科学院大学,2011.
Files in This Item:
File Name/Size: CASIA_2011E801466109 (2086KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.