English Abstract | With the development of Web 2.0, individuals increasingly use the web to publish and share sentiment information. Faced with such a large volume of sentiment information, however, how can useful information be distinguished, and how can it be analyzed and organized effectively? These are key issues in the field of text information processing. Against this background, opinion mining (or sentiment analysis), which aims to analyze opinion holders' sentiment toward entities, has recently become a hot research topic with many important applications, such as business intelligence, public opinion monitoring, and sentiment information retrieval. The quality of web reviews varies significantly, and low-quality reviews pose a great challenge to opinion mining, so quality assessment should be performed in advance. Meanwhile, the plurality and diversity of reviews broaden the scope and depth of opinions, so fine-grained opinion mining is urgently needed. Finally, opinion leaders usually capture the most representative opinions and play a crucial role in opinion diffusion, which makes identifying them very important. In this paper, we investigate key methods for review-oriented opinion mining, including review quality assessment, product feature (aspect) and opinion word extraction, sentiment polarity recognition, topic-level opinion leader identification, and sentiment retrieval. The main research contents are as follows: 1. We propose a low-quality review detection model that combines review features with reviewer features. We show empirically that integrating reviewer features yields significant improvements. In addition, we analyze the reviewer features and select the most predictive ones. 2. We propose a Conditional Random Fields (CRF) based approach to product feature and opinion word extraction, which treats the extraction task as a sequence labeling task. 
Specifically, the model is trained on word, part-of-speech, syntactic, and lexical features. We also propose a semi-supervised model that combines a bootstrapping strategy with CRF models. Finally, product features are clustered based on semantic and string similarity. 3. We propose a lexicon-based sentiment polarity identification approach: a domain-specific sentiment knowledge base is built from semi-structured reviews and a general lexicon, and sentiment polarity is then determined according to this knowledge base. Experimental results indicate the knowledge b... |
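Item 1 combines review features with reviewer features. The abstract does not name the classifier or the feature set, so the following is only a minimal sketch under those gaps. It assumes two hypothetical features of each kind and a plain logistic regression fitted by per-example gradient descent, and it shows how the two feature groups are concatenated into one input vector.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Review:
    text: str
    num_feature_mentions: int  # product features mentioned in the text (assumed precomputed)

@dataclass
class Reviewer:
    num_reviews: int          # reviews this author has posted (hypothetical feature)
    avg_helpful_ratio: float  # mean helpful-vote ratio of past reviews, in [0, 1] (hypothetical)

def review_features(r: Review) -> List[float]:
    # Review-level evidence: log word count and number of product features mentioned.
    return [math.log1p(len(r.text.split())), float(r.num_feature_mentions)]

def reviewer_features(u: Reviewer) -> List[float]:
    # Reviewer-level evidence: log activity and past helpfulness.
    return [math.log1p(u.num_reviews), u.avg_helpful_ratio]

def combined_features(r: Review, u: Reviewer) -> List[float]:
    # The core idea: append reviewer-level evidence to review-level evidence.
    return review_features(r) + reviewer_features(u)

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_low_quality(w: List[float], b: float, x: List[float]) -> float:
    """Probability that a review is low quality."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(X: List[List[float]], y: List[int],
          lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic regression fitted by per-example gradient descent; y = 1 means low quality."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_low_quality(w, b, xi) - yi  # d(log loss)/d(logit)
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b
```

Training once on `review_features` alone and once on `combined_features` gives the kind of with/without comparison the abstract describes. The features and classifier here are placeholders, not the thesis's.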
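Item 2 treats extraction as sequence labeling over word, part-of-speech, syntactic, and lexical features. One common way to feed such features to a CRF is one feature dictionary per token together with BIO labels. The sketch below assumes hypothetical `FEAT`/`OPIN` labels, a hypothetical `OPINION_LEXICON`, a dependency relation standing in for "syntactic" features, and a ±1 context window. The thesis's actual feature templates are not given in the abstract.

```python
from typing import Dict, List, Tuple

# A token is (word, POS tag, dependency relation). The dependency label is a
# stand-in for "syntactic" features; the exact feature set is an assumption.
Token = Tuple[str, str, str]

# Hypothetical opinion-word list used as a lexical feature.
OPINION_LEXICON = {"great", "good", "poor", "bad", "bright", "long"}

def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Feature dict for token i: word, POS, syntactic and lexical cues plus a +/-1 window."""
    word, pos, dep = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "pos": pos,
        "dep": dep,
        "in_opinion_lexicon": word.lower() in OPINION_LEXICON,
    }
    if i > 0:
        prev_word, prev_pos, _ = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        feats["-1:pos"] = prev_pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos, _ = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        feats["+1:pos"] = next_pos
    else:
        feats["EOS"] = True  # end of sentence
    return feats

def sent_features(sent: List[Token]) -> List[Dict[str, object]]:
    return [token_features(sent, i) for i in range(len(sent))]

# BIO labels: FEAT marks product features (aspects), OPIN marks opinion words.
example = [("The", "DT", "det"), ("battery", "NN", "compound"),
           ("life", "NN", "nsubj"), ("is", "VBZ", "cop"), ("great", "JJ", "ROOT")]
example_labels = ["O", "B-FEAT", "I-FEAT", "O", "B-OPIN"]
```

Lists of such per-token dictionaries, paired with label sequences, are the input shape that CRF toolkits such as sklearn-crfsuite expect for training. In a bootstrapping variant, a trained model would label unlabeled reviews and feed its most confident outputs back into the training set.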
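Item 2 also clusters extracted product features by semantic and string similarity. The abstract does not say how the two signals are combined. The sketch below assumes a hypothetical synonym table as the semantic source, uses `difflib.SequenceMatcher.ratio()` for string similarity, links two terms when either score reaches a threshold, and takes connected components (single-link clustering).

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set

# Hypothetical synonym groups standing in for a semantic resource (e.g. a thesaurus).
SYNONYM_GROUPS: List[Set[str]] = [{"price", "cost"}, {"screen", "display"}]

def semantic_sim(a: str, b: str) -> float:
    """1.0 if the two terms are identical or listed as synonyms, else 0.0."""
    if a == b:
        return 1.0
    return 1.0 if any(a in g and b in g for g in SYNONYM_GROUPS) else 0.0

def string_sim(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def cluster_features(features: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Single-link clustering: two terms are linked if either similarity reaches threshold."""
    parent = list(range(len(features)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            a, b = features[i], features[j]
            if max(semantic_sim(a, b), string_sim(a, b)) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, term in enumerate(features):
        groups.setdefault(find(i), []).append(term)
    return list(groups.values())
```

With these assumptions, "screen size" joins "screen" through string overlap, and "display" joins them through the synonym table. A weighted combination of the two scores would be an equally plausible design.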
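Item 3 builds a domain-specific knowledge base from semi-structured reviews plus a general lexicon, then looks up polarity. The following is a minimal sketch under stated assumptions:

- The semi-structured reviews have hypothetical Pros/Cons fields, already reduced to (feature, opinion word) pairs.
- Each pair's polarity is decided by majority vote.
- A domain entry overrides the general lexicon, so "long" can be positive for "battery" but negative for "startup".
- Negation is detected upstream and passed in as a flag.

`GENERAL_LEXICON` and `NEGATIONS` are illustrative placeholders, not the thesis's resources.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical general-purpose lexicon: word -> polarity (+1 / -1).
GENERAL_LEXICON: Dict[str, int] = {"good": 1, "great": 1, "bad": -1, "poor": -1}

Pair = Tuple[str, str]  # (product feature, opinion word)

def build_domain_kb(pros: List[Pair], cons: List[Pair]) -> Dict[Pair, int]:
    """Vote on (feature, opinion word) polarity from semi-structured Pros/Cons fields."""
    votes: Dict[Pair, int] = defaultdict(int)
    for pair in pros:
        votes[pair] += 1
    for pair in cons:
        votes[pair] -= 1
    # Keep only pairs with a clear majority; ties are left to the general lexicon.
    return {pair: (1 if v > 0 else -1) for pair, v in votes.items() if v != 0}

def pair_polarity(feature: str, opinion: str, negated: bool, kb: Dict[Pair, int]) -> int:
    """Domain knowledge base first, then the general lexicon; 0 if the word is unknown."""
    pol = kb.get((feature, opinion), GENERAL_LEXICON.get(opinion, 0))
    return -pol if negated else pol

def review_polarity(pairs: List[Tuple[str, str, bool]], kb: Dict[Pair, int]) -> int:
    """Sign of the summed pair polarities: +1 positive, -1 negative, 0 neutral."""
    score = sum(pair_polarity(f, o, neg, kb) for f, o, neg in pairs)
    return (score > 0) - (score < 0)
```

The lookup order (domain entry before general entry) is what lets context-dependent words such as "long" receive feature-specific polarity.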