English abstract | With the development of Internet technology, a large volume of subjective text expressing opinions or sentiment has appeared online. Analyzing, mining, and managing these texts has become an important research topic. Sentiment analysis is a broad field that draws on many fundamental research directions, such as natural language processing, pattern recognition, machine learning, information retrieval, and data mining, and therefore has considerable research value. Sentiment classification is a core task within sentiment analysis: it aims to recognize the opinion or sentiment expressed in a subjective text. Current research in sentiment classification generally follows the methodology of topical text classification, in which the vector space model is used for text representation and statistical machine learning methods are then used for classification. To address the drawbacks of these traditional methods, this thesis focuses on integrating linguistic knowledge with ensemble learning techniques, and tries to solve two problems: how to find features that are significant for sentiment classification, and how to integrate these features effectively with classification models. The main contributions of this thesis can be summarized as follows:
(1) We propose a part-of-speech based ensemble model for sentiment classification. The unigram features are divided into several subsets according to their parts of speech. Different classification algorithms are then employed to construct several base classifiers. Finally, ensemble learning methods are used to integrate these base classifiers efficiently. Three types of ensemble methods, namely fixed combination, weighted combination, and meta-learning classifiers, are evaluated on five widely used datasets with three ensemble strategies. The experimental results show that the proposed method can significantly improve classification performance.
(2) We extend the feature set and propose a word relation based ensemble model for sentiment classification. In particular, we explore the use of bigrams and word dependency relations, which can, to some extent, capture word order and syntactic information, respectively. As before, we experiment with three types of ensemble methods and three ensemble strategies. The results show that the word relation based ensemble model yields a further improvement in classification accuracy. Furthermore, we made in-dep... |
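
The part-of-speech based ensemble described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the tag groups in `POS_GROUPS`, the helper names, and the example probabilities are all hypothetical, and the base classifiers are assumed to have already produced a positive-class probability each. The sketch only shows how unigrams might be partitioned by part of speech and how a fixed (average) rule and a weighted rule would combine the base outputs.

```python
from collections import defaultdict

# Hypothetical grouping of Penn Treebank style tags into feature subsets.
# The abstract does not specify the actual grouping used in the thesis.
POS_GROUPS = {
    "adjective": {"JJ", "JJR", "JJS"},
    "adverb": {"RB", "RBR", "RBS"},
    "noun": {"NN", "NNS", "NNP", "NNPS"},
    "verb": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
}


def split_by_pos(tagged_tokens, groups=POS_GROUPS):
    """Partition (word, tag) pairs into unigram subsets keyed by POS group."""
    subsets = defaultdict(list)
    for word, tag in tagged_tokens:
        for name, tags in groups.items():
            if tag in tags:
                subsets[name].append(word)
    return dict(subsets)


def fixed_combination(posteriors):
    """Fixed rule (average): mean of the base classifiers' probabilities."""
    return sum(posteriors) / len(posteriors)


def weighted_combination(posteriors, weights):
    """Weighted rule: weights would normally be tuned on held-out data."""
    total = sum(weights)
    return sum(p * w for p, w in zip(posteriors, weights)) / total


# Assumed input: tokens already tagged by some POS tagger.
tagged = [("great", "JJ"), ("movie", "NN"), ("really", "RB"), ("enjoyed", "VBD")]
subsets = split_by_pos(tagged)

# Assumed positive-class probabilities from three base classifiers.
base_outputs = [0.9, 0.6, 0.3]
avg_score = fixed_combination(base_outputs)
weighted_score = weighted_combination(base_outputs, [2, 1, 1])
```

A meta-learning combiner, the third ensemble type named in the abstract, would instead train a second-level classifier on the vector of base outputs; that step is omitted here because it depends on the choice of learner.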