Sentiment text classification draws on the theory of both text content understanding and pattern recognition. Studying this subject is academically valuable: it assists the development of natural language understanding and also enriches the content of pattern recognition. People are increasingly accustomed to expressing their opinions and sentiment on the web. As a result, there is a huge number of documents in the form of product reviews, forum reviews, and personal blog articles. To deal with such sentiment-bearing text, research on text classification has shifted from traditional topic-based classification to sentiment-based classification. Studying sentiment text classification is therefore also valuable for real applications. The main contributions are summarized as follows:

(1) We theoretically analyze six popular feature selection methods for text classification and propose two basic measurements: document frequency and category ratio. Based on this theoretical analysis, we propose a new feature selection method called weighted log-likelihood ratio (WLLR). Experimental results show that the new method performs very well in sentiment classification across different domains.

(2) We give a theoretical explanation of two important fusion rules for combining multiple classifiers, the product rule and the sum rule. The explanation places them in the framework of Bayesian theory and states the dependence conditions each of them requires. Moreover, we implement a multiple classifier system for sentiment text classification that fuses different feature sets. Experimental results show that both fusion rules improve classification performance.

(3) We address the problem of multi-domain sentiment classification and present two methods for it. Sentiment classification is a domain-specific problem. When designing a real application system for sentiment text classification, we need to collect annotated data from multiple domains to guarantee good performance. Given training data from multiple domains, we propose two methods, feature-level fusion and classifier-level fusion, to train classifiers using all the data simultaneously. Experimental results show that multi-domain sentiment classification with these two methods performs much better than single-domain classification (using the training data of each domain individually).

(4) We apply classifier combination methods to multi-domain adaptation for sentiment text classification. Domain adaptation for sentiment classification is a very practical problem. We focus on multi-domain adaptation, where more than one source domain exists. We propose an ensemble-driven self-training method to deal with this problem. Experimental results show that the proposed method makes multi-domain adaptation perform better than single-domain adaptation for sentiment text classification.
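
As a rough illustration of the product and sum rules named in contribution (2), the sketch below fuses the class posteriors produced by several classifiers. It is not the system described in this work: it assumes equal class priors (under which both rules reduce to the simple forms shown) and classifiers that output posterior probabilities, and the function names and example numbers are hypothetical.

```python
from math import prod


def sum_rule(posteriors):
    """Fuse by averaging each class's posterior over all classifiers.

    posteriors: list of per-classifier lists, one probability per class.
    Assumes equal class priors (an assumption of this sketch).
    """
    n_classes = len(posteriors[0])
    n_clf = len(posteriors)
    return [sum(p[c] for p in posteriors) / n_clf for c in range(n_classes)]


def product_rule(posteriors):
    """Fuse by multiplying each class's posteriors, then renormalizing.

    Assumes equal class priors (an assumption of this sketch).
    """
    n_classes = len(posteriors[0])
    scores = [prod(p[c] for p in posteriors) for c in range(n_classes)]
    total = sum(scores)
    if total == 0:
        # Every class was vetoed by some classifier; fall back to uniform.
        return [1.0 / n_classes] * n_classes
    return [s / total for s in scores]


def predict(fused):
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical posteriors [P(positive), P(negative)] from three classifiers,
# e.g. each trained on a different feature set.
example = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]]
fused_sum = sum_rule(example)
fused_product = product_rule(example)
```

With these made-up numbers both rules pick class 0 (positive). The product rule is more sensitive to a single classifier assigning a near-zero probability, while the sum rule averages such outliers away.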