With the rapid development of the mobile internet, social media has become the main platform on which people share experiences and obtain information. Modeling user interests through the analysis of multi-granularity sentiment, content semantics, and interactive relationships helps mine users' interests more comprehensively and accurately, so that users can receive better information services. Drawing on progress in deep learning, natural language processing, and data mining, this thesis studies an explainable user interest modeling method based on multi-task learning. The main contributions of this thesis are summarized as follows:
- Users' multi-granularity sentiment analysis method based on multi-task learning. Modeling each sentiment independently cannot capture the associations between sentiments, which makes it difficult to mine users' sentiment characteristics comprehensively from multiple perspectives. To solve this problem, this thesis proposes a multi-granularity sentiment analysis method based on multi-task learning, which simultaneously predicts a user's overall sentiment and the fine-grained sentiment towards each attribute. The method first encodes the contextual information between words with a bidirectional long short-term memory network and generates deep semantic representations of words. Then, it adopts an attention mechanism to measure each word's relevance to each attribute and takes the weighted sum of word representations as the corresponding attribute representation. Next, it computes each attribute's impact on the user's overall sentiment and takes the weighted sum of attribute representations as the overall representation. Finally, it predicts the overall sentiment and the fine-grained sentiment towards each attribute from the overall and attribute representations.
- User interest modeling method integrating content semantics and interactive relationships. The content that users publish on social media is sometimes semantically sparse. Exploiting the interactive nature of social media, this thesis proposes a user interest modeling method integrating content semantics and interactive relationships, which mines user interests from both the content and the interaction network. The method first analyzes the interaction network with a network embedding method and generates users' interactive representations. Then, it obtains integrated word representations by concatenating word vectors with the user's interactive representation. Next, it adopts a convolutional neural network to extract important features from the integrated representations and generate deep semantic representations of posts, thereby exploring the impact of interactive behavior on content semantics. It then uses a bidirectional long short-term memory network to encode the dependencies between posts and generates users' semantic representations. Finally, it predicts users' interests from the combined representations that integrate semantic and interactive information.
- User interest summary extraction method based on a gating mechanism and word mover's distance. To improve the interpretability of user interest modeling and to address the challenge of automatically recognizing words closely related to user interests, this thesis proposes a user interest summary extraction method based on a gating mechanism and word mover's distance, which extracts representative words and sentences to explain the characteristics of each kind of interest. The method first adopts a convolution operation and a gating mechanism to integrate background information; the gate controls the flow of word semantics through the neural network. Then, it uses max pooling to select important word features, generates semantic representations, and predicts the interest category, thereby establishing the correlation between words and interest categories. Next, the most important words identified by max pooling are taken as the interest features of each interest category. Finally, the word mover's distance is used to measure each sentence's relevance to the interest features and the semantic redundancy between sentences, and sentences with high relevance and little semantic redundancy are extracted as the interest summary.
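
The final step of the third contribution, selecting sentences that are relevant to the interest features but not redundant with one another, can be illustrated with a small sketch. It is not the thesis implementation and rests on several assumptions: the exact word mover's distance is replaced by the cheaper relaxed lower bound (each word moves entirely to its nearest counterpart), distances are turned into similarities with `1 / (1 + d)`, and sentences are picked greedily with a hypothetical `trade_off` weight between relevance and redundancy. The function names and the toy 2-D embeddings are likewise illustrative only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(src, dst, emb):
    """One-sided relaxed WMD: every word in src moves to its nearest word in dst.
    With normalized bag-of-words weights this is a lower bound on the exact WMD."""
    return sum(min(euclidean(emb[w], emb[t]) for t in dst) for w in src) / len(src)

def distance(a, b, emb):
    """Tighter relaxed bound: the larger of the two one-sided distances."""
    return max(relaxed_wmd(a, b, emb), relaxed_wmd(b, a, emb))

def similarity(a, b, emb):
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed transform)."""
    return 1.0 / (1.0 + distance(a, b, emb))

def extract_summary(sentences, interest_words, emb, k=2, trade_off=0.5):
    """Greedily pick k sentence indices (assumed selection rule, not from the thesis):
    reward closeness to the interest words, penalize closeness to chosen sentences."""
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = similarity(sentences[i], interest_words, emb)
            redundancy = max(
                (similarity(sentences[i], sentences[j], emb) for j in selected),
                default=0.0,  # nothing selected yet, so nothing to be redundant with
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy 2-D word embeddings (made up for illustration).
emb = {
    "hiking": (1.0, 0.0), "trail": (0.9, 0.0), "path": (0.95, 0.0),
    "camera": (0.0, 1.0), "lens": (0.0, 1.1),
    "stock": (-1.0, -1.0), "market": (-0.9, -1.0),
}
interest_words = ["hiking", "camera"]  # stand-in for words kept by max pooling
sentences = [
    ["hiking", "trail"],   # 0: relevant
    ["hiking", "path"],    # 1: relevant but nearly duplicates sentence 0
    ["camera", "lens"],    # 2: relevant, covers a different aspect
    ["stock", "market"],   # 3: off-topic
]

print(extract_summary(sentences, interest_words, emb, k=2))
print(extract_summary(sentences, interest_words, emb, k=2, trade_off=1.0))
```

On this toy data the default setting keeps sentences 0 and 2, one per aspect of the interest, whereas `trade_off=1.0` (relevance only) keeps the near-duplicate pair 0 and 1. An exact word mover's distance would require solving an optimal transport problem in place of `relaxed_wmd`.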