In recent years, social media platforms such as Twitter and Sina Weibo have generated massive volumes of short texts, creating a problem of information overload. Automatic summarization techniques can extract the valuable information from this vast volume of text and compress it into a summary. An effective approach to extractive summarization is to select relevant and salient information from the texts, which can be formulated as a sentence selection problem. The performance of sentence selection depends on how sentence similarity is measured and how sentences are scored. However, short texts from social media suffer from noise, non-standard grammar, and severe sparsity, so applying traditional sentence scoring and Bag-of-Words (BoW) based representation methods to them directly does not work satisfactorily. We address the problem in the following ways.

First, we propose an unsupervised microblog summarization method based on key-bigram extraction. We extract key-bigrams using hybrid TF-IDF, TextRank, and a topic model, each separately, to discover the salient subtopics of a set of topic-related posts. We then score sentences against the key-bigram set by considering either their overlap similarity or their mutual information. Top-ranked sentences are iteratively selected for the summary, with redundant ones removed. In experiments on Sina Weibo and Twitter datasets, our key-bigram-based summarizer is shown to achieve superior performance in terms of ROUGE-1 score, especially precision.

To improve key-bigram extraction and sentence ranking, we propose extracting key-bigrams based on local density. The distance between bigrams is measured from their topic distributions, as estimated by a topic model, and a quick search algorithm is applied to calculate the local density of each bigram. A TextRank-based extractor is cascaded before this step to generate the candidate key-bigram set.
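
As an illustration of the local-density idea, the following is a minimal sketch and not the implementation used in this work. It assumes each candidate bigram is represented by its topic distribution, uses Euclidean distance between distributions (the abstract does not name the distance measure), and defines local density as the number of other bigrams within a cutoff distance. The names `local_density`, `top_key_bigrams`, and `cutoff` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two topic distributions (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def local_density(topic_dists, cutoff):
    """For each bigram, count how many other bigrams lie closer than `cutoff`."""
    n = len(topic_dists)
    density = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(topic_dists[i], topic_dists[j]) < cutoff:
                density[i] += 1
                density[j] += 1
    return density


def top_key_bigrams(bigrams, topic_dists, cutoff, k):
    """Return the k candidate bigrams with the highest local density."""
    density = local_density(topic_dists, cutoff)
    ranked = sorted(zip(bigrams, density), key=lambda pair: -pair[1])
    return [bigram for bigram, _ in ranked[:k]]


# Hypothetical candidates, e.g. the output of a TextRank-based extractor.
candidates = ["world cup", "cup final", "final score", "stock market"]
distributions = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2], [0.1, 0.9]]
selected = top_key_bigrams(candidates, distributions, cutoff=0.2, k=2)
```

Bigrams in a dense region of topic space are treated here as representatives of a salient subtopic, while isolated bigrams receive a density of zero. The cutoff is a free parameter in this sketch and would need tuning on real data.
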
For sentence extraction, two ranking results are merged by considering the average value and the stability of each sentence's rank. Experimental results show that merging the two rankings improves summary quality over either single ranking and outperforms the single TextRank-based method. Further, we propose a short text summarization method that combines deep learning-based multi-granularity similarity with submodular function optimization. The extractive summarization task is modeled as budgeted maximization of submodular functions, optimizing the coverag...
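
The rank-merging step mentioned at the start of this paragraph could look roughly like the sketch below. The abstract does not give the merging formula, so this is an assumed one: each sentence's merged score is its mean rank plus a penalty proportional to the standard deviation of its two ranks, so that sentences ranked consistently by both rankers are preferred. The function name `merge_rankings` and the parameter `stability_weight` are hypothetical.

```python
from statistics import mean, pstdev


def merge_rankings(rank_a, rank_b, stability_weight=0.5):
    """Merge two rankings given as dicts of sentence id -> rank (1 is best).

    Assumed scoring: mean rank plus a penalty for disagreement between
    the two rankers. A lower merged score is better.
    """
    merged = {}
    for sentence_id in rank_a:
        ranks = [rank_a[sentence_id], rank_b[sentence_id]]
        merged[sentence_id] = mean(ranks) + stability_weight * pstdev(ranks)
    # Sort sentence ids by merged score, best first.
    return sorted(merged, key=lambda sentence_id: merged[sentence_id])


# Hypothetical ranks from two different key-bigram-based scorers.
ranks_from_first = {"s1": 1, "s2": 2, "s3": 3}
ranks_from_second = {"s1": 3, "s2": 2, "s3": 1}
order = merge_rankings(ranks_from_first, ranks_from_second)
```

In this toy input all three sentences have the same mean rank, so the stability term decides the order: `s2`, which both rankers place second, comes out ahead of the two sentences on which the rankers disagree.
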