English abstract | As a typical form of unstructured information, news is an important source of intelligence. With the development of information and communication technologies, especially Internet broadcasting and television, the Internet is becoming a major medium for news dissemination. Although Internet news is not restricted by time or space, society now faces a serious information challenge. People want intelligent services that can automatically collect, filter, organize and utilize network information. Event-based news story analysis is a powerful tool whose aim is to organize and process vast amounts of news information effectively. Compared with information processing techniques for English text, those for Chinese text have a weaker foundation. This dissertation therefore explores event-based news story analysis technology, a research topic of great theoretical significance and broad application prospects. The main achievements of this dissertation are as follows: 1. A new method for detecting and resolving zero pronouns in Chinese text is proposed, which combines machine learning with shallow parsing. To address the shortcomings of rule-based approaches to anaphora resolution and to exploit the characteristics of zero pronouns, the method integrates automatic main verb identification, verbal logic valence and machine learning, and treats zero pronoun recognition as the problem of finding a verb's missing logical arguments. First, syntactic hierarchies are analyzed on the basis of automatic main verb identification. Second, zero pronouns are identified by combining the syntactic hierarchy with verbal logic valence theory. Third, the zero pronouns are resolved using a machine learning approach. Experimental results demonstrate that this method for identifying and resolving zero pronouns works effectively. 
2. A new method for extracting social networks among entities from Chinese news stories through content analysis is proposed. First, the input articles are annotated through lexical analysis. Second, the relationships among the entities are extracted by recognizing main verbs. In the directed-graph representation, an arrow is drawn from the agent argument to the patient argument for each pair of related entities. Finally, all relationship expressions are assembled to build the social network. The contributions of this method are summarized as follows: First, this method is b... |
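The graph-construction step described in item 2 can be illustrated with a minimal Python sketch. It assumes that (agent, verb, patient) triples have already been produced by lexical analysis and main verb recognition, which are not shown here. The function name `build_social_network` and the sample triples are hypothetical illustrations, not taken from the dissertation.

```python
from collections import defaultdict


def build_social_network(triples):
    """Build a directed graph from (agent, verb, patient) triples.

    Each edge points from the agent to the patient and is labelled
    with the main verbs that relate the two entities.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for agent, verb, patient in triples:
        # Arrow direction: agent -> patient, as in the abstract.
        graph[agent][patient].append(verb)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {agent: dict(targets) for agent, targets in graph.items()}


# Hypothetical triples, standing in for the output of main verb recognition.
triples = [
    ("Company A", "acquired", "Company B"),
    ("Minister X", "met", "Minister Y"),
    ("Company A", "sued", "Company B"),
]
network = build_social_network(triples)
```

In this sketch, entities that appear only as patients have no outgoing edges and therefore no key of their own. A fuller representation might register every entity as a node regardless of role.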