While the Internet has brought convenience, it has also created the problem of information overload, which researchers have put considerable effort into addressing. We propose the related topic network, a new method for information analysis based on topic detection and tracking (TDT) and relation extraction. The method organizes information by topic and identifies related topics. Although many TDT methods have emerged over years of research, past work has not addressed relationships across topics. Relation extraction, in turn, still focuses mainly on entities and has not been extended to the topic level. We synthesize research in the two areas so that related topics can be extracted. Our experimental results show that the method improves the precision and recall of topic detection and can find some of the related topics. We make the following three contributions over existing research:
1. For topic detection, we propose TF-WF/DF, an improved term weighting model. In our experiment, this model outperformed earlier models.
2. For topic representation extraction, we propose a new title-based method.
3. For related topic extraction, we propose a sentence-level method based on a same-sentence co-occurrence assumption.
In summary, this thesis presents a preliminary exploration of the fusion of TDT and relation extraction.