Abstract: With the growing popularity of automobiles, road traffic has become ever more closely tied to people's daily lives. Meteorological disasters, however, have become the main threat to road traffic and seriously disrupt daily travel. Because basic data on traffic meteorology are lacking, it is not yet possible to provide professional, accurate traffic weather forecasts to guide travel. In the information age, the Internet has become a major producer and publisher of information. The Web contains a large amount of information about traffic meteorology, but it is scattered and unprocessed, so the basic data on traffic meteorological disasters are buried in the vast ocean of online content. Starting from these rich network resources and taking traffic meteorological disasters as its subject, this thesis studies the relevant information extraction technologies. It then designs and implements a system that extracts traffic information on meteorological disasters, providing historical basic data on traffic meteorological disasters for an intelligent transportation project. The main work is as follows:

1. Through a study of focused crawlers for vertical search engines, this thesis proposes a topic relevance judgment algorithm based on the vector space model (VSM). Combining it with the Best-First Search algorithm and the PageRank algorithm, the thesis then proposes a topic crawling strategy based on comprehensive prediction. This strategy solves two problems. First, current topic crawling methods based on content prediction cannot pass through topic islands. Second, topic crawling methods based on link-structure prediction cause topic drift. Experimental results show that the focused crawler follows the topic direction while pages match the topic well. Once the relevant pages have been collected, it quickly passes through topic islands to find authority pages and new topic pages. To some extent, this makes up for the shortcomings of current topic crawling algorithms.
2. A study of web page text processing technology. First, based on the characteristics of topic pages, we propose a text extraction method that combines template customization with statistics. Second, driven by the needs of the traffic meteorological disaster topic and a study of web page classification, we propose a multi-view text classification strategy. Finally, drawing on the traffic meteorological disaster topic and an analysis of the text, we propose an automatic summarization method suita...
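Item 1 combines a VSM relevance score with link-based importance inside a best-first crawl frontier. The abstract does not give the actual scoring formula, so what follows is only a minimal sketch under stated assumptions:

- The topic and each page are bags of terms weighted by raw term frequency. A real system would more likely use TF-IDF.
- Relevance is the cosine similarity of the two vectors.
- A URL's priority is a hypothetical linear blend `alpha * relevance + (1 - alpha) * pagerank`, with PageRank scores assumed to be precomputed.

`cosine_relevance`, `combined_priority`, `BestFirstFrontier` and `alpha=0.7` are illustrative names and values, not taken from the thesis.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(topic_terms, page_terms):
    """VSM relevance: cosine similarity of raw term-frequency vectors."""
    topic_vec = Counter(topic_terms)
    page_vec = Counter(page_terms)
    dot = sum(weight * page_vec[term] for term, weight in topic_vec.items())
    norm = math.sqrt(sum(w * w for w in topic_vec.values())) * math.sqrt(
        sum(w * w for w in page_vec.values())
    )
    return dot / norm if norm else 0.0


def combined_priority(relevance, pagerank, alpha=0.7):
    """Hypothetical blend of content relevance and link importance.

    The relevance term keeps the crawl on topic (limiting topic drift);
    the PageRank term lets weakly relevant but well-linked pages still be
    expanded, which is one way to cross a topic island.
    """
    return alpha * relevance + (1 - alpha) * pagerank


class BestFirstFrontier:
    """Best-first crawl frontier: always pops the highest-priority URL."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        # Each URL is queued at most once; later pushes are ignored.
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    topic = "road fog traffic closure snow".split()
    frontier = BestFirstFrontier()
    frontier.push("fog-page", combined_priority(
        cosine_relevance(topic, "fog road closure".split()), pagerank=0.1))
    frontier.push("hub-page", combined_priority(
        cosine_relevance(topic, "football score".split()), pagerank=0.9))
    frontier.push("stock-page", combined_priority(
        cosine_relevance(topic, "stock market".split()), pagerank=0.05))
    # Pops fog-page, then hub-page, then stock-page.
    while frontier:
        print(frontier.pop())
```

In the demo, the off-topic but well-linked hub page still outranks the off-topic, poorly linked page. This is how a link term can carry the crawl across a topic island. Meanwhile, `alpha` keeps on-topic content dominant and so limits drift.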
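The text extraction step in item 2 pairs per-site templates with a statistical fallback. The abstract does not say which statistic is used, so this sketch swaps in a line-block text-density heuristic, a common statistical approach, purely for illustration. The heuristic strips the tags, sums the characters in each window of consecutive lines, and keeps the dense run around the densest block. The `TEMPLATES` table, the domain `news.example.com`, and the `window` and `min_block` values are hypothetical. The regex-based tag handling is a simplification that a production system would replace with a real HTML parser.

```python
import re
from html import unescape

# Hypothetical customized templates: domain -> regex capturing the body.
TEMPLATES = {
    "news.example.com": re.compile(r'<div class="article">(.*?)</div>', re.S),
}

SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html):
    """Drop script/style blocks and all tags, keeping line structure."""
    return unescape(TAG_RE.sub("", SCRIPT_RE.sub("", html)))


def extract_by_density(html, window=3, min_block=20):
    """Statistical fallback: keep the dense run of line blocks."""
    lines = [line.strip() for line in strip_tags(html).splitlines()]
    # Block i = number of characters in lines i .. i + window - 1.
    blocks = [sum(len(l) for l in lines[i:i + window]) for i in range(len(lines))]
    if not blocks:
        return ""
    start = end = max(range(len(blocks)), key=blocks.__getitem__)
    if blocks[start] < min_block:
        return ""
    # Grow the region outward while neighbouring blocks stay dense.
    while start > 0 and blocks[start - 1] >= min_block:
        start -= 1
    while end + 1 < len(blocks) and blocks[end + 1] >= min_block:
        end += 1
    return "\n".join(line for line in lines[start:end + 1] if line)


def extract_text(html, domain):
    """Use a customized template when one matches, else fall back to statistics."""
    template = TEMPLATES.get(domain)
    if template:
        match = template.search(html)
        if match:
            return strip_tags(match.group(1)).strip()
    return extract_by_density(html)
```

Templates give exact results on the few sites they are written for, while the density statistic needs no per-site work. Trying the template first and falling back to statistics is one plausible way to combine the two.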