Alternative Title: Research and Application of Information Extraction Technology on Traffic Meteorology Disasters
Thesis Advisor: 台宪青
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Computer Technology
Keyword: Focusing Crawler; Theme Crawling; Vector Space Model; Traffic Meteorology Disaster; Structured Information Extraction
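Abstract: With the spread of private cars, road traffic has become ever more closely tied to daily life. Meteorological disasters are now the main threat to road traffic and seriously disrupt normal travel. Because basic data on traffic meteorological disasters is scarce, professional and accurate traffic forecasts to guide travel cannot currently be provided. In the information age, the Internet has become the main producer and publisher of information. The web holds a large amount of traffic meteorological information, but because it is scattered and unprocessed, the basic data on traffic meteorological disasters is buried in it.

Starting from massive web information and taking traffic meteorological disasters as the topic, this thesis studies the techniques for extracting traffic meteorological disaster intelligence, and designs and implements a traffic meteorological disaster intelligence extraction system that supplies historical basic data on such disasters to an intelligent transportation project. The main work is as follows:

1. Through a study of topic crawling strategies for focused crawlers, a topic relevance measure based on the vector space model (VSM) is proposed. Then, to address two problems, namely that topic crawling based on content prediction cannot cross topic islands and that topic crawling based on link-structure prediction is prone to topic drift, a combined-prediction topic crawling strategy based on the Best-First Search algorithm and the PageRank algorithm is proposed. Experiments show that the focused crawler not only follows the topic direction accurately while pages are on-topic, but also, once the relevant pages have been collected, crosses topic islands to find authoritative pages and discover new relevant pages. This remedies shortcomings of current topic crawling algorithms.

2. Web page text processing techniques are studied. First, based on the characteristics of topic pages, a body text extraction method combining customized templates with statistics is proposed. Second, to meet the needs of the traffic meteorological disaster topic, a multi-view text classification strategy is proposed through a study of web page classification. Finally, a topic-based automatic summarization method is proposed through text analysis.

3. An intelligence format for traffic meteorological disasters is designed. Through a study of structured information extraction techniques and an analysis of the content structure of a large number of texts, methods for extracting road information, road section information, and impact status information are designed, together with algorithms for matching them to one another, yielding a structured information extraction method suited to the topic. The traffic meteorological disaster intelligence extraction system is also designed and implemented, realizing the extraction of such intelligence.

Through this research on traffic meteorological disaster intelligence extraction, the thesis achieves the collection of topic intelligence and provides data support for traffic simulation and traffic prediction in the intelligent transportation project.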
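Illustrative note (not part of the thesis record): the abstract names a VSM-based relevance measure and a crawl strategy that combines Best-First Search with PageRank, but gives no formulas. The sketch below shows one plausible reading under stated assumptions: raw term-frequency vectors compared by cosine similarity, and a linear blend of content relevance with a PageRank-style link score used as the priority of a best-first frontier. The names `cosine_relevance`, `link_priority`, `Frontier`, and the weight `alpha` are hypothetical and are not taken from the thesis.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(topic_terms, page_terms):
    """Cosine similarity between two raw term-frequency vectors (plain VSM)."""
    topic_vec = Counter(topic_terms)
    page_vec = Counter(page_terms)
    # Dot product over the topic terms; Counter returns 0 for missing terms.
    dot = sum(weight * page_vec[term] for term, weight in topic_vec.items())
    topic_norm = math.sqrt(sum(w * w for w in topic_vec.values()))
    page_norm = math.sqrt(sum(w * w for w in page_vec.values()))
    if topic_norm == 0 or page_norm == 0:
        return 0.0
    return dot / (topic_norm * page_norm)


def link_priority(relevance, link_score, alpha=0.7):
    """Assumed linear blend of content relevance and a link score in [0, 1]."""
    return alpha * relevance + (1 - alpha) * link_score


class Frontier:
    """Best-first frontier: pop() returns the URL with the highest priority."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        # Each URL is queued at most once; heapq is a min-heap, so negate.
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority
```

In this reading, a page with low content relevance can still be visited early if its link score is high, which is one way a crawler could leave a topic island; the actual weighting used in the thesis is not stated in the abstract.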
Other Abstract: With the growing popularity of automobiles, road traffic has become increasingly bound up with people's daily lives. Meteorological disasters have become the main threat to road traffic, seriously affecting daily travel. The lack of basic data on traffic meteorology makes it impossible to provide professional, accurate traffic forecasts to guide travel. With the arrival of the information age, the Internet has become a major producer and publisher of information. The web contains a large amount of information about traffic meteorology, but it is scattered and unprocessed, so the basic data on traffic meteorological disasters is submerged in it. Starting from these rich web resources and taking traffic meteorological disasters as its subject, this thesis studies the relevant information extraction techniques, and designs and implements an information extraction system for traffic meteorological disasters that provides historical basic data on such disasters for an intelligent transportation project. The main work is as follows:

1. Through a study of focused crawlers for vertical search engines, this thesis puts forward a topic relevance judgment algorithm based on the vector space model (VSM). Then, combining the Best-First Search algorithm and the PageRank algorithm, it puts forward a topic crawling strategy based on comprehensive prediction, which addresses two problems: topic crawling based on content prediction cannot pass through topic islands, and topic crawling based on link-structure prediction causes topic drift. The experimental results show that the focused crawler follows the topic direction when pages match the topic well and, after the relevant pages have been collected, quickly passes through the topic island to find authority pages and new topic pages. To some extent, this makes up for the deficiencies of current topic crawling algorithms.

2. Web page text processing techniques are studied. First, based on the characteristics of topic pages, we propose a text extraction method that combines template customization with statistics. Second, to meet the demands of the traffic meteorological disaster topic, and through a study of web page classification, we put forward a multi-view text classification strategy. Finally, combining the traffic meteorological disaster topic with an analysis of the text, we put forward an automatic summarization method suita...
Other Identifier: 2010E8009070047
Document Type: Thesis
Recommended Citation
GB/T 7714
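楚涌泉. Research and Application of Information Extraction Technology on Traffic Meteorology Disasters [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2013.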
Files in This Item:
File Name/Size: CASIA_2010E800907004 (2008KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.