A Focused Crawler Based On Relevance Analysis
Peng, Xin (1); Qin, Qiuli (1); He, Saike (2)
2013-06-01
Conference Name: 17th International Conference on Industrial Engineering Theory, Applications and Practice (IJIE)
Source Publication: 17th International Conference on Industrial Engineering Theory, Applications and Practice (IJIE)
Conference Date: 2013-6-1 ~ 2013-6-2
Conference Place: Busan, Korea
Abstract: With the rapid development of network and information technology, huge amounts of data are now available on the internet. The major problem researchers face is how to efficiently extract the information that belongs to a specific field. In this paper, we build a focused crawler based on VSM (Vector Space Model) and TF-IDF (Term Frequency - Inverse Document Frequency) text correlation analysis. Specifically, we take the seed URL as the collection entrance and fetch web pages from the internet. We then analyze each page using techniques such as web content extraction and page link analysis to obtain the main content of the page. Finally, using a correlation analysis method based on VSM and TF-IDF, we calculate the relevance between the pages and the predefined topics in order to retrieve the information we need.
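The relevance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed IDF variant, the 0.1 threshold, and all function names (`tfidf_vectors`, `cosine`, `relevance`, `filter_relevant`) are assumptions chosen for illustration. It represents the topic description and each page's main content as TF-IDF vectors and scores each page by cosine similarity to the topic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(topic, pages):
    """Score each page's text against the topic description."""
    vectors = tfidf_vectors([topic] + pages)
    return [cosine(vectors[0], page_vec) for page_vec in vectors[1:]]


def filter_relevant(topic, pages, threshold=0.1):
    """Keep pages whose score reaches the (hypothetical) threshold."""
    scores = relevance(topic, pages)
    return [page for page, score in zip(pages, scores) if score >= threshold]


if __name__ == "__main__":
    topic = "focused web crawler topic relevance"
    pages = [
        "A focused crawler fetches web pages relevant to a topic",
        "Recipe for chocolate cake with butter and sugar",
    ]
    for page, score in zip(pages, relevance(topic, pages)):
        print(f"{score:.3f}  {page}")
```

In a crawler, a score like this would typically decide whether a fetched page is stored and whether its outgoing links are queued; the threshold would need tuning on real data.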
Indexed By: EI
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/10782
Collection: State Key Laboratory of Management and Control for Complex Systems_Internet Big Data and Information Security
Corresponding Author: Peng, Xin
Affiliation: 1. School of Economics and Management, Beijing Jiaotong University
2. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Peng, Xin,Qin, Qiuli,He, Saike. A Focused Crawler Based On Relevance Analysis[C],2013.
Files in This Item:
File Name/Size: 会议——IJIE2013Papers.p (247KB); DocType: Conference paper; Access: Open access; License: CC BY-NC-SA
File name: 会议——IJIE2013Papers.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.