A WebPage Content Block Detection Method Based on Layout Features and Languages Features
Han Xianpei; Liu Kang; Zhao Jun
Source Publication: Chinese Journal of Computers
2008
Issue: 22, Pages: 15-21
Abstract: This paper analyzes the different feature types of web-page blocks and presents a web-page content block detection method based on layout features and language features, which effectively resolves the trade-off (the "seesaw problem") between detection accuracy and model generality across different types of web pages. The method uses a vision-block tree to represent a web page, builds two separate classifiers, one for the page's layout features and one for its language features, and combines the two classifiers using different strategies. The experimental results show that, with content block detection recall held above 90%, the combined classifiers' accuracy reaches 85 percent, 5 percent higher than the classifier using only layout features and 15 percent higher than the classifier using only language features. The results also show that the combined classifiers achieve good detection performance on five selected websites, which indicates that the method generalizes well.
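The abstract does not say which features, classifiers, or combination strategies the authors used. As an illustration only, the following minimal sketch shows one way two per-block scores could be combined. The `Block` fields, the two toy scoring functions, the weighted and product combination rules, and the 0.5 threshold are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    """One leaf block of a page (hypothetical stand-in for a vision-block tree node)."""
    text: str
    link_chars: int      # number of characters that sit inside hyperlinks
    width_ratio: float   # block width divided by page width, in [0, 1]


def layout_score(block: Block) -> float:
    """Toy layout classifier (assumption): wider blocks look more like content."""
    return min(1.0, max(0.0, block.width_ratio))


def language_score(block: Block) -> float:
    """Toy language classifier (assumption): text outside links looks like content."""
    if not block.text:
        return 0.0
    return max(0.0, 1.0 - block.link_chars / len(block.text))


def combine_weighted(p_layout: float, p_language: float, w: float = 0.5) -> float:
    """Linear interpolation of the two scores; w is the layout weight."""
    return w * p_layout + (1.0 - w) * p_language


def combine_product(p_layout: float, p_language: float) -> float:
    """Product rule: both classifiers must agree for a high score."""
    return p_layout * p_language


def detect_content_blocks(
    blocks: List[Block],
    combine: Callable[[float, float], float] = combine_weighted,
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each block whose combined score reaches the threshold."""
    return [
        combine(layout_score(b), language_score(b)) >= threshold
        for b in blocks
    ]


article = Block(text="x" * 200, link_chars=0, width_ratio=0.7)
nav = Block(text="Home News About", link_chars=15, width_ratio=0.2)
print(detect_content_blocks([article, nav]))  # [True, False]
```

Passing `combine=combine_product` swaps the combination rule without touching the two scorers, which mirrors the abstract's idea of keeping the layout and language classifiers separate and varying only how they are combined.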
Keyword: Web-page Cleaning
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/20665
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Recommended Citation
GB/T 7714: Han Xianpei, Liu Kang, Zhao Jun. A WebPage Content Block Detection Method Based on Layout Features and Languages Features[J]. Chinese Journal of Computers, 2008(22): 15-21.
APA: Han Xianpei, Liu Kang, & Zhao Jun. (2008). A WebPage Content Block Detection Method Based on Layout Features and Languages Features. Chinese Journal of Computers(22), 15-21.
MLA: Han Xianpei, et al. "A WebPage Content Block Detection Method Based on Layout Features and Languages Features". Chinese Journal of Computers 22 (2008): 15-21.
Files in This Item:
File Name/Size: A WebPage Content Bl (199KB)
DocType: Journal article
Version: Author's accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: A WebPage Content Block Detection Method Based on Layout Features and Languages Features.pdf
Format: Adobe PDF