A WebPage Content Block Detection Method Based on Layout Features and Languages Features
Han Xianpei; Liu Kang; Zhao Jun
Journal: Chinese Journal of Computers
Year: 2008
Issue: 22; Pages: 15-21
Abstract: This paper analyzes the different feature types of web-page blocks and presents a web-page content block detection method based on layout features and language features, which effectively resolves the trade-off between detection accuracy and model generality across different types of web pages. The method represents a web page as a vision-block tree, builds two separate classifiers, one for the page's layout features and one for its language features, and combines the two classifiers using different strategies. The experimental results show that, with content block detection recall held above 90%, the combined classifiers' accuracy reaches 85 percent, 5 percent higher than the classifier using only layout features and 15 percent higher than the classifier using only language features. The results also show that the combined classifiers achieve good detection performance across five selected websites, which indicates good generality.
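The abstract does not say which combination strategies were used, so the following is only a minimal sketch of how two per-block classifier scores might be combined. Everything in it is an assumption made for illustration: the `Block` type, the two score fields, the weighted-average and product rules, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    """One node of a (hypothetical) vision-block tree, reduced to two scores."""
    layout_score: float    # assumed output of a layout-feature classifier, in [0, 1]
    language_score: float  # assumed output of a language-feature classifier, in [0, 1]


def combine_weighted(block: Block, w_layout: float = 0.5) -> float:
    """Linear interpolation of the two scores (illustrative strategy)."""
    return w_layout * block.layout_score + (1.0 - w_layout) * block.language_score


def combine_product(block: Block) -> float:
    """Normalized product rule; treats the two scores as independent evidence."""
    pos = block.layout_score * block.language_score
    neg = (1.0 - block.layout_score) * (1.0 - block.language_score)
    total = pos + neg
    # If both hypotheses get zero mass, fall back to an uninformative 0.5.
    return pos / total if total > 0 else 0.5


def detect_content_blocks(
    blocks: List[Block],
    combine: Callable[[Block], float],
    threshold: float = 0.5,
) -> List[Block]:
    """Keep the blocks whose combined score reaches the threshold."""
    return [b for b in blocks if combine(b) >= threshold]
```

Under these assumptions, lowering the threshold trades accuracy for recall, which is one way an operating point such as "recall above 90%" could be fixed before comparing classifiers.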
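Keywords: Web-page Cleaning
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/40979
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing
Recommended citation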
GB/T 7714
Han Xianpei, Liu Kang, Zhao Jun. A WebPage Content Block Detection Method Based on Layout Features and Languages Features[J]. Chinese Journal of Computers, 2008(22): 15-21.
APA Han Xianpei, Liu Kang, & Zhao Jun. (2008). A WebPage Content Block Detection Method Based on Layout Features and Languages Features. Chinese Journal of Computers(22), 15-21.
MLA Han Xianpei, et al. "A WebPage Content Block Detection Method Based on Layout Features and Languages Features". Chinese Journal of Computers. 22 (2008): 15-21.