A WebPage Content Block Detection Method Based on Layout Features and Languages Features
Han Xianpei; Liu Kang; Zhao Jun
2008
Journal: Chinese Journal of Computers
Issue: 22; Pages: 15-21
Abstract: This paper analyzes the different feature types of web-page blocks and presents a web-page content block detection method based on layout features and language features, which effectively resolves the seesaw problem (the trade-off) between detection accuracy and model generality across different types of web pages. The method represents a web page as a vision-block tree, builds two separate classifiers, one for the page's layout features and one for its language features, and combines the two classifiers using different strategies. The experimental results show that, with content block detection recall held above 90%, the combined classifiers reach an accuracy of 85%, which is 5 percent higher than the classifier using only layout features and 15 percent higher than the classifier using only language features. The results also show that the combined classifiers achieve good detection performance on five selected websites, which indicates that the method has good generality.
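The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the combination strategies, so the three shown here (weighted average, product, max), the `Block` structure, the threshold, and the toy scores are all assumptions for illustration only, not the authors' implementation or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A leaf block of a (hypothetical) vision-block tree."""
    layout_score: float    # assumed output of the layout-feature classifier, in [0, 1]
    language_score: float  # assumed output of the language-feature classifier, in [0, 1]
    is_content: bool       # gold label, used only for evaluation


def combine(block: Block, strategy: str = "average", weight: float = 0.5) -> float:
    """Merge the two classifier scores; the strategies are illustrative assumptions."""
    if strategy == "average":
        # weight = 1.0 reduces to the layout-only classifier, 0.0 to language-only
        return weight * block.layout_score + (1.0 - weight) * block.language_score
    if strategy == "product":
        return block.layout_score * block.language_score
    if strategy == "max":
        return max(block.layout_score, block.language_score)
    raise ValueError(f"unknown strategy: {strategy}")


def precision_recall(
    blocks: List[Block],
    threshold: float,
    strategy: str = "average",
    weight: float = 0.5,
) -> Tuple[float, float]:
    """Label a block as content when its combined score reaches the threshold."""
    predicted = [combine(b, strategy, weight) >= threshold for b in blocks]
    tp = sum(p and b.is_content for p, b in zip(predicted, blocks))
    fp = sum(p and not b.is_content for p, b in zip(predicted, blocks))
    fn = sum((not p) and b.is_content for p, b in zip(predicted, blocks))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up scores for five blocks; not data from the paper.
toy_blocks = [
    Block(0.9, 0.8, True),
    Block(0.7, 0.4, True),
    Block(0.2, 0.9, True),   # layout classifier misses this block, language one catches it
    Block(0.6, 0.2, False),  # layout classifier is fooled, language one is not
    Block(0.1, 0.1, False),
]

if __name__ == "__main__":
    print("combined   :", precision_recall(toy_blocks, threshold=0.5))
    print("layout only:", precision_recall(toy_blocks, threshold=0.5, weight=1.0))
```

On this toy data the two classifiers make different mistakes, which is the situation in which combining them can help. In practice the threshold would be tuned on held-out pages to keep recall above a target such as the 90% mentioned in the abstract.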
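Keywords: Web-page Cleaning
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/20665
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Recommended citation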
GB/T 7714
Han Xianpei, Liu Kang, Zhao Jun. A WebPage Content Block Detection Method Based on Layout Features and Languages Features[J]. Chinese Journal of Computers, 2008(22): 15-21.
APA Han Xianpei, Liu Kang, & Zhao Jun. (2008). A WebPage Content Block Detection Method Based on Layout Features and Languages Features. Chinese Journal of Computers(22), 15-21.
MLA Han Xianpei, et al. "A WebPage Content Block Detection Method Based on Layout Features and Languages Features". Chinese Journal of Computers. 22 (2008): 15-21.
Files in this item
File name/size: A WebPage Content Bl (199KB); Document type: Journal article; Version: Author's accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: A WebPage Content Block Detection Method Based on Layout Features and Languages Features.pdf
Format: Adobe PDF