Knowledge Commons of Institute of Automation, CAS
Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection
Jiao, Yifan (1); Li, Zhetao (2); Huang, Shucheng (1); Yang, Xiaoshan (3, 4); Liu, Bin (5); Zhang, Tianzhu (3, 4)
Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
Publication date | 2018-10 |
Volume | 20; Issue: 10; Pages: 2693-2705 |
Abstract |
The video highlight detection task is to localize key
elements (moments of a user’s major or special interest) in a video.
Most existing highlight detection approaches extract features
from the video segment as a whole, without considering how
local features differ temporally and spatially. Because video
content is complex, such mixed features can degrade
the final highlight prediction. Temporally, not all
frames are worth watching, because some contain only the
background of the environment, without humans or other moving
objects. Spatially, likewise, not all regions in each
frame are highlights, especially when the background is heavily
cluttered. To solve this problem, we propose a novel
three-dimensional (3-D) (spatial + temporal) attention model that
can automatically localize the key elements in a video without any
extra supervised annotations. Specifically, the proposed attention
model produces attention weights for local regions along both the
spatial and temporal dimensions of the video segment. The regions
containing key elements are strengthened with large weights,
which yields a more effective feature of the video segment for
predicting the highlight score. The proposed 3-D attention scheme can
be easily integrated into a conventional end-to-end deep ranking
model that learns a deep neural network to compute the
highlight score of each video segment. Extensive experimental
results on the YouTube and SumMe datasets demonstrate that the
proposed approach achieves significant improvement over state-of-the-art
methods. With the proposed 3-D attention model, video
highlights can be accurately retrieved in the spatial and temporal
dimensions without human supervision in several domains, such
as gymnastics, parkour, skating, skiing, surfing, and dog activities,
on the public datasets. |
Keywords | Video Highlight Detection; Attention Model; Deep Ranking |
Indexed by | SCI |
Language | English |
WOS accession number | WOS:000444903000013 |
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/22067 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Multimedia Computing |
Author affiliations | 1. Jiangsu University of Science and Technology; 2. College of Information Engineering, Xiangtan University; 3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 4. University of Chinese Academy of Sciences; 5. Moshanghua Tech Company |
Recommended citation (GB/T 7714) | Jiao, Yifan, Li, Zhetao, Huang, Shucheng, et al. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20(10): 2693-2705. |
APA | Jiao, Yifan, Li, Zhetao, Huang, Shucheng, Yang, Xiaoshan, Liu, Bin, & Zhang, Tianzhu. (2018). Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection. IEEE TRANSACTIONS ON MULTIMEDIA, 20(10), 2693-2705. |
MLA | Jiao, Yifan, et al. "Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection". IEEE TRANSACTIONS ON MULTIMEDIA 20.10 (2018): 2693-2705. |
Files in this item |
File name/size | Document type | Version | Access type | License |
Three-Dimensional At(4692KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |
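
The abstract describes two ideas: attention weights over local regions along the spatial and temporal dimensions of a segment, and a ranking model that scores segments. The following is a minimal pure-Python sketch of that general pattern, not the authors' implementation. It assumes a single softmax taken jointly over all (time, row, column) locations, a linear scoring vector `w`, and a hinge-style pairwise loss with a `margin`; all function names and parameters here are hypothetical, and the paper may compute and combine its attention differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool_3d(grid, w):
    """Pool a segment's local features into one vector using attention.

    grid[t][y][x] is a D-dimensional feature for frame t at region (y, x).
    w is a hypothetical D-dimensional scoring vector standing in for
    learned attention parameters (an assumption, not from the paper).
    Returns (pooled_vector, weights); weights follow (t, y, x) order.
    """
    # Flatten the time x height x width grid into a list of locations.
    locations = [vec for frame in grid for row in frame for vec in row]
    # One scalar relevance score per location.
    scores = [sum(wi * fi for wi, fi in zip(w, vec)) for vec in locations]
    # Assumption: normalize jointly over space and time.
    weights = softmax(scores)
    dim = len(w)
    pooled = [
        sum(a * vec[d] for a, vec in zip(weights, locations))
        for d in range(dim)
    ]
    return pooled, weights


def highlight_score(pooled, v):
    """Linear scoring head; a stand-in for a deep scoring network."""
    return sum(vi * pi for vi, pi in zip(v, pooled))


def pairwise_ranking_loss(score_highlight, score_other, margin=1.0):
    """Hinge loss: zero once the highlight outscores the other by margin."""
    return max(0.0, margin - score_highlight + score_other)


# Toy segment: 2 frames, each 1 x 2 regions, 2-dimensional features.
grid = [
    [[[1.0, 0.0], [0.0, 1.0]]],
    [[[1.0, 1.0], [0.0, 0.0]]],
]
pooled, weights = attention_pool_3d(grid, w=[2.0, 0.0])
```

In this toy example the two locations whose first feature component is 1.0 receive the larger weights, so they dominate the pooled vector. During training, `pairwise_ranking_loss` would be applied to the scores of a highlight segment and a non-highlight segment from the same video.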