Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection
Jiao, Yifan (1); Li, Zhetao (2); Huang, Shucheng (1); Yang, Xiaoshan (3, 4); Liu, Bin (5); Zhang, Tianzhu (3, 4)
The video highlight detection task is to localize key
elements (moments of user’s major or special interest) in a video.
Most existing highlight detection approaches extract features
from the video segment as a whole, without considering how
local features differ temporally and spatially. Because video
content is complex, such mixed features can degrade
the final highlight prediction. Temporally, not all
frames are worth watching, because some contain only the
background of the environment, without humans or other moving
objects. Spatially, likewise, not all regions in each
frame are highlights, especially when the background is heavily
cluttered. To solve this problem, we propose a novel
three-dimensional (3-D) (spatial+temporal) attention model that
can automatically localize the key elements in a video without any
extra supervised annotations. Specifically, the proposed attention
model produces attention weights of local regions along both the
spatial and temporal dimensions of the video segment. Regions
containing key elements are emphasized with large weights,
yielding a more effective feature of the video segment from which to
predict the highlight score. The proposed 3-D attention scheme can
be easily integrated into a conventional end-to-end deep ranking
model that aims to learn a deep neural network to compute the
highlight score of each video segment. Extensive experimental
results on the YouTube and SumMe datasets demonstrate that the
proposed approach achieves significant improvement over
state-of-the-art methods. With the proposed 3-D attention model, video
highlights can be accurately retrieved in spatial and temporal
dimensions without human supervision in several domains, such
as gymnastics, parkour, skating, skiing, surfing, and dog activities,
on the public datasets.
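
The abstract describes two pieces: attention weights over local regions along the spatial and temporal dimensions, and a ranking model that scores each segment. The following is a minimal sketch of that idea, not the authors' architecture. It assumes local features of shape (T, H, W, C), a single softmax over all T*H*W locations, a linear scorer, and a hinge-style pairwise ranking loss. The names attention_pool, highlight_score, pairwise_ranking_loss, w_att, and w_score are hypothetical and do not come from the paper.

```python
import numpy as np


def attention_pool(features, w_att):
    """Pool local features with attention weights (illustrative assumption).

    features: array of shape (T, H, W, C), one C-dim feature per
              spatio-temporal location of a video segment.
    w_att:    hypothetical (C,) vector that scores each location.
    Returns the pooled (C,) feature and the (T, H, W) attention weights.
    """
    T, H, W, C = features.shape
    flat = features.reshape(-1, C)        # (T*H*W, C)
    logits = flat @ w_att                 # one scalar per location
    logits = logits - logits.max()        # stabilize the softmax
    weights = np.exp(logits)
    weights = weights / weights.sum()     # softmax over all locations
    pooled = weights @ flat               # attention-weighted sum of features
    return pooled, weights.reshape(T, H, W)


def highlight_score(features, w_att, w_score):
    """Score a segment with a linear layer on the pooled feature (assumed)."""
    pooled, _ = attention_pool(features, w_att)
    return float(pooled @ w_score)


def pairwise_ranking_loss(score_highlight, score_non_highlight, margin=1.0):
    """Hinge loss asking a highlight to outscore a non-highlight by `margin`."""
    return max(0.0, margin - score_highlight + score_non_highlight)


# Toy usage with random stand-in features (not real video data).
rng = np.random.default_rng(0)
segment = rng.normal(size=(4, 3, 3, 8))   # T=4 frames, 3x3 regions, C=8
w_att = rng.normal(size=8)
w_score = rng.normal(size=8)
pooled, weights = attention_pool(segment, w_att)
score = highlight_score(segment, w_att, w_score)
```

In this sketch, locations with larger attention logits contribute more to the pooled feature, mirroring the statement that key regions are emphasized with large weights. The loss is zero once a highlight segment outscores a non-highlight segment by at least the margin.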
Keyword: Video Highlight Detection; Attention Model; Deep Ranking
Indexed By: SCI
WOS ID: WOS:000444903000013
Citation statistics
Cited Times: 14 [WOS]
Document Type: Journal article
Affiliation:
1. Jiangsu University of Science and Technology
2. College of Information Engineering, Xiangtan University
3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
4. University of Chinese Academy of Sciences
5. Moshanghua Tech Company
Recommended Citation
GB/T 7714: Jiao, Yifan, Li, Zhetao, Huang, Shucheng, et al. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection[J]. IEEE Transactions on Multimedia, 2018, 20(10): 2693-2705.
APA: Jiao, Yifan, Li, Zhetao, Huang, Shucheng, Yang, Xiaoshan, Liu, Bin, & Zhang, Tianzhu. (2018). Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection. IEEE Transactions on Multimedia, 20(10), 2693-2705.
MLA: Jiao, Yifan, et al. "Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection". IEEE Transactions on Multimedia 20.10 (2018): 2693-2705.
Files in This Item:
File Name/Size; DocType; Version; Access; License
Three-Dimensional At (4692KB); Journal article; Author's accepted manuscript; Open access; CC BY-NC-SA
File name: Three-Dimensional Attention-Based Deep Ranking.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.