Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection

	Jiao, Yifan 1; Li, Zhetao 2; Huang, Shucheng 1; Yang, Xiaoshan 3,4; Liu, Bin 5; Zhang, Tianzhu 3,4
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
	2018-10
Volume	20 Issue: 10 Pages: 2693-2705
Abstract	The video highlight detection task is to localize key elements (moments of the user's major or special interest) in a video. Most existing highlight detection approaches extract features from the video segment as a whole, without considering how local features differ temporally and spatially. Because video content is complex, such mixed features degrade the final highlight prediction. Temporally, not all frames are worth watching, because some contain only the background of the environment without humans or other moving objects. Spatially, likewise, not all regions in each frame are highlights, especially when the background is heavily cluttered. To solve this problem, we propose a novel three-dimensional (3-D) (spatial + temporal) attention model that can automatically localize the key elements in a video without any extra supervised annotations. Specifically, the proposed attention model produces attention weights for local regions along both the spatial and temporal dimensions of the video segment. The regions of key elements in the video are strengthened with large weights. Thus, a more effective feature of the video segment is obtained to predict the highlight score. The proposed 3-D attention scheme can be easily integrated into a conventional end-to-end deep ranking model that learns a deep neural network to compute the highlight score of each video segment. Extensive experimental results on the YouTube and SumMe datasets demonstrate that the proposed approach achieves significant improvement over state-of-the-art methods. With the proposed 3-D attention model, video highlights can be accurately retrieved in the spatial and temporal dimensions without human supervision in several domains, such as gymnastics, parkour, skating, skiing, surfing, and dog activities, on the public datasets.
Keywords	Video Highlight Detection; Attention Model; Deep Ranking
Indexed by	SCI
Language	English
WOS accession number	WOS:000444903000013
Citation statistics	Times cited: 42 [WOS]
Document type	Journal article
Item identifier	http://ir.ia.ac.cn/handle/173211/22067
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Multimedia Computing
Author affiliations	1.Jiangsu University of Science and Technology 2.College of Information Engineering, Xiangtan University 3.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 4.University of Chinese Academy of Sciences 5.Moshanghua Tech Company
Recommended citation GB/T 7714	Jiao,Yifan,Li,Zhetao,Huang,Shucheng,et al. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2018,20(10):2693-2705.
APA	Jiao,Yifan,Li,Zhetao,Huang,Shucheng,Yang,Xiaoshan,Liu,Bin,&Zhang,Tianzhu.(2018).Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection.IEEE TRANSACTIONS ON MULTIMEDIA,20(10),2693-2705.
MLA	Jiao,Yifan,et al."Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection".IEEE TRANSACTIONS ON MULTIMEDIA 20.10(2018):2693-2705.
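
The abstract describes two ideas that a small example may make more concrete: attention weights over local regions in space and time that yield one pooled segment feature, and a ranking objective over segment scores. The following is a minimal sketch under stated assumptions, not the authors' architecture. The joint softmax over all (frame, region) locations, the linear scoring head, the hinge margin, and every name (`attention_pool`, `w_att`, `w_score`, `pairwise_ranking_loss`) are hypothetical choices made for illustration; the paper itself may factor spatial and temporal attention differently.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, w_att):
    """Pool the local features of one video segment into a single vector.

    features: list of D-dimensional local features, one per (frame, region)
              location, i.e. T*H*W entries flattened into one list.
    w_att:    hypothetical D-dimensional attention parameter vector.
    Returns (pooled_feature, attention_weights).
    """
    # One scalar relevance score per space-time location.
    scores = [sum(w * x for w, x in zip(w_att, f)) for f in features]
    # Assumption: normalize jointly over space and time.
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

def highlight_score(pooled, w_score):
    """Linear scoring head standing in for the deep network (assumption)."""
    return sum(w * x for w, x in zip(w_score, pooled))

def pairwise_ranking_loss(score_highlight, score_other, margin=1.0):
    """Hinge loss: a highlight segment should outscore a non-highlight
    segment by at least `margin`."""
    return max(0.0, margin - score_highlight + score_other)

# Toy segments: 2 frames x 2 regions, flattened to 4 locations, D = 2.
highlight = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
background = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w_att = [1.0, 0.0]
w_score = [1.0, 0.0]

pooled_h, weights_h = attention_pool(highlight, w_att)
pooled_b, weights_b = attention_pool(background, w_att)
loss = pairwise_ranking_loss(highlight_score(pooled_h, w_score),
                             highlight_score(pooled_b, w_score))
```

In this toy setup the single active location in `highlight` receives the largest attention weight, so it dominates the pooled feature, while the uniform `background` segment gets equal weights everywhere. In a trained model, `w_att` and `w_score` would be learned by minimizing the ranking loss over many segment pairs.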

Files in this item
File name/size	Document type	Version type	Access type	License
Three-Dimensional At (4692KB)	Journal article	Author accepted manuscript	Open access	CC BY-NC-SA