Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks

	Zhang, Yingying1,2; Gao, Junyu1,2; Yang, Xiaoshan1,2,4; Liu, Chang3; Li, Yan3; Xu, Changsheng1,2,4
	2020-04-03
Conference name	AAAI Conference on Artificial Intelligence
Conference date	2020-02-07
Conference location	Palo Alto, California USA
Abstract	With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers the moments of major or special interest to the user in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. First, most existing approaches focus only on learning holistic visual representations of videos and ignore object semantics when inferring video highlights. Second, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. We therefore propose a novel video highlight detection framework, named VH-GNN, that constructs an object-aware graph and models the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph that captures the complex interactions of objects within each frame, and a temporal graph that obtains an object-aware representation of each frame and captures the global information. In addition, we optimize the framework via a proposed multi-stage loss, in which the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the first stage. Extensive experiments on two standard datasets demonstrate that VH-GNN achieves significant performance improvements over state-of-the-art methods.
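Seven major directions: sub-direction category	Image and video processing and analysis
State Key Laboratory planned research direction	Learning from small-sample, high-noise data
Associated dataset requiring deposit	No
Document type	Conference paper
Item identifier	http://ir.ia.ac.cn/handle/173211/51531
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems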
Author affiliations	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Kuaishou Technology 4. Peng Cheng Laboratory
First author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	Zhang, Yingying, Gao, Junyu, Yang, Xiaoshan, et al. Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks[C], 2020.
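
The abstract describes decomposing one large object graph into a spatial graph per frame and a single temporal graph over frames. The sketch below is only an illustration of that decomposition, not the authors' implementation. The record does not specify the layers, edge weights, or pooling, so every function name and design choice here is an assumption: fully connected graphs, softmax dot-product edge weights, one propagation round, and mean pooling of objects into a frame vector. Each frame is assumed to contain at least one detected object. The edge_counts helper shows why the decomposition can reduce cost under the dense-graph assumption.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def propagate(nodes):
    # One round of message passing on a fully connected graph (assumption):
    # each node becomes a similarity-weighted average of all nodes.
    out = []
    for q in nodes:
        weights = softmax([dot(q, k) for k in nodes])
        out.append([sum(w * k[d] for w, k in zip(weights, nodes))
                    for d in range(len(q))])
    return out


def mean_pool(nodes):
    # Average a list of equal-length vectors into one vector.
    n = len(nodes)
    return [sum(v[d] for v in nodes) / n for d in range(len(nodes[0]))]


def frame_representations(video):
    # video: list of frames; each frame is a list of object feature vectors.
    # Spatial graphs: objects interact within a frame, then pool to one vector.
    per_frame = [mean_pool(propagate(objects)) for objects in video]
    # Temporal graph: frame vectors interact across the whole video.
    return propagate(per_frame)


def edge_counts(num_frames, objects_per_frame):
    # Dense edge counts (self-loops included) for a single graph over all
    # objects versus the spatial-plus-temporal decomposition.
    full = (num_frames * objects_per_frame) ** 2
    decomposed = num_frames * objects_per_frame ** 2 + num_frames ** 2
    return full, decomposed
```

Under these assumptions, a video with 100 frames and 10 objects per frame needs 1,000,000 dense edges as a single graph but 20,000 after decomposition, as edge_counts(100, 10) returns. A highlight score per frame would still require a learned readout and the multi-stage loss, which this sketch does not attempt.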