Knowledge Commons of Institute of Automation, CAS
Title | Find Who to Look at: Turning From Action to Saliency |
Authors | Xu, Mai (1); Liu, Yufan (1,2); Hu, Roland (3); He, Feng (1) |
Journal | IEEE TRANSACTIONS ON IMAGE PROCESSING |
ISSN | 1057-7149 |
Publication Date | 2018-09-01 |
Volume | 27 |
Issue | 9 |
Pages | 4529-4544 |
Corresponding Author | He, Feng (robinleo@buaa.edu.cn) |
Abstract | The past decade has witnessed the use of high-level features in saliency prediction for both videos and images. Unfortunately, the existing saliency prediction methods only handle high-level static features, such as face. In fact, high-level dynamic features (also called actions), such as speaking or head turning, are also extremely attractive to visual attention in videos. Thus, in this paper, we propose a data-driven method for learning to predict the saliency of multiple-face videos, by leveraging both static and dynamic features at high-level. Specifically, we introduce an eye-tracking database, collecting the fixations of 39 subjects viewing 65 multiple-face videos. Through analysis on our database, we find a set of high-level features that cause a face to receive extensive visual attention. These high-level features include the static features of face size, center-bias and head pose, as well as the dynamic features of speaking and head turning. Then, we present the techniques for extracting these high-level features. Afterwards, a novel model, namely multiple hidden Markov model (M-HMM), is developed in our method to enable the transition of saliency among faces. In our M-HMM, the saliency transition takes into account both the state of saliency at previous frames and the observed high-level features at the current frame. The experimental results show that the proposed method is superior to other state-of-the-art methods in predicting visual attention on multiple-face videos. Finally, we shed light on a promising implementation of our saliency prediction method in locating the region-of-interest, for video conference compression with high efficiency video coding. |
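The abstract states that the M-HMM propagates saliency among faces by combining the saliency state at previous frames with the high-level features (e.g., speaking, head turning) observed at the current frame. The following is a minimal, hypothetical sketch of such a forward filtering step, not the paper's implementation; the function name, transition matrix, and all numbers are assumptions for illustration.

```python
import numpy as np

def mhmm_saliency_step(prev_saliency, transition, feature_scores):
    """One transition step of an M-HMM-style saliency update.

    prev_saliency  : (F,) saliency distribution over F faces at frame t-1
    transition     : (F, F) base transition probabilities between faces
    feature_scores : (F,) likelihood of each face given the high-level
                     features observed at frame t (size, pose, speaking, ...)
    Returns the (F,) saliency distribution at frame t.
    """
    predicted = transition.T @ prev_saliency   # prior carried over from t-1
    posterior = predicted * feature_scores     # reweight by observed features
    return posterior / posterior.sum()         # renormalize to a distribution

# Toy example: 3 faces; face 2 starts speaking at frame t.
prev = np.array([0.6, 0.3, 0.1])
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
obs = np.array([0.2, 0.2, 0.6])  # speaking face gets a high feature score
print(mhmm_saliency_step(prev, A, obs))
```

In this sketch, the speaking face's saliency rises sharply between frames even though it started low, which mirrors the abstract's point that saliency transitions depend on both the previous state and current observations.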
Keywords | Video analysis; saliency prediction; face |
DOI | 10.1109/TIP.2018.2837106 |
Keywords [WOS] | VIDEO CODING HEVC ; VISUAL-ATTENTION ; SPATIOTEMPORAL SALIENCY ; MODEL ; FACE ; EFFICIENCY ; IMAGE ; SCENE ; GAZE ; COMPRESSION |
Indexed By | SCI |
Language | English |
Funding Project | National Key R&D Program of China [2017YFB1002400]; NSFC [61573037]; Fok Ying-Tong Education Foundation [151061]; Zhejiang Public Welfare Research Program [2016C31062]; Natural Science Foundation of Zhejiang Province [LY16F010004] |
Funder | National Key R&D Program of China; NSFC; Fok Ying-Tong Education Foundation; Zhejiang Public Welfare Research Program; Natural Science Foundation of Zhejiang Province |
WOS Research Area | Computer Science; Engineering |
WOS Category | Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic |
WOS Record ID | WOS:000435518500008 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document Type | Journal article |
Identifier | http://ir.ia.ac.cn/handle/173211/27996 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems / Video Content Security |
Affiliations | 1. Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 3. Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China |
Recommended Citation (GB/T 7714) | Xu, Mai, Liu, Yufan, Hu, Roland, et al. Find Who to Look at: Turning From Action to Saliency[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27(9): 4529-4544. |
APA | Xu, Mai, Liu, Yufan, Hu, Roland, & He, Feng. (2018). Find Who to Look at: Turning From Action to Saliency. IEEE TRANSACTIONS ON IMAGE PROCESSING, 27(9), 4529-4544. |
MLA | Xu, Mai, et al. "Find Who to Look at: Turning From Action to Saliency." IEEE TRANSACTIONS ON IMAGE PROCESSING 27.9 (2018): 4529-4544. |
Files in This Item | There are no files associated with this item. |
Unless otherwise noted, all content in this repository is protected by copyright, with all rights reserved.