Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Authors	Mingyue Niu (1,2); Jianhua Tao (1,2,3); Bin Liu (1,2); Jian Huang (1,2); Zheng Lian (1,2)
Journal	IEEE Transactions on Affective Computing
Publication date	2020
Issue	0; Pages: 0
Abstract	Physiological studies have shown that speech and facial activities differ between depressed and healthy individuals. Based on this fact, we propose a novel Spatio-Temporal Attention (STA) network and a Multimodal Attention Feature Fusion (MAFF) strategy to obtain a multimodal representation of depression cues for predicting an individual's depression level. Specifically, we first divide the speech amplitude spectrum/video into fixed-length segments and feed these segments into the STA network, which not only integrates spatial and temporal information through an attention mechanism but also emphasizes the audio/video frames relevant to depression detection. The audio/video segment-level feature is taken from the output of the last fully connected layer of the STA network. Second, we employ the eigen evolution pooling method to summarize how each dimension of the audio/video segment-level features evolves across segments, aggregating them into an audio/video-level feature. Third, a multimodal representation carrying complementary information from both modalities is generated using MAFF and fed into a support vector regression (SVR) predictor to estimate depression severity. Experimental results on the AVEC2013 and AVEC2014 depression databases demonstrate the effectiveness of our method.
Keywords	Multimodal depression detection; Spatio-Temporal Attention; Audio/Video Segment-Level Feature; Eigen Evolution Pooling; Audio/Video Level Feature; Multimodal Attention Feature Fusion
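Indexed in	SCI
Language	English
WOS accession number	WOS:000966596900001
Research area (seven major directions, sub-direction)	Multimodal Intelligence
Citation statistics	Times cited: 38 (WOS)
Document type	Journal article
Item identifier	http://ir.ia.ac.cn/handle/173211/44397
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems / Intelligent Interaction
Corresponding author	Jianhua Tao
Affiliations	1. National Laboratory of Pattern Recognition, CASIA, Beijing, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3. CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation	Institute of Automation, Chinese Academy of Sciences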
Recommended citation (GB/T 7714)	Mingyue Niu, Jianhua Tao, Bin Liu, et al. Multimodal Spatiotemporal Representation for Automatic Depression Level Detection[J]. IEEE Transactions on Affective Computing, 2020(0): 0.
APA	Niu, M., Tao, J., Liu, B., Huang, J., & Lian, Z. (2020). Multimodal spatiotemporal representation for automatic depression level detection. IEEE Transactions on Affective Computing, (0), 0.
MLA	Niu, Mingyue, et al. "Multimodal Spatiotemporal Representation for Automatic Depression Level Detection." IEEE Transactions on Affective Computing 0 (2020): 0.

Files in this item
File name/size	Document type	Version	Access	License
Multimodal Spatiotem (2831 KB)	Journal article	Author accepted manuscript	Open access	CC BY-NC-SA
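The abstract describes an aggregation step in which eigen evolution pooling collapses a variable number of segment-level features into one fixed-size audio/video-level feature. The Python/NumPy sketch below illustrates that step. It rests on several assumptions and is not the authors' implementation. The function names (`split_into_segments`, `resample`, `fit_eigen_evolutions`, `eigen_evolution_pool`) are hypothetical. So are the segment length and hop, the linear resampling of every sequence to a common length, and the number `k` of basis vectors. The sketch follows the general idea of eigen evolution pooling. Each feature dimension is treated as a time series, a few temporal basis functions ("eigen evolutions") are learned as the top eigenvectors of the scatter matrix of those series, and each dimension is summarized by its projection coefficients onto that basis. The random arrays stand in for STA segment-level features.

```python
# Hypothetical sketch of eigen evolution pooling; not the authors' code.
import numpy as np


def split_into_segments(frames: np.ndarray, seg_len: int, hop: int) -> np.ndarray:
    """Cut a (T, F) frame sequence into fixed-length segments of shape (seg_len, F).

    Assumption: a trailing remainder shorter than seg_len is dropped.
    """
    if frames.shape[0] < seg_len:
        raise ValueError("sequence is shorter than one segment")
    starts = range(0, frames.shape[0] - seg_len + 1, hop)
    return np.stack([frames[s:s + seg_len] for s in starts])


def resample(seq: np.ndarray, length: int) -> np.ndarray:
    """Linearly resample a (N, D) sequence along time to `length` steps.

    Assumption: used so that sequences with different segment counts share one basis.
    """
    src = np.linspace(0.0, 1.0, seq.shape[0])
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, seq[:, j]) for j in range(seq.shape[1])], axis=1)


def fit_eigen_evolutions(train_seqs, length: int, k: int) -> np.ndarray:
    """Learn k temporal basis functions ("eigen evolutions") from training sequences.

    Every feature dimension of every resampled training sequence is one time
    series of `length` steps. The basis is the top-k eigenvectors of their
    (length x length) scatter matrix.
    """
    series = np.concatenate([resample(s, length).T for s in train_seqs], axis=0)
    scatter = series.T @ series
    _, eigvecs = np.linalg.eigh(scatter)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]  # (length, k), largest eigenvalue first


def eigen_evolution_pool(seq: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Pool (N, D) segment-level features into a single vector of size D * k."""
    x = resample(seq, basis.shape[0])  # (length, D)
    return (x.T @ basis).ravel()  # row j holds the coefficients of dimension j's evolution


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for STA segment-level features: (num_segments, 16).
    train = [rng.normal(size=(n, 16)) for n in (40, 55, 63)]
    basis = fit_eigen_evolutions(train, length=32, k=3)
    pooled = eigen_evolution_pool(rng.normal(size=(47, 16)), basis)
    print(pooled.shape)  # (48,)
    segments = split_into_segments(rng.normal(size=(100, 8)), seg_len=20, hop=10)
    print(segments.shape)  # (9, 20, 8)
```

With 16-dimensional segment-level features and k = 3, each recording maps to a 48-dimensional vector regardless of how many segments it contains. That fixed size is what lets a regressor with a fixed input size, such as SVR, handle recordings of different durations. The basis is learned once on training data rather than separately for each recording. This keeps eigenvector signs and ordering consistent across samples.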