CASIA OpenIR
Long video question answering: A Matching-guided Attention Model
Wang, Weining1,2; Huang, Yan1,2; Wang, Liang1,2,3
Source Publication: PATTERN RECOGNITION
ISSN: 0031-3203
2020-06-01
Volume: 102  Pages: 11
Corresponding Author: Wang, Liang (wangliang@nlpr.ia.ac.cn)
Abstract: Existing video question answering (QA) methods answer questions based on short video snippets. The underlying assumption is that the visual content indicating the ground-truth answer is present throughout the snippet. This assumption might be problematic for long video applications, since including large numbers of answer-irrelevant snippets dramatically degrades performance. To deal with this issue, we focus on a rarely investigated but practically important problem, namely long video QA, which predicts answers directly from long videos rather than from manually pre-extracted short video snippets. We accordingly propose a Matching-guided Attention Model (MAM) which jointly extracts question-related video snippets and predicts answers in a unified framework. To localize questions accurately and efficiently, we calculate matching scores and boundary regression results for candidate video snippet proposals generated by sliding windows of limited granularity. Guided by the matching scores, the model pays different attention to the extracted video snippet proposals for each question. Finally, we use the attended visual features along with the question to predict the answer in a classification manner. A key obstacle to training our model is that publicly available video QA datasets only contain short videos especially designed for short video QA. We therefore build two new datasets for this task, called TACoS-QA and MSR-VTT-QA, on top of the TACoS Multi-level dataset and the MSR-VTT dataset by generating QA pairs from the video captions. Experimental results on both datasets show the effectiveness of our proposed method compared with two short video QA methods and a baseline method. (C) 2020 Elsevier Ltd. All rights reserved.
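The proposal-and-attention step summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the window sizes, the 50% stride, mean pooling of clip features, cosine similarity as the matching score, and a plain softmax over those scores as the attention weights are all assumptions made for illustration, and the learned boundary regression and answer classifier are omitted. Every function name below is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) clip indices, end exclusive


def sliding_window_proposals(num_clips: int, window_sizes: Sequence[int],
                             stride_ratio: float = 0.5) -> List[Span]:
    """Enumerate candidate snippets using a small, fixed set of window sizes."""
    proposals: List[Span] = []
    for size in window_sizes:
        if size > num_clips:
            continue  # a window longer than the video yields no proposal
        stride = max(1, int(size * stride_ratio))
        for start in range(0, num_clips - size + 1, stride):
            proposals.append((start, start + size))
    return proposals


def mean_pool(clips: Sequence[Vector], span: Span) -> Vector:
    """Average the clip features inside one proposal (assumed pooling)."""
    start, end = span
    window = clips[start:end]
    return [sum(clip[d] for clip in window) / len(window)
            for d in range(len(window[0]))]


def cosine(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for the learned question-snippet matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matching_guided_attention(clips: Sequence[Vector], question: Vector,
                              window_sizes: Sequence[int]):
    """Score every proposal against the question, then attention-pool them."""
    proposals = sliding_window_proposals(len(clips), window_sizes)
    if not proposals:
        raise ValueError("no window size fits the video")
    pooled = [mean_pool(clips, span) for span in proposals]
    scores = [cosine(feature, question) for feature in pooled]
    weights = softmax(scores)  # higher matching score -> more attention
    attended = [sum(w * feature[d] for w, feature in zip(weights, pooled))
                for d in range(len(question))]
    return proposals, scores, weights, attended


if __name__ == "__main__":
    # 8 toy clips; only the first two resemble the toy question vector.
    clips = [[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 6
    props, scores, weights, attended = matching_guided_attention(
        clips, [1.0, 0.0], window_sizes=(2, 4))
    best = max(range(len(weights)), key=weights.__getitem__)
    print(props[best])  # (0, 2)
```

In this sketch the attended vector is a weighted sum over all proposals rather than a hard selection of one snippet, so proposals that match the question poorly still contribute, only with smaller weights.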
Keywords: Long video QA; Matching-guided attention
DOI: 10.1016/j.patcog.2020.107248
WOS Keywords: NETWORK; IMAGE
Indexed By: SCI
Language: English
Funding Project: National Key Research and Development Program of China [2016YFB1001000]; National Key Research and Development Program of China [2018AAA0100402]; National Natural Science Foundation of China [61525306]; National Natural Science Foundation of China [61633021]; National Natural Science Foundation of China [61721004]; National Natural Science Foundation of China [61420106015]; National Natural Science Foundation of China [61806194]; National Natural Science Foundation of China [U1803261]; National Natural Science Foundation of China [61976132]; Capital Science and Technology Leading Talent Training Project [Z181100006318030]; CAS-AIR; [HW2019SOW01]
Funding Organization: National Key Research and Development Program of China; National Natural Science Foundation of China; Capital Science and Technology Leading Talent Training Project; CAS-AIR
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS ID: WOS:000525825100029
Publisher: ELSEVIER SCI LTD
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/38878
Collection: Institute of Automation, Chinese Academy of Sciences
Affiliation:
1. Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Inst Automat, Beijing 100190, Peoples R China
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Corresponding Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Wang, Weining, Huang, Yan, Wang, Liang. Long video question answering: A Matching-guided Attention Model[J]. PATTERN RECOGNITION, 2020, 102: 11.
APA: Wang, Weining, Huang, Yan, & Wang, Liang. (2020). Long video question answering: A Matching-guided Attention Model. PATTERN RECOGNITION, 102, 11.
MLA: Wang, Weining, et al. "Long video question answering: A Matching-guided Attention Model". PATTERN RECOGNITION 102 (2020): 11.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.