DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation
Chen, Yuxin1,2; Zhang, Ziqi1; Qi, Zhongang3; Yuan, Chunfeng1; Wang, Jie4; Shan, Ying3; Li, Bing1,5; Hu, Weiming1,2,6; Qie, Xiaohu7; Wu, Jianping8
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
ISSN: 1051-8215
Publication date: 2024-04-01
Volume: 34, Issue: 4, Pages: 2041-2055
Abstract

Video captioning evaluation aims to assess the semantic consistency between a video and a candidate text, which should be measured from two aspects: faithfulness (whether the information conveyed by the candidate is correct w.r.t. the video) and comprehensiveness (whether the main video content is covered by the candidate). However, previous approaches have difficulty evaluating faithfulness and comprehensiveness because of their heavy reliance on references or the heterogeneity of visual and textual data. In this paper, we propose a vision-involved evaluation metric based on a novel DuAl-Reconstruction Transformer, named DARTScore. DARTScore formulates the caption evaluation task as a dual-reconstruction problem to evaluate both faithfulness and comprehensiveness explicitly. Since each word in a candidate is usually related to several frames, DARTScore adaptively collects relevant frames to reconstruct the word and computes the reconstruction accuracy as faithfulness, which inherently reflects whether the word information is contained in the video. Conversely, DARTScore reconstructs each frame from relevant words to evaluate comprehensiveness. By integrating fine-grained bidirectional reconstruction accuracies, DARTScore drills into each word in the candidate and each frame in the video to fully evaluate the semantic consistency. Furthermore, we collect and annotate two Chinese datasets with a large domain gap, named CRAETE-EVAL and VATEX-ZH-EVAL, to systematically evaluate existing metrics and fill the gap in Chinese video captioning evaluation. Experimental results show that DARTScore achieves higher correlation with human judgments, relies less on references, and generalizes well to data from different domains.
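
The dual-reconstruction idea in the abstract can be illustrated with a small sketch. This is not the authors' model: the abstract does not give the architecture, the training objective, or how the two accuracies are integrated. The sketch assumes that word and frame embeddings already live in a shared space, uses plain softmax attention as the "adaptive collection" step, uses cosine similarity as the reconstruction accuracy, and averages the two directions. All function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def reconstruct(queries, sources, temperature=0.1):
    """Rebuild each query as an attention-weighted mix of source vectors.

    Assumption: similarity-based attention stands in for the paper's
    adaptive selection of relevant frames (or words).
    """
    q = l2_normalize(queries)
    s = l2_normalize(sources)
    weights = softmax(q @ s.T / temperature, axis=-1)  # (n_query, n_source)
    return weights @ s


def reconstruction_accuracy(queries, sources, temperature=0.1):
    """Mean cosine similarity between each query and its reconstruction."""
    q = l2_normalize(queries)
    r = l2_normalize(reconstruct(queries, sources, temperature))
    return float(np.mean(np.sum(q * r, axis=-1)))


def dual_reconstruction_score(word_emb, frame_emb, temperature=0.1):
    """Return (faithfulness, comprehensiveness, combined).

    faithfulness: how well each word is rebuilt from the frames.
    comprehensiveness: how well each frame is rebuilt from the words.
    combined: arithmetic mean of the two (an assumption of this sketch).
    """
    faithfulness = reconstruction_accuracy(word_emb, frame_emb, temperature)
    comprehensiveness = reconstruction_accuracy(frame_emb, word_emb, temperature)
    combined = 0.5 * (faithfulness + comprehensiveness)
    return faithfulness, comprehensiveness, combined


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(8, 16))  # 8 frames, 16-dim toy embeddings
    words = rng.normal(size=(5, 16))   # 5 words in the same toy space
    f, c, s = dual_reconstruction_score(words, frames)
    print(f"faithfulness={f:.3f} comprehensiveness={c:.3f} combined={s:.3f}")
```

In this toy setting, a caption whose single word matches only one of three distinct frames scores high on faithfulness but low on comprehensiveness, which is the asymmetry the two directions are meant to expose.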

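Keywords: Chinese video captioning evaluation dual-reconstruction transformer
DOI: 10.1109/TCSVT.2023.3299932
Keywords [WOS]: NETWORK
Indexed by: SCI
Language: English
Funding project: Beijing Natural Science Foundation
Funding organization: Beijing Natural Science Foundation
WOS research area: Engineering
WOS category: Engineering, Electrical & Electronic
WOS accession number: WOS:001197960500021
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Image and video processing and analysis
State Key Laboratory planned direction classification: Visual information processing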
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/57057
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Video Content Security
Corresponding author: Yuan, Chunfeng
Author affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
3. Tencent PCG, ARC Lab, Shenzhen 518057, Peoples R China
4. Tencent PCG, IPS Search, Shenzhen 518057, Peoples R China
5. People AI Inc, Beijing 100190, Peoples R China
6. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
7. Tencent PCG, Shenzhen 518057, Peoples R China
8. Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714: Chen, Yuxin, Zhang, Ziqi, Qi, Zhongang, et al. DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(4): 2041-2055.
APA: Chen, Yuxin, Zhang, Ziqi, Qi, Zhongang, Yuan, Chunfeng, Wang, Jie, ... & Wu, Jianping. (2024). DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 34(4), 2041-2055.
MLA: Chen, Yuxin, et al. "DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.4 (2024): 2041-2055.
Files in this item
File name/size: DARTScore__DuAl_Reco (13765KB)
Document type: Journal article
Version type: Author accepted manuscript
Access type: Open access
License: CC BY-NC-SA
File name: DARTScore__DuAl_Reconstruction_Transformer_for_Video_Captioning_Evaluation (2).pdf
Format: Adobe PDF
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.