CASIA OpenIR

Browse/Search results: 10 items in total, showing items 1-10

DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation (journal article)
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, Volume: 34, Issue: 4, Pages: 2041-2055
Authors: Chen, Yuxin; Zhang, Ziqi; Qi, Zhongang; Yuan, Chunfeng; Wang, Jie; Shan, Ying; Li, Bing; Hu, Weiming; Qie, Xiaohu; Wu, Jianping
Adobe PDF (13765 Kb) | Views/Downloads: 40/1 | Submitted: 2024/05/30
Chinese video captioning evaluation  dual-reconstruction transformer  
Reducing Vision-Answer Biases for Multiple-Choice VQA (journal article)
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, Volume: 32, Pages: 4621-4634
Authors: Zhang, Xi; Zhang, Feifei; Xu, Changsheng
Adobe PDF (2684 Kb) | Views/Downloads: 87/3 | Submitted: 2023/11/17
Multiple-choice VQA  vision-answer bias  causal intervention  counterfactual interaction learning  
Multi-Source Knowledge Reasoning Graph Network for Multi-Modal Commonsense Inference (journal article)
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, Volume: 19, Issue: 4, Pages: 17
Authors: Ma, Xuan; Yang, Xiaoshan; Xu, Changsheng
Views/Downloads: 102/0 | Submitted: 2023/11/17
Knowledge reasoning  multi-modal commonsense inference  graph neural network  
Contrastive Adversarial Training for Multi-Modal Machine Translation (journal article)
ACM Transactions on Asian and Low-Resource Language Information Processing, 2023, Volume: 22, Issue: 6, Pages: 157:1-18
Authors: Huang X(黄鑫); Zhang JJ(张家俊); Zong CQ(宗成庆)
Adobe PDF (2387 Kb) | Views/Downloads: 249/71 | Submitted: 2023/06/26
contrastive learning  adversarial training  multi-modal machine translation  
VLP: A Survey on Vision-language Pre-training (journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 1, Pages: 38-56
Authors: Feilong Chen; Duzhen Zhang; Minglun Han; Xiuyi Chen; Jing Shi; Shuang Xu; Bo Xu
Adobe PDF (969 Kb) | Views/Downloads: 171/34 | Submitted: 2023/06/21
Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning (journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2022, Volume: 24, Pages: 2986-2997
Authors: Zhang, Xi; Zhang, Feifei; Xu, Changsheng
Adobe PDF (5681 Kb) | Views/Downloads: 411/4 | Submitted: 2022/07/25
Cognition  Video recording  Syntactics  Visualization  Task analysis  Semantics  Linguistics  Visual Commonsense Reasoning  explicit reasoning  syntactic structure  interpretability  
Question-Guided Erasing-Based Spatiotemporal Attention Learning for Video Question Answering (journal article)
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, Pages: 0
Authors: Liu, Fei; Liu, Jing; Hong, Richang; Lu, Hanqing
Adobe PDF (3550 Kb) | Views/Downloads: 364/93 | Submitted: 2022/01/27
video question answering  attention mechanism  metric learning  
Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation (journal article)
Neurocomputing, 2022, Volume: 467, Issue: /, Pages: 99-114
Authors: Li, Qianzhong; Zhang, Yujia; Sun, Shiying; Wu, Jinting; Zhao, Xiaoguang; Tan, Min
Adobe PDF (4555 Kb) | Views/Downloads: 355/58 | Submitted: 2021/12/28
Referring expression comprehension  Referring expression segmentation  Cross-modality synergy  Attention mechanism  
Transformers in computational visual media: A survey (journal article)
Computational Visual Media, 2021, Volume: 8, Issue: 1, Pages: 33-62
Authors: Xu, Yifan; Wei, Huapeng; Lin, Minxuan; Deng, Yingying; Sheng, Kekai; Zhang, Mengdan; Tang, Fan; Dong, Weiming; Huang, Feiyue; Xu, Changsheng
Adobe PDF (5366 Kb) | Views/Downloads: 326/47 | Submitted: 2021/12/28
visual transformer  computational visual media (CVM)  high-level vision  low-level vision  image generation  multi-modal learning  
Extracting Events and Their Relations from Texts: A Survey on Recent Research Progress and Challenges (journal article)
AI Open, 2020, Volume: 1, Issue: 1, Pages: 22-39
Authors: Kang Liu; Yubo Chen; Jian Liu; Xinyu Zuo; Jun Zhao
Adobe PDF (1821 Kb) | Views/Downloads: 218/49 | Submitted: 2021/06/21
Event extraction  Event relation extraction  Knowledge graph