CASIA OpenIR

Browse/Search results: 30 items in total, showing items 1-10

DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation [Journal article]
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, Volume: 34, Issue: 4, Pages: 2041-2055
Authors: Chen, Yuxin; Zhang, Ziqi; Qi, Zhongang; Yuan, Chunfeng; Wang, Jie; Shan, Ying; Li, Bing; Hu, Weiming; Qie, Xiaohu; Wu, Jianping
Adobe PDF (13765 KB) | Views/Downloads: 22/0 | Submitted: 2024/05/30
Chinese video captioning evaluation  dual-reconstruction transformer  
Semantic Policy Network for Zero-Shot Object Goal Visual Navigation [Journal article]
IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, Volume: 8, Issue: 11, Pages: 7655-7662
Authors: Zhao, Qianfan; Zhang, Lu; He, Bin; Liu, Zhiyong
Adobe PDF (1888 KB) | Views/Downloads: 120/8 | Submitted: 2023/12/21
Deep learning  path planning  reinforcement learning  vision-based navigation  
VLP: A Survey on Vision-language Pre-training [Journal article]
Machine Intelligence Research, 2023, Volume: 20, Issue: 1, Pages: 38-56
Authors: Feilong Chen; Duzhen Zhang; Minglun Han; Xiuyi Chen; Jing Shi; Shuang Xu; Bo Xu
Adobe PDF (969 KB) | Views/Downloads: 161/32 | Submitted: 2023/06/21
Semi-supervised cross-modal image generation with generative adversarial networks [Journal article]
Pattern Recognition, 2020, Volume: 100, Pages: 107085
Authors: Li D(李丹); Du CD(杜长德); He HG(何晖光)
Adobe PDF (4031 KB) | Views/Downloads: 129/37 | Submitted: 2023/05/05
The Model May Fit You: User-Generalized Cross-Modal Retrieval [Journal article]
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, Volume: 24, Pages: 2998-3012
Authors: Ma, Xinhong; Yang, Xiaoshan; Gao, Junyu; Xu, Changsheng
Adobe PDF (6549 KB) | Views/Downloads: 276/53 | Submitted: 2022/06/17
cross-modal retrieval  domain generalization  meta-learning  
Question-Guided Erasing-Based Spatiotemporal Attention Learning for Video Question Answering [Journal article]
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, Pages: 0
Authors: Liu, Fei; Liu, Jing; Hong, Richang; Lu, Hanqing
Adobe PDF (3550 KB) | Views/Downloads: 351/89 | Submitted: 2022/01/27
video question answering  attention mechanism  metric learning  
3D-SceneCaptioner: Visual Scene Captioning Network for Three-Dimensional Point Clouds [Conference paper]
Zhuhai, Guangdong Province, 2021-12
Authors: Yu, Qiang; Pan, Xianbing; Xiang, Shiming; Pan, Chunhong
Adobe PDF (3412 KB) | Views/Downloads: 167/27 | Submitted: 2022/01/14
Scene Captioning  Three-Dimensional Vision  Point Cloud  
Graph-based Multimodal Ranking Models for Multimodal Summarization [Journal article]
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, Volume: 20, Issue: 4, Pages: 21
Authors: Zhu, Junnan; Xiang, Lu; Zhou, Yu; Zhang, Jiajun; Zong, Chengqing
Adobe PDF (4193 KB) | Views/Downloads: 304/61 | Submitted: 2021/12/28
Multimodal summarization  single-modal  multimodal ranking  unsupervised  
Transformers in computational visual media: A survey [Journal article]
Computational Visual Media, 2021, Volume: 8, Issue: 1, Pages: 33-62
Authors: Xu, Yifan; Wei, Huapeng; Lin, Minxuan; Deng, Yingying; Sheng, Kekai; Zhang, Mengdan; Tang, Fan; Dong, Weiming; Huang, Feiyue; Xu, Changsheng
Adobe PDF (5366 KB) | Views/Downloads: 311/43 | Submitted: 2021/12/28
visual transformer  computational visual media (CVM)  high-level vision  low-level vision  image generation  multi-modal learning  
Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval [Journal article]
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, Volume: 23, Pages: 2386-2397
Authors: Wang, Wei; Gao, Junyu; Yang, Xiaoshan; Xu, Changsheng
Adobe PDF (2165 KB) | Views/Downloads: 344/46 | Submitted: 2021/11/02
Feature extraction  Encoding  Task analysis  Semantics  Data models  Cognition  Focusing  Video-text retrieval  graph neural network  coarse-to-fine strategy