CASIA OpenIR

Browse/Search results: 31 items in total, showing 1-10

DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation (journal article)
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, Volume: 34, Issue: 4, Pages: 2041-2055
Authors: Chen, Yuxin; Zhang, Ziqi; Qi, Zhongang; Yuan, Chunfeng; Wang, Jie; Shan, Ying; Li, Bing; Hu, Weiming; Qie, Xiaohu; Wu, Jianping
Adobe PDF (13765 KB) | Views/Downloads: 24/0 | Submitted: 2024/05/30
Keywords: Chinese video captioning evaluation; dual-reconstruction transformer
Semantic Policy Network for Zero-Shot Object Goal Visual Navigation (journal article)
IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, Volume: 8, Issue: 11, Pages: 7655-7662
Authors: Zhao, Qianfan; Zhang, Lu; He, Bin; Liu, Zhiyong
Adobe PDF (1888 KB) | Views/Downloads: 122/9 | Submitted: 2023/12/21
Keywords: Deep learning; path planning; reinforcement learning; vision-based navigation
Large sequence models for sequential decision-making: a survey (journal article)
FRONTIERS OF COMPUTER SCIENCE, 2023, Volume: 17, Issue: 6, Pages: 18
Authors: Wen, Muning; Lin, Runji; Wang, Hanjing; Yang, Yaodong; Wen, Ying; Mai, Luo; Wang, Jun; Zhang, Haifeng; Zhang, Weinan
Adobe PDF (1351 KB) | Views/Downloads: 132/1 | Submitted: 2023/11/17
Keywords: sequential decision-making; sequence modeling; the Transformer; training system
VLP: A Survey on Vision-language Pre-training (journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 1, Pages: 38-56
Authors: Feilong Chen; Duzhen Zhang; Minglun Han; Xiuyi Chen; Jing Shi; Shuang Xu; Bo Xu
Adobe PDF (969 KB) | Views/Downloads: 161/32 | Submitted: 2023/06/21
Semi-supervised cross-modal image generation with generative adversarial networks (journal article)
Pattern Recognition, 2020, Volume: 100, Pages: 107085
Authors: Li D(李丹); Du CD(杜长德); He HG(何晖光)
Adobe PDF (4031 KB) | Views/Downloads: 130/37 | Submitted: 2023/05/05
The Model May Fit You: User-Generalized Cross-Modal Retrieval (journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, Volume: 24, Pages: 2998-3012
Authors: Ma, Xinhong; Yang, Xiaoshan; Gao, Junyu; Xu, Changsheng
Adobe PDF (6549 KB) | Views/Downloads: 276/53 | Submitted: 2022/06/17
Keywords: cross-modal retrieval; domain generalization; meta-learning
Question-Guided Erasing-Based Spatiotemporal Attention Learning for Video Question Answering (journal article)
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, Pages: 0
Authors: Liu, Fei; Liu, Jing; Hong, Richang; Lu, Hanqing
Adobe PDF (3550 KB) | Views/Downloads: 354/89 | Submitted: 2022/01/27
Keywords: video question answering; attention mechanism; metric learning
3D-SceneCaptioner: Visual Scene Captioning Network for Three-Dimensional Point Clouds (conference paper)
Zhuhai, Guangdong Province, 2021-12
Authors: Yu, Qiang; Pan, Xianbing; Xiang, Shiming; Pan, Chunhong
Adobe PDF (3412 KB) | Views/Downloads: 168/27 | Submitted: 2022/01/14
Keywords: Scene Captioning; Three-Dimensional Vision; Point Cloud
Graph-based Multimodal Ranking Models for Multimodal Summarization (journal article)
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, Volume: 20, Issue: 4, Pages: 21
Authors: Zhu, Junnan; Xiang, Lu; Zhou, Yu; Zhang, Jiajun; Zong, Chengqing
Adobe PDF (4193 KB) | Views/Downloads: 306/62 | Submitted: 2021/12/28
Keywords: Multimodal summarization; single-modal; multimodal ranking; unsupervised
Transformers in computational visual media: A survey (journal article)
Computational Visual Media, 2021, Volume: 8, Issue: 1, Pages: 33-62
Authors: Xu, Yifan; Wei, Huapeng; Lin, Minxuan; Deng, Yingying; Sheng, Kekai; Zhang, Mengdan; Tang, Fan; Dong, Weiming; Huang, Feiyue; Xu, Changsheng
Adobe PDF (5366 KB) | Views/Downloads: 313/43 | Submitted: 2021/12/28
Keywords: visual transformer; computational visual media (CVM); high-level vision; low-level vision; image generation; multi-modal learning