CASIA OpenIR

Browse/search results: 58 items in total, showing items 1-10

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation (Journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, Volume: 26, Pages: 6906-6916
Authors: Wang, Wenxuan; He, Xingjian; Zhang, Yisi; Guo, Longteng; Shen, Jiachen; Li, Jiangyun; Liu, Jing
Views/Downloads: 3/0 | Submitted: 2024/07/03
Referring image segmentation  cross-modality guidance  masked self-distillation  vision and language  
Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension (Journal article)
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, Volume: 33, Pages: 3256-3270
Authors: Zhang, Yujia; Li, Qianzhong; Pan, Yi; Zhao, Xiaoguang; Tan, Min
Views/Downloads: 7/0 | Submitted: 2024/07/03
Feature extraction  Visualization  Task analysis  Representation learning  Location awareness  Linguistics  Grounding  Video-based referring expression comprehension  multi-stage learning  image-language cross-generative fusion  consistency loss  
Comprehensive Attribute Prediction Learning for Person Search by Language (Journal article)
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, Volume: 33, Pages: 1990-2003
Authors: Niu, Kai; Huang, Linjiang; Long, Yuzhou; Huang, Yan; Wang, Liang; Zhang, Yanning
Views/Downloads: 3/0 | Submitted: 2024/07/03
Person search by language  cross-modal retrieval  smart video surveillance  attribute prediction  
SgVA-CLIP: Semantic-Guided Visual Adapting of Vision-Language Models for Few-Shot Image Classification (Journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, Volume: 26, Pages: 3469-3480
Authors: Peng, Fang; Yang, Xiaoshan; Xiao, Linhui; Wang, Yaowei; Xu, Changsheng
Views/Downloads: 6/0 | Submitted: 2024/07/03
Few-shot  image classification  vision-language models  
An end-to-end model for multi-view scene text recognition (Journal article)
PATTERN RECOGNITION, 2024, Volume: 149, Pages: 17
Authors: Banerjee, Ayan; Shivakumara, Palaiahnakote; Bhattacharya, Saumik; Pal, Umapada; Liu, Cheng-Lin
Views/Downloads: 7/0 | Submitted: 2024/07/03
Text detection  Scene text recognition  Siamese network  Natural language model  Genetic algorithm  Multi-view text detection  
Learning to Correct Erroneous Words for Document Grounded Conversations (Conference paper)
, Kuantan, Malaysia, 2023.02.23-2023.02.25
Authors: Junyan Qiu; Haitao Wang; Yiping Yang
Adobe PDF (773Kb) | Views/Downloads: 36/14 | Submitted: 2024/06/17
Deep Learning  Natural Language Generation  Dialogue System  Curriculum Learning  
Prompting Large Language Models for Automatic Question Tagging (Journal article)
Machine Intelligence Research, 2024, Pages: 0
Authors: Nuojia Xu; Dizhan Xue; Shengsheng Qian; Quan Fang; Jun Hu
Adobe PDF (1493Kb) | Views/Downloads: 38/17 | Submitted: 2024/06/04
Community Question Answering  Machine Learning  Large Language Model  Prompt Learning  Question Tagging  
CLIP-VG: Self-Paced Curriculum Adapting of CLIP for Visual Grounding (Journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, Volume: 26, Pages: 4334-4347
Authors: Xiao, Linhui; Yang, Xiaoshan; Peng, Fang; Yan, Ming; Wang, Yaowei; Xu, Changsheng
Views/Downloads: 28/0 | Submitted: 2024/05/30
Grounding  Reliability  Adaptation models  Task analysis  Visualization  Data models  Annotations  Visual grounding  curriculum learning  pseudo-language label  and vision-language models  
GesGPT: Speech Gesture Synthesis With Text Parsing From ChatGPT (Journal article)
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, Volume: 9, Issue: 3, Pages: 2718-2725
Authors: Gao, Nan; Zhao, Zeyu; Zeng, Zhi; Zhang, Shuwu; Weng, Dongdong; Bao, Yihua
Views/Downloads: 41/0 | Submitted: 2024/05/30
Semantics  Chatbots  Task analysis  Robots  Deep learning  Cognition  Annotations  Gesture synthesis  human robot interaction  large language model  
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments (Journal article)
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, Pages: 1-16
Authors: Dong An; Hanqing Wang; Wenguan Wang; Zun Wang; Yan Huang; Keji He; Liang Wang
Views/Downloads: 21/0 | Submitted: 2024/05/27
Vision-Language Navigation  Topological Map  Obstacle Avoidance