How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
Haotong Qin (1); Ge-Peng Ji (2); Salman Khan (3); Deng-Ping Fan (1); Fahad Shahbaz Khan (3); Luc Van Gool (1)
Journal: Machine Intelligence Research
ISSN: 2731-538X
Year: 2023
Volume: 20, Issue: 5, Pages: 605-613
Abstract

Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned on text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released at https://github.com/htqin/GoogleBard-VisUnderstand.

Keywords: Google Bard, multi-modal understanding, visual comprehension, large language models, conversational AI, chatbot
DOI: 10.1007/s11633-023-1469-x
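Seven major directions, sub-direction classification: Other
State Key Laboratory planned direction classification: Other
Paper-associated dataset requiring deposit: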
Chinese-language overview: https://mp.weixin.qq.com/s/zRrjXKl7hhEjeD1nVI0PVQ
Citation statistics
Times cited: 1 [WOS]
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/55998
Collection: Academic Journals_Machine Intelligence Research
Author affiliations:
1. Computer Vision Lab (CVL), ETH Zürich, Zürich 8001, Switzerland
2. College of Engineering, Computing & Cybernetics, Australian National University, Canberra 8105, Australia
3. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi 999041, UAE
Recommended citation
GB/T 7714: Haotong Qin, Ge-Peng Ji, Salman Khan, et al. How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges[J]. Machine Intelligence Research, 2023, 20(5): 605-613.
APA: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, & Luc Van Gool. (2023). How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges. Machine Intelligence Research, 20(5), 605-613.
MLA: Haotong Qin, et al. "How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges". Machine Intelligence Research 20.5 (2023): 605-613.
Files in this item
File name/size; Document type; Version; Access; License
MIR-2023-08-155.pdf (10373 KB); Journal article; Published version; Open access; CC BY-NC-SA