TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training
Yulong Liu (1,2); Guibo Zhu (1,4,5); Bin Zhu (3); Qi Song (3); Guojing Ge (1); Haoran Chen (1,4); Guanhui Qiao (1,4); Ru Peng (2); Lingxiang Wu (1); Jinqiao Wang (1,4,5)
2022-11-28
Conference Name: 36th Conference on Neural Information Processing Systems
Conference Date: 2022-11-28 to 2022-12-09
Conference Place: New Orleans Convention Center, United States
Abstract
Vision-Language Pre-training (VLP) has been shown to be an efficient method for improving model performance on a range of vision-and-language downstream tasks. Many studies have shown that neural networks may be able to learn some general rules about language and visual concepts from a large-scale, weakly labeled image-text dataset. However, most public cross-modal datasets containing more than 100M image-text pairs are in English, and large-scale, high-quality Chinese VLP datasets are lacking. In this work, we propose a new framework for automatic dataset acquisition and cleaning, with which we construct a new large-scale, high-quality cross-modal dataset named TaiSu, containing 166 million images and 219 million Chinese captions. Compared with the recently released Wukong dataset, our dataset imposes much stricter constraints on the semantic correlation of image-text pairs. We also propose combining texts collected from the web with texts generated by a pre-trained image captioning model. To the best of our knowledge, TaiSu is currently the largest publicly accessible Chinese cross-modal dataset. Furthermore, we test our dataset on several vision-language downstream tasks. TaiSu outperforms BriVL by a large margin on zero-shot image-text retrieval and zero-shot image classification. TaiSu also outperforms Wukong on image retrieval without using image augmentation for training. These results demonstrate that TaiSu can serve as a promising VLP dataset for both understanding and generative tasks. More information is available at https://github.com/ksOAn6g5/TaiSu.
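The abstract does not spell out how the semantic-correlation constraint is enforced. One common way to impose such a constraint, shown here only as a minimal sketch and not as the TaiSu pipeline, is to score each image-caption pair by the cosine similarity of their embeddings and keep the pairs above a threshold; the sketch also pools the surviving web captions with one model-generated caption. The function names, the 0.3 default threshold, and the drop-if-empty rule are hypothetical, and the embeddings are assumed to come from some pre-trained image-text encoder that is not specified here.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_record(
    image_id: str,
    image_emb: Vector,
    web_captions: list,
    web_embs: list,
    generated_caption: str,
    threshold: float = 0.3,  # hypothetical value, not taken from the paper
) -> Optional[dict]:
    """Hypothetical cleaning step for one image and its candidate captions."""
    # Keep only web captions whose embedding is close enough to the image embedding.
    kept = [
        cap
        for cap, emb in zip(web_captions, web_embs)
        if cosine(image_emb, emb) >= threshold
    ]
    # Hypothetical policy: discard the image when no web caption survives.
    if not kept:
        return None
    # Pool the surviving web captions with one caption from a captioning model.
    return {"image_id": image_id, "captions": kept + [generated_caption]}
```

For example, with a threshold of 0.5, an image embedded at [1.0, 0.0] keeps a web caption embedded at [1.0, 0.0] (similarity 1.0) and drops one embedded at [0.0, 1.0] (similarity 0.0).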
Sub-direction Classification: Image and Video Processing and Analysis
Planning Direction of the State Key Laboratory: Visual Information Processing
Document Type: Conference Paper
Identifier: http://ir.ia.ac.cn/handle/173211/57294
Collection: Zidong Taichu Large Model Research Center, Large Model Computing
Corresponding Author: Guibo Zhu
Affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University
3. School of Artificial Intelligence, Beijing Normal University
4. School of Artificial Intelligence, University of Chinese Academy of Sciences
5. Wuhan AI Research
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Yulong Liu, Guibo Zhu, Bin Zhu, et al. TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training[C]. 2022.
Files in This Item:
File: 会议-1-TaiSu A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training.pdf (2408 KB, Adobe PDF)
Document Type: Conference Paper
Access: Open Access
License: CC BY-NC-SA