Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining
Benjia Zhou1; Zhigang Chen2,3; Albert Clapes4,5; Jun Wan1,2,3; Yanyan Liang1; Sergio Escalera4,5,6; Zhen Lei2,3,7; Du Zhang1
Conference Name: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference Date: 2023-10
Conference Place: Paris, France

Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck in the mid-level gloss representation, has hindered further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning to create pretext tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage. The seamless combination of these novel designs forms a robust sign language representation and significantly improves gloss-free sign language translation. In particular, we achieve unprecedented improvements in terms of BLEU-4 score on the PHOENIX14T dataset (>= +5) and the CSL-Daily dataset (>= +3) compared to state-of-the-art gloss-free SLT methods. Furthermore, our approach also achieves competitive results on the PHOENIX14T dataset when compared with most gloss-based methods.
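The first stage described above aligns visual and textual representations with a CLIP-style contrastive objective. As an illustrative sketch only (not the paper's actual implementation, and with all names hypothetical), a symmetric InfoNCE loss over a batch of paired visual/text embeddings can be written as:

```python
import numpy as np

def clip_contrastive_loss(visual_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    visual_emb, text_emb: (batch, dim) arrays; matched pairs share a row index.
    """
    # L2-normalize so the dot product equals cosine similarity.
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # positives lie on the diagonal

    def cross_entropy(lg, lb):
        shifted = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the visual->text and text->visual directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each sign-video embedding toward its paired sentence embedding and pushes it away from the other sentences in the batch, which is the alignment effect the pretraining stage relies on.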

Sub-direction classification: Biometrics
Planning direction of the national key laboratory: Visual Information Processing
Paper associated data
Document Type: Conference paper
Corresponding Author: Jun Wan
Affiliation: 1. MUST, Macau, China
2.UCAS, China
3.MAIS, CASIA, China
4.Universitat de Barcelona, Spain
5.Computer Vision Center, Spain
6.AAU, Aalborg, Denmark
7.CAIR, HKISI, CAS, Hong Kong, China
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Benjia Zhou, Zhigang Chen, Albert Clapes, et al. Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining[C], 2023.
Files in This Item:
File Name/Size: Zhou_Gloss-Free_Sign_Language_Translation_Improving_from_Visual-Language_Pretraining_ICCV_2023_paper.pdf (827KB)
DocType: Conference paper
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.