CASIA OpenIR

Browse/Search Results: 26 items in total, showing items 1-10

Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation (Conference paper)
Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023), San José, California, USA, August 21-26, 2023
Authors: Ma, Cong; Zhang, Yaping; Tu, Mei; Zhao, Yang; Zhou, Yu; Zong, Chengqing
Adobe PDF (1478Kb) | Views/Downloads: 29/13 | Submitted: 2024/06/26
Modal Contrastive Learning Based End-to-End Text Image Machine Translation (Journal article)
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2023, Volume: 32, Issue: 32, Pages: 2153-2165
Authors: Ma, Cong; Han, Xu; Wu, Linghui; Zhang, Yaping; Zhao, Yang; Zhou, Yu; Zong, Chengqing
Adobe PDF (6551Kb) | Views/Downloads: 29/15 | Submitted: 2024/06/26
Transformers  Machine translation  Decoding  Semantics  Pipelines  Text recognition  Task analysis  Text image machine translation  contrastive learning  text image recognition  machine translation  
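Research on Multimodal Depression Severity Assessment and Interpretability in Interactive Scenarios (交互场景下多模态抑郁程度评估与可解释性研究) (Thesis)
2023
Author: 蔡聪 (Cai Cong)
Adobe PDF (5243Kb) | Views/Downloads: 12/0 | Submitted: 2024/06/25
depression severity assessment  multimodal  interactive scenarios  machine learning  interpretability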
PCEN: Potential Correlation-Enhanced Network for Multimodal Named Entity Recognition (Conference paper)
Charlotte, NC, USA, 02-03 October 2023
Authors: Jiakai Geng; Chenyang Zhang; Linjing Li; Qing Yang; Daniel Zeng
Adobe PDF (4985Kb) | Views/Downloads: 60/9 | Submitted: 2024/05/31
named entity recognition  multimodal learning  vision-language pre-trained model  inconsistency loss  
Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification (Journal article)
IEEE Transactions on Cognitive and Developmental Systems, 2023, DOI: 10.1109/TCDS.2023.3327081
Authors: Guo LY (郭凌月); Zeyu Gao; Jinye Qu; Suiwu Zheng; Runhao Jiang; Yanfeng Lu; Hong Qiao
Adobe PDF (3922Kb) | Views/Downloads: 46/15 | Submitted: 2024/05/28
Cross-modal Contrastive Learning for Generalizable and Efficient Image-text Retrieval (Journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 4, Pages: 569-582
Authors: Haoyu Lu; Yuqi Huo; Mingyu Ding; Nanyi Fei; Zhiwu Lu
Adobe PDF (2928Kb) | Views/Downloads: 57/22 | Submitted: 2024/04/23
Image-text retrieval, multimodal modeling, contrastive learning, weak correlation, computer vision  
Transformer: A General Framework from Machine Translation to Others (Journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 4, Pages: 514-538
Authors: Yang Zhao; Jiajun Zhang; Chengqing Zong
Adobe PDF (1415Kb) | Views/Downloads: 52/14 | Submitted: 2024/04/23
Neural machine translation, Transformer, document neural machine translation (NMT), multimodal NMT, low-resource NMT  
Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey (Journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 4, Pages: 447-482
Authors: Xiao Wang; Guangyao Chen; Guangwu Qian; Pengcheng Gao; Xiao-Yong Wei; Yaowei Wang; Yonghong Tian; Wen Gao
Adobe PDF (3540Kb) | Views/Downloads: 69/15 | Submitted: 2024/04/23
Multi-modal (MM), pre-trained model (PTM), information fusion, representation learning, deep learning  
Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization (Journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 2, Pages: 289-298
Authors: Liqiang Jing; Yiren Li; Junhao Xu; Yongcan Yu; Pei Shen; Xuemeng Song
Adobe PDF (2389Kb) | Views/Downloads: 53/24 | Submitted: 2024/04/23
Multimodal sentence summarization (MMSS)  generative pre-trained language model (GPLM)  natural language generation  deep learning  artificial intelligence  
VLP: A Survey on Vision-language Pre-training (Journal article)
Machine Intelligence Research, 2023, Volume: 20, Issue: 1, Pages: 38-56
Authors: Fei-Long Chen; Du-Zhen Zhang; Ming-Lun Han; Xiu-Yi Chen; Jing Shi; Shuang Xu; Bo Xu
Adobe PDF (1427Kb) | Views/Downloads: 55/17 | Submitted: 2024/04/23
Vision and language  pre-training  transformers  multimodal learning  representation learning