Knowledge Commons of Institute of Automation, CAS
User behavior fusion in dialog management with multi-modal history cues
Author | Yang, Minghao 1,2
Journal | MULTIMEDIA TOOLS AND APPLICATIONS
Publication Date | 2015-11-01
Volume | 74
Issue | 22
Pages | 10025-10051
Article Type | Article
Abstract | A talking avatar that is sensitive to user behaviors enhances the user experience in human-computer interaction (HCI). In this study, we combine users' multi-modal behaviors with the behaviors' historical information in dialog management (DM), making the avatar sensitive not only to explicit user behavior (speech commands) but also to supporting expressions (emotion, gesture, etc.). According to the different contributions of facial expression, gesture, and head motion to speech comprehension, the dialog management divides the user's multi-modal behaviors into three categories: complementation, conflict, and independence. These behavior categories are first obtained automatically from a short-term and time-dynamic (STTD) fusion model with audio-visual input, and each category leads to a different avatar response in subsequent dialog turns. Conflict behavior usually reflects an ambiguous user intention (for example, the user says "no" while smiling); in this case, a trial-and-error schema is adopted to eliminate the conversational ambiguity. For the subsequent dialog process, we divide the avatar's dialog states into four types: "Ask", "Answer", "Chat", and "Forget". When complementation or independence behaviors are detected, the user's supporting expression, together with the explicit behavior, serves as a trigger for topic maintenance or transfer among the four dialog states. The first part of the experiments examines the reliability of the STTD model for user behavior classification. Building on the proposed dialog management and the STTD model, we then construct a drive-route information query system by connecting the user-behavior-sensitive dialog management (BSDM) to a 3D talking avatar. Practical conversation records of the avatar with different users show that the BSDM enables the avatar to understand and respond to users' facial expressions, emotional voice, and gestures, improving the user experience in multi-modal human-computer conversation.
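The fusion-and-trigger scheme described in the abstract can be illustrated concretely. Below is a minimal Python sketch, not the authors' implementation: the function names (`classify_behavior`, `next_state`), the polarity/confidence inputs, and the specific transition rules are hypothetical stand-ins for the STTD fusion output and the BSDM trigger logic summarized above.

```python
from enum import Enum


class BehaviorCategory(Enum):
    """The three behavior categories named in the abstract."""
    COMPLEMENTATION = "complementation"  # non-verbal cues reinforce the speech command
    CONFLICT = "conflict"                # non-verbal cues contradict the speech command
    INDEPENDENCE = "independence"        # non-verbal cues have no bearing on the command


class DialogState(Enum):
    """The four avatar dialog states named in the abstract."""
    ASK = "Ask"
    ANSWER = "Answer"
    CHAT = "Chat"
    FORGET = "Forget"


def classify_behavior(speech_polarity: int, expression_polarity: int,
                      expression_confidence: float,
                      threshold: float = 0.5) -> BehaviorCategory:
    """Hypothetical stand-in for the STTD fusion output: compare the
    polarity of the spoken command with that of the facial/gesture cue."""
    if expression_confidence < threshold:
        return BehaviorCategory.INDEPENDENCE  # weak cue: treat as unrelated
    if speech_polarity == expression_polarity:
        return BehaviorCategory.COMPLEMENTATION
    return BehaviorCategory.CONFLICT  # e.g. user says "no" while smiling


def next_state(current: DialogState,
               category: BehaviorCategory) -> DialogState:
    """Sketch of the trigger logic (assumed rules, not the paper's tables)."""
    if category is BehaviorCategory.CONFLICT:
        return DialogState.ASK     # trial-and-error: re-confirm the ambiguous intention
    if category is BehaviorCategory.COMPLEMENTATION:
        return DialogState.ANSWER  # intention is clear: answer and maintain the topic
    return current                 # independence: supporting cue ignored, state unchanged


if __name__ == "__main__":
    # User says "no" (negative) while smiling (positive cue, high confidence):
    cat = classify_behavior(speech_polarity=-1, expression_polarity=+1,
                            expression_confidence=0.9)
    print(cat.value, "->", next_state(DialogState.ANSWER, cat).value)
    # prints: conflict -> Ask
```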
Keywords | Dialog Management (DM) ; Multi-modal Data Fusion ; Human-Computer Interaction (HCI) ; Emotion Detection
WOS Headings | Science & Technology ; Technology
DOI | 10.1007/s11042-014-2161-5 |
Indexed By | SCI
Language | English
WOS Research Area | Computer Science ; Engineering
WOS Subject | Computer Science, Information Systems ; Computer Science, Software Engineering ; Computer Science, Theory & Methods ; Engineering, Electrical & Electronic
WOS ID | WOS:000364019400011
Representative Paper | Yes
Sub-direction Classification (Seven Major Directions) | Artificial Intelligence + Science
State Key Laboratory Planned Research Direction | Multi-modal Collaborative Cognition
Associated Dataset Requiring Deposit | No
Document Type | Journal Article
Identifier | http://ir.ia.ac.cn/handle/173211/40843
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Intelligent Interaction
Affiliation | 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
First Author Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Yang, Minghao, Tao, Jianhua, Chao, Linlin, et al. User behavior fusion in dialog management with multi-modal history cues[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74(22): 10025-10051.
APA | Yang, Minghao., Tao, Jianhua., Chao, Linlin., Li, Hao., Zhang, Dawei., ... & Liu, Bin. (2015). User behavior fusion in dialog management with multi-modal history cues. MULTIMEDIA TOOLS AND APPLICATIONS, 74(22), 10025-10051.
MLA | Yang, Minghao, et al. "User behavior fusion in dialog management with multi-modal history cues." MULTIMEDIA TOOLS AND APPLICATIONS 74.22 (2015): 10025-10051.
Files in This Item
| File Name/Size | Document Type | Version Type | Access Type | License |
| --- | --- | --- | --- | --- |
| 2015-Pub-MTAP User b(1839KB) | Journal Article | Author's Accepted Manuscript | Open Access | CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.