Research on Robot Motion Decision-Making Methods Inspired by Emotion Modulation Mechanisms

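Title	Research on Robot Motion Decision-Making Methods Inspired by Emotion Modulation Mechanisms
Author	Huang Xiao (黄销)
Date	2020-05-26
Pages	148
Degree type	Doctoral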
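Abstract (translated from the Chinese)	In complex environments, how a robot can acquire knowledge of its environment through autonomous learning and reasoning, and then make accurate, efficient, and fast high-level decisions, is one of the key open problems in robot intelligence. A breakthrough in this technology would greatly improve the speed, efficiency, and accuracy of autonomous robot motion and dexterous manipulation, with a significant and far-reaching impact on national defense, intelligent manufacturing, and everyday life. In recent years, with advances in artificial intelligence, robotics, neuroscience, and related disciplines, many learning-based decision-making theories and methods for agents have achieved major performance breakthroughs in the autonomous learning of robot knowledge and skills. However, they still share common problems: low learning efficiency, poor generalization, an inability to form goal-directed policies, and an inability to adapt rapidly to dynamic environments. Emotion is one of the important factors that modulate decision making, and introducing emotion mechanisms into agents' motion decision-making frameworks to modulate high-level cognition and autonomous decision making has recently become an active research topic. Research on emotion-modulated robot decision making is still at an early stage. Most studies simplify the complex emotion generation and regulation mechanisms of the biological brain and build simple mathematical models to improve an agent's learning and decision-making ability. Drawing on the cooperative operation of the brain's multi-level decision systems and the neural mechanisms by which emotion modulates decision making, this thesis proposes a series of multi-level robot decision-making and control algorithms and builds models of emotion generation and regulation that dynamically modulate the decision process, thereby improving the accuracy and efficiency of robot learning and the robot's ability to adapt rapidly to dynamic environments. The main results are as follows.
Inspired by the neural mechanism whereby emotion influences learning and decision making by adjusting decision parameters, a new robot motor learning method is proposed that introduces an emotion modulation mechanism and the Oja learning rule. The Oja reinforcement learning rule updates the weights of a multi-layer dynamic recurrent network, and the information entropy of the reward signal is used to generate the change in emotional state, which modulates the learning meta-parameters online. The method can control complex robotic systems to complete multi-goal-directed tasks with delayed rewards with higher accuracy and a faster learning rate.
Drawing on the neural mechanisms by which emotion modulates goal-directed and habitual behavior, a robot decision-making method is proposed that fuses model-based and model-free control under emotion modulation. First, the model-based and model-free decision processes are unified into a single policy optimization problem, so that a smooth transition between the two is achieved merely by adjusting the planning horizon. Second, a brain-inspired emotion processing model is built that generates an emotional response signal from state and reward prediction errors and dynamically adjusts the planning horizon. The method improves the robot's learning efficiency and accuracy and also gradually speeds up decision making.
Inspired by the neural mechanism whereby emotion modulates decision making by influencing subjective valuation and motivation, an autonomous decision-making method for mobile robots driven by intrinsic curiosity and motivation is proposed. First, a computational model of emotion processing that incorporates episodic memory is built; the emotional response it produces combines the valence of external stimuli, the novelty of the current state, and the motivation toward the goal state. In addition, an emotion-driven model-based decision and control method is built, which uses a probabilistic predictive model of the environment to predict short-term future state changes and plans the policy over a short horizon based on emotional intrinsic rewards, so that the agent maintains high exploration and learning efficiency in environments with very sparse rewards.
Drawing on the biological mechanisms of mammalian retinal information processing and fear generation, a brain-inspired visual-fear rapid response model is proposed. First, a computational neural network model of primary visual motion perception is built by combining neurodynamic methods with the motion energy method; it can quickly detect the motion direction of small moving targets and quickly perceive looming targets. Then, a computational neural network model of visual fear generation is built, which simulates the neurodynamic process of fear generation from the target detection signal and derives the rapid-decision policy output by thresholding, improving the accuracy and speed of robot collision warning and rapid decision making.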
English abstract	In robotics, a key open problem is how robots can acquire environmental knowledge through autonomous learning and make accurate, efficient, and fast decisions in complex environments. A breakthrough here would greatly improve the speed, efficiency, and accuracy of autonomous movement and dexterous manipulation, with a significant and far-reaching impact on national defense, intelligent manufacturing, and daily life. In recent years, with advances in artificial intelligence, robotics, and neuroscience, many learning-based decision-making methods have achieved outstanding performance in the autonomous learning of robot knowledge and skills. However, common problems remain: low learning efficiency, poor generalization, an inability to develop goal-directed strategies, and an inability to adapt quickly to dynamic environments. Emotion is one of the important factors that modulate decision making, yet research on emotion-modulated decision making is still in its infancy. Most studies simplify the complex emotion generation and regulation mechanisms of the biological brain and try to improve decision making by building simple mathematical models. In this thesis, a series of multi-level robot decision-making and control algorithms is proposed based on the neural mechanisms of emotion-modulated decision making in the brain. Models of emotion generation and regulation are also established to dynamically modulate the decision-making process, which improves the accuracy and efficiency of robot learning and the robot's ability to adapt rapidly to dynamic environments.
Inspired by the fact that emotion can modulate decision making by adjusting decision parameters, a novel emotion-modulated Oja learning rule is proposed. The Oja reinforcement learning rule is used to update the weights of a multi-layer dynamic recurrent neural network, and the information entropy of the reward signals is used to generate the emotional valence that adjusts the decision parameters online. The proposed method can control complex robotic systems to perform goal-directed, delayed-reinforcement tasks with higher accuracy and a faster learning rate.
Inspired by the neural mechanism of emotion modulation of goal-directed and habitual behaviors, a new approach that connects model-based and model-free control through emotion modulation is proposed. The decision-making framework bridges model-based and model-free control solely by adjusting the planning horizon: as the planning horizon decreases to zero, model-based control transforms smoothly into model-free control. A biologically plausible computational model of emotion processing is also built. It generates an uncertainty-related emotional response from the state prediction error and the reward prediction error and then dynamically modulates the planning horizon during the task. The proposed framework not only improves learning efficiency and decision accuracy but also gradually accelerates decision making as learning continues.
Inspired by the fact that emotional reactions are incorporated into the computation of subjective value during human decision making, an emotion-motivated decision-making framework is proposed. First, a brain-inspired computational model of amygdala-hippocampus interaction is built to generate emotional reactions. The intrinsic emotion derives from the external reward and episodic memory and represents three psychological states: valence, novelty, and motivational relevance. Then, a model-based decision-making approach with emotional intrinsic rewards is proposed to solve the continuous control problem of mobile robots. This method performs online model-based planning using a learned environmental model and a model-free guiding policy. The proposed approach achieves higher learning efficiency and maintains a higher level of exploration, especially in very sparse-reward environments.
Based on the biological mechanisms of mammalian retinal information processing and fear generation, a brain-inspired visual-fear rapid response model is proposed. First, a computational model of primary visual motion perception is built on the theories of neurodynamics and motion energy. This visual processing model can quickly detect the motion direction of small moving targets and also generates a large response to looming targets. Then, a computational model of a fear-generating neural network is established to simulate the neurodynamic process of fear generation in response to visual stimuli. The rapid-response policy is computed from the intensity of fear. The proposed method improves the accuracy and speed of robot collision warning and rapid response.
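Illustrative note: the first contribution in the abstract combines an Oja-type weight update with a learning parameter that is adjusted online from the information entropy of the reward signal. The sketch below is only a toy illustration of that idea under stated assumptions, not the method of the thesis. It uses the classic unsupervised Oja rule on a single linear unit rather than the Oja reinforcement learning rule and multi-layer dynamic recurrent network described in the abstract. The stand-in reward, the 50-sample reward window, the 4-bin histogram, and the `modulated_rate` formula are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter, deque


def reward_entropy(rewards, bins=4, lo=0.0, hi=1.0):
    """Shannon entropy (in bits) of a histogram of recent rewards."""
    if not rewards:
        return 0.0
    counts = Counter(
        min(bins - 1, max(0, int((r - lo) / (hi - lo) * bins))) for r in rewards
    )
    n = len(rewards)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def modulated_rate(base_rate, entropy, max_entropy):
    """Hypothetical modulation: scales the rate between 0.5x and 1.5x of base."""
    return base_rate * (0.5 + entropy / max_entropy)


def oja_step(w, x, rate):
    """Classic Oja update: w <- w + rate * y * (x - y * w), where y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + rate * y * (xi - y * wi) for wi, xi in zip(w, x)]


def run_demo(steps=5000, seed=0, base_rate=0.02, bins=4):
    """Learn the dominant direction of 2-D inputs with an entropy-modulated rate."""
    rng = random.Random(seed)
    w = [1.0, 0.0]
    recent = deque(maxlen=50)  # sliding window of stand-in rewards (assumption)
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    for _ in range(steps):
        # Inputs vary strongly along (1, 1) and weakly along (1, -1).
        s, n = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)
        x = [(s + n) * inv_sqrt2, (s - n) * inv_sqrt2]
        # Stand-in reward in (0, 1]: high when the unit reconstructs x well.
        y = w[0] * x[0] + w[1] * x[1]
        err = sum((xi - y * wi) ** 2 for xi, wi in zip(x, w))
        recent.append(1.0 / (1.0 + err))
        # Unpredictable rewards (high entropy) raise the learning rate.
        rate = modulated_rate(base_rate, reward_entropy(recent, bins), math.log2(bins))
        w = oja_step(w, x, rate)
    return w


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the weight vector should settle near unit length along the dominant input direction, and the learning rate falls as the stand-in reward becomes predictable. How the thesis maps reward entropy to an emotional state, and which meta-parameters it modulates, is not specified in this record.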
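Keywords	brain-inspired computing; emotion generation and regulation; emotion-modulated decision making; model-based dynamic programming; model-free learning; reinforcement learning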
Language	Chinese
Funding	National Key Research and Development Program of China[2017YFB1300200] ; National Key Research and Development Program of China[2017YFB1300203] ; National Natural Science Foundation of China[91648205] ; National Natural Science Foundation of China[61627808] ; National Natural Science Foundation of China[61702516] ; Strategic Priority Research Program of Chinese Academy of Sciences[XDB32050100] ; Development of Science and Technology of Guangdong Province Special Fund Project[2016B090910001]
Seven major directions, sub-direction classification	Reinforcement and evolutionary learning
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39098
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems (多模态人工智能系统全国重点实验室)_Robotics Theory and Applications
Corresponding author	Huang Xiao (黄销)
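Recommended citation (GB/T 7714)	Huang Xiao. Research on Robot Motion Decision-Making Methods Inspired by Emotion Modulation Mechanisms[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.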

Files in this item
File name/size	Document type	Version type	Access type	License
博士学位论文-黄销-上传.pdf (20542 KB)	Thesis		Open access	CC BY-NC-SA