Research on Neuro-Inspired Hierarchical Motion Learning for Musculoskeletal Robots

Author	Zhou Junjie (周俊杰)
Date	2022-05-23
Pages	162
Degree type	Doctoral (PhD)
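Chinese abstract (translated)	Musculoskeletal robotic systems are complex systems with rigid-flexible coupling. Their most notable difference from joint-link robotic systems is that they use redundant, flexible muscle modules as actuators. Redundant muscles structurally provide safety, compliance, and reliability of motion, which underpins the robot's ability to move in complex environments and difficult tasks. However, while the special muscle structure and dynamics bring potential structural and functional advantages, they also introduce the strong nonlinearity, high redundancy, and strong coupling peculiar to musculoskeletal systems. These properties not only make it difficult for conventional robot motion learning methods to achieve human-like efficient motion learning and behavior regulation on such systems, but also hinder further research into human-like motion learning for musculoskeletal robots. To address this difficulty, this thesis draws on the motion regulation mechanisms of neural circuits such as the cerebral cortex and the basal ganglia and carries out a series of studies on three problems: difficult motion training, difficult task adaptation, and difficult behavior regulation in complex tasks. It proposes three types of motion learning models, based on hierarchical task, hierarchical motion, and hierarchical policy architectures. The main work and contributions are summarized as follows:
1. Efficient motion learning is key for musculoskeletal robotic systems to develop higher-level intelligent behavior. Inspired by the successive approximation mechanism in biological cognitive learning and the perceptual decision-making encoding mechanism of medial premotor cortex neurons, this study builds a hierarchical motion learning model for musculoskeletal robots that incorporates successive approximation learning and uses a task hierarchy to improve motion learning efficiency. First, in the task-hierarchy computational model, a series of simplified target states at different difficulty levels is constructed from the desired target state and the learning progress to guide the learning process. This effectively reduces the adverse effect of the long-term reward credit assignment problem caused by muscle characteristics and improves learning efficiency. Second, in the muscle signal modulation model, by simulating the perceptual decision-making encoding mechanism of medial premotor cortex neurons, a discrete-domain motion learning algorithm with a computational-complexity advantage is extended to the continuous domain, which greatly eases the search for muscle excitation signals in a complex, high-dimensional solution space. Experiments on an arm model with 4 muscles and 2 degrees of freedom verify that the task hierarchy guides the learning process and achieves fast, stable motion learning.
2. Flexibly and robustly adjusting skilled movements under changing task requirements reflects biological behavioral adaptability. Inspired by the motion regulation mechanism of the cortico-basal ganglia circuit, this study divides motion generation into three phases, "information perception, action planning, and motion execution", and proposes a hierarchical motion learning algorithm that decouples task from motion, using a motion hierarchy to strengthen the system's adaptability to uncertain tasks. In the information perception phase, an active speed-accuracy tradeoff model is established from the basal ganglia decision mechanism, Fitts' law, and the requirements of robot motion tasks. In the action planning phase, following the differential speed modulation mechanism of striatal interneuron circuits, a velocity modulation model with a bell-shaped profile is proposed to compute smooth trajectories. In the motion execution phase, state supervision signals and the antagonistic-muscle structural prior specific to musculoskeletal systems are introduced into a deterministic policy model, yielding a task-independent muscle co-contraction policy model. This model learns coordination patterns among redundant muscles, promotes task-general cooperation among them, and improves the system's adaptability to uncertain task requirements. Experiments on a complex musculoskeletal arm model with 12 muscles and 3 degrees of freedom verify the robustness and task adaptability of the muscle co-contraction policy.
3. In non-stationary decision-making motion tasks, a single behavior policy usually cannot achieve the best motion performance. Such tasks require the model to schedule multiple behavior policies flexibly in time or space, which reflects the intelligence of biological motion learning. Inspired by the hierarchical regulation of automatic and voluntary behavior by the caudate nucleus of the basal ganglia, this study proposes a hierarchical-policy motion learning model that combines statistical decision theory with reinforcement learning, using a policy hierarchy to strengthen the robot's behavior regulation in complex non-stationary decision-making tasks. First, to address the uncertainty of automatic behavior policy preference, a minimal automatic behavior policy model based on forward-view decision-making is built on suboptimal economic decision theory and the attentional landscape model, giving a policy-level representation of behavioral preference. Second, by combining the action value prediction and evaluation mechanism of the pallidum with an improved basal ganglia decision model, a voluntary behavior policy model based on backward-view decision-making is built, which overcomes decision difficulties in non-stationary, continuous settings. Finally, by fusing the policies hierarchically, the robot balances the "most favorable" and the "most credible" decisions during motion. Experiments on an arm model with 3 degrees of freedom show that the hierarchical policy model not only stably learns automatic behavior policies with clear semantic meaning, but also outperforms any single policy by combining policies over time, demonstrating good behavior regulation ability.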
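The first contribution builds a sequence of simplified intermediate targets from the desired target state and the learning progress. The sketch below illustrates that successive-approximation idea only under explicit assumptions: the linear interpolation between start and goal, the success-rate rule for advancing a level, and the names `intermediate_target` and `Curriculum` are all hypothetical and are not the thesis's actual formulation.

```python
from dataclasses import dataclass, field

import numpy as np


def intermediate_target(start: np.ndarray, goal: np.ndarray,
                        level: int, n_levels: int) -> np.ndarray:
    """Hypothetical simplified target: a point partway from start to goal.

    Level 1 is the easiest (closest to start); level n_levels is the true goal.
    """
    alpha = level / n_levels
    return start + alpha * (goal - start)


@dataclass
class Curriculum:
    """Assumed progress rule: move to the next, harder level once the success
    rate over the last `window` episodes reaches `threshold`."""
    n_levels: int = 5
    threshold: float = 0.8
    window: int = 20
    level: int = 1
    history: list = field(default_factory=list)

    def update(self, success: bool) -> int:
        self.history.append(success)
        recent = self.history[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.level < self.n_levels):
            self.level += 1
            self.history.clear()  # re-estimate success at the new level
        return self.level


# Illustrative use with a 2-DOF arm; joint angles in rad are assumed values.
start = np.array([0.0, 0.0])
goal = np.array([1.0, 0.5])
cur = Curriculum()
for _ in range(20):
    cur.update(True)
print(cur.level)  # 2
print(intermediate_target(start, goal, cur.level, cur.n_levels))  # [0.4 0.2]
```

Advancing only after a sustained success rate keeps the learner on an easier target until it is reliably reached, which is one simple way to shorten the credit-assignment horizon early in training.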
English abstract	Musculoskeletal robotic systems are complex rigid-flexible coupled systems. Their most obvious difference from joint-link robotic systems is that they are driven by redundant, flexible muscle actuators. Redundant muscle actuators structurally provide safety, compliance, and reliability of motion, and ensure that these systems can execute motions in variable environments and complicated tasks. However, the muscle structure and dynamics bring more than potential structural and functional advantages: they also introduce the strong nonlinearity, high redundancy, and strong coupling specific to musculoskeletal systems. These characteristics not only make it difficult for conventional robot motion learning methods to achieve efficient, human-like motion learning and behavior regulation, but also hinder further research on human-like motion learning for musculoskeletal robots. To address this difficulty, and inspired by the neural mechanisms of motion learning in the cortex and basal ganglia circuits, this thesis carries out a series of progressive studies and proposes three types of motion learning models based on the hierarchy of task, the hierarchy of motion, and the hierarchy of policy. The main contents and innovations of the thesis are summarized as follows:
1. Efficient motion learning is the key foundation for musculoskeletal robotic systems to achieve advanced intelligent behaviors. Inspired by the successive approximation learning mechanism in biological cognitive learning tasks and the encoding mechanism of medial premotor cortex neurons in perceptual decision-making tasks, this study establishes a motion learning model with a hierarchical task architecture for musculoskeletal robotic systems, which uses the hierarchy of task to improve motion learning efficiency. In the computational model of the hierarchical task architecture, a series of simplified target states with varying difficulty levels is constructed from the expected target state and the learning progress to guide the motion learning process. This effectively reduces the adverse effect of long-term reward credit assignment, caused by muscle characteristics, on motion learning. In the muscle excitation modulation model, by simulating the perceptual decision-making encoding mechanism of the medial premotor cortex, a discrete-domain learning algorithm that has a computational-complexity advantage is extended to the continuous domain, which greatly reduces the difficulty of solving for continuous muscle excitations in a complicated high-dimensional solution space. In experiments on a musculoskeletal arm model with 4 muscles and 2 joints, the proposed model achieves fast and stable motion learning under the guidance of the hierarchical tasks.
2. Flexible and robust adjustment of proficient motion skills under changing task requirements embodies biological behavioral adaptability. Inspired by the motion regulation mechanism of the cortico-basal ganglia circuit, this study divides the motion generation process of musculoskeletal robotic systems into three phases: information perception, action planning, and action execution. It then proposes a motion learning model with a hierarchical structure that decouples motion from task. Applying the hierarchy of motion enhances the adaptability of musculoskeletal robotic systems to variable task requirements. In the information perception phase, an active speed-accuracy tradeoff model is established based on the decision-making mechanism of the basal ganglia, Fitts' law, and the requirements of the robot's motion task. In the action planning phase, following the temporal differential speed modulation mechanism of the striatal interneuron circuit, a striatum-inspired velocity modulation model is proposed to smooth the motion and generate motion supervision terms. In the action execution phase, the motion supervision terms and the structural prior of antagonistic muscles specific to musculoskeletal systems are introduced into a deterministic policy model, yielding a task-independent muscle co-contraction policy that promotes general cooperation between flexor and extensor muscles. The whole model thus achieves adaptive motion generation for the musculoskeletal system. Experiments on a complex musculoskeletal arm model with 12 muscles and 3 degrees of freedom verify the robustness and behavioral adaptability of the proposed model.
3. In non-stationary decision-making motion tasks, a single simple behavior policy usually cannot achieve the best motion performance. Flexibly scheduling behavior policies at the temporal or spatial level is an important embodiment of the intelligence of biological motion learning. Inspired by the hierarchical scheduling of automatic and voluntary behavior policies by the caudate nucleus of the basal ganglia, this study proposes a hierarchical policy model that combines statistical decision theory and reinforcement learning theory. The hierarchical policy structure improves the scheduling of robot behavior in complicated, non-stationary decision-making tasks. First, to address the uncertainty of automatic behavior policy preference, a minimal automatic behavior policy model based on forward-view decision-making is established from the sub-optimal economic decision phenomenon and the attentional landscape model, giving a policy representation of behavioral preference. Then, by combining the action value prediction and evaluation mechanism of the pallidum with a modified basal ganglia model, a voluntary behavior policy model based on backward-view decision-making is proposed, which overcomes the difficulty of non-stationary, continuous decision-making. By integrating the policy models hierarchically, the robot accounts for both the "most favorable" and the "most credible" decisions during motion. In experiments on an arm model with three degrees of freedom, the hierarchical policy model not only learns automatic behavior policies with clear semantic characteristics, but also achieves better motion performance than any single behavior policy through the temporal combination of automatic behavior policies, demonstrating good behavior regulation ability.
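The second contribution pairs a Fitts'-law speed-accuracy tradeoff with a bell-shaped velocity profile. The sketch below shows how those two ideas can fit together under clearly stated assumptions: it uses the Shannon formulation of Fitts' law with placeholder coefficients `a` and `b`, and it substitutes the standard minimum-jerk profile for the thesis's striatum-inspired velocity modulation model, which the abstract does not specify. The function names are hypothetical.

```python
import math

import numpy as np


def fitts_movement_time(distance: float, width: float,
                        a: float = 0.1, b: float = 0.15) -> float:
    """Fitts' law, Shannon formulation: MT = a + b * log2(D / W + 1).

    a (s) and b (s/bit) are illustrative placeholders, not fitted values.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(distance / width + 1.0)


def min_jerk_velocity(distance: float, duration: float,
                      t: np.ndarray) -> np.ndarray:
    """Bell-shaped speed of a minimum-jerk point-to-point movement
    (a stand-in for the thesis's velocity modulation model).

    v(t) = D / T * (30 tau^2 - 60 tau^3 + 30 tau^4), with tau = t / T.
    """
    tau = np.clip(t / duration, 0.0, 1.0)
    return distance / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)


# Illustrative reach of 0.3 m toward a loose and a tight target tolerance.
D = 0.3
for W in (0.05, 0.01):
    T = fitts_movement_time(D, W)
    t = np.linspace(0.0, T, 101)
    v = min_jerk_velocity(D, T, t)
    print(f"W={W} m: T={T:.3f} s, peak speed={v.max():.3f} m/s")
```

A smaller tolerance `W` raises the index of difficulty, lengthens the movement time `T`, and therefore lowers the peak speed, which for the minimum-jerk profile is 1.875·D/T at mid-movement; the speed starts and ends at zero and integrates to the reach distance `D`.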
Keywords	musculoskeletal robotic system; neuro-inspired algorithm; hierarchical motion learning; behavioral decision-making
Language	Chinese
Document type	Thesis/dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/48576
Collection	Graduates / Doctoral dissertations
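Recommended citation (GB/T 7714)	Zhou Junjie. Research on Neuro-Inspired Hierarchical Motion Learning for Musculoskeletal Robots[D]. Institute of Automation, Chinese Academy of Sciences, 2022. (in Chinese)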

Files in this item
File name/size	Document type	Version	Access	License
2022-5-25-2-肌肉骨骼机器人神 (25529 KB)	Thesis		Restricted access	CC BY-NC-SA