Research on Motion Learning of Musculoskeletal Robots Inspired by Brain Motor Preparation and Muscle Synergy Mechanisms
王萧娜
2024-05-13
Pages: 121
Degree type: Doctoral
Chinese Abstract (English translation)

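  Traditional joint-link robot technology is increasingly mature. With its high precision and high stability, it plays an important role in industrial manufacturing and other fields and can replace humans in tasks that are highly repetitive, dangerous, or heavily loaded. As robotics and society develop, expectations for robots are rising: robots are expected to interact better with humans while ensuring safety, to adapt to unstructured dynamic environments, and to achieve human-like dexterous manipulation. Traditional robots still have limitations in these respects. In contrast, the human musculoskeletal system, under the control of the central nervous system, shows outstanding movement and manipulation abilities in complex and changing environments. Musculoskeletal robots mimic the human skeletal and joint structure, are actuated by redundant muscles with human-like muscle properties, and adopt a human-like muscle arrangement. Such robots are therefore expected to gain, by virtue of their structure, the potential advantages of the human musculoskeletal system, such as compliance, safety, dexterity, and adaptability, making them a possible way to overcome the limitations of current traditional robots. However, the special muscle arrangement and dynamic characteristics also introduce high redundancy, strong coupling, and strong nonlinearity, which makes stable and precise control of such systems a challenging task. Existing control methods for musculoskeletal robots still suffer from low motion accuracy and weak generalization when large numbers of supervised samples and high-dimensional immediate state feedback are unavailable. In addition, the high-dimensional control space makes muscle control signals hard to solve for in complex tasks such as object manipulation. In the study of human and animal motor control, neuroscientists have discovered neural mechanisms such as motor preparation in the motor cortex and muscle synergies, which offer new ideas for improving the movement and manipulation skills of musculoskeletal robots. To address the above problems, and inspired by these neural mechanisms, this thesis proposes a series of biologically inspired motion learning methods to achieve precise control of musculoskeletal robots. The main contents and contributions are as follows: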

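(1) To address the low motion accuracy and weak generalization of musculoskeletal robot motion learning in the absence of immediate state feedback and dense reward signals, and inspired by the dynamical systems theory of the motor cortex and the motor preparation mechanism, a goal-directed motion learning method for musculoskeletal robots is proposed, based on a recurrent neural network modulated by its initial state. First, a motion learning architecture consisting of a preparation network and an execution network is designed. Both are recurrent neural networks: the preparation network generates the initial state of the execution network, and the execution network generates time-varying muscle control signals. Next, a method for rapid generalization to new motion targets is proposed based on the latent space of initial states: the initial states of a small number of learned motions are used to construct a low-dimensional latent space, and an evolutionary algorithm then quickly solves for the initial state of a new motion. The method is validated in musculoskeletal system simulations. The experiments show that, without immediate state feedback or dense reward signals, it can control the musculoskeletal system to learn different goal-directed motions with high precision, and that it generalizes accurately to new motion targets after only a small amount of training.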

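(2) The work above still requires a small amount of training when it faces goal-directed motion tasks it has not learned. To remove this limitation, and inspired by the motor preparation mechanism of the motor cortex and the theory of motor primitives, a motion learning method for musculoskeletal robots that generalizes directly to new motion targets is further proposed. First, a motion primitive based on the initial state of a recurrent neural network is proposed, which reduces the difficulty of solving for the initial states of different motions. Second, a modulation model of initial-state primitives is constructed, which computes the initial state of the recurrent neural network for different motions by modulating and combining the primitives. Then, for this modulation model, a reinforcement learning algorithm based on proximal policy optimization is designed to achieve motion learning and direct generalization to new motions. The method is validated in musculoskeletal system simulations. The experiments show that, without immediate state feedback or dense reward signals, it effectively improves learning efficiency and generalizes directly to unlearned motion targets, demonstrating strong generalization ability.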

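(3) The first two works are carried out without immediate state feedback or dense reward signals, but more complex motion tasks still require state feedback. When state feedback is available, limits on sensor types usually make the state of a musculoskeletal robot only partially observable, which hinders improvements in motion accuracy, learning efficiency, and other aspects of performance. To address this problem and exploit the inherent advantages of the musculoskeletal robot body, and inspired by attention and muscle synergy mechanisms, this thesis further proposes a memory-attention-based reinforcement learning method for musculoskeletal robots with embedded muscle synergy knowledge. First, a state-correction module that fuses memory and attention is proposed and combined with the twin delayed deep deterministic policy gradient algorithm to infer more complete and effective states. In addition, prior knowledge of the musculoskeletal robot is introduced by embedding muscle synergies into the last hidden layer of the policy network. The method is validated in musculoskeletal system simulations. The experiments show that it effectively resolves the control bottleneck that classical reinforcement learning methods encounter on musculoskeletal robots under partial state observability, and that when the muscles of the robot change, the algorithm adapts quickly to the changed state, improving adaptability.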

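(4) The first three works focus on goal-directed motion of musculoskeletal robots. Considering the potential advantages of musculoskeletal robots in dexterous manipulation, the thesis finally studies in-hand manipulation with a complex musculoskeletal robot. This study faces a much larger muscle control space and a more complex task. Inspired by the muscle synergy mechanism, a two-stage dual-policy learning method for in-hand manipulation is proposed. First, a dual-policy learning algorithm within a model predictive control framework is proposed; it adjusts, in real time and according to learning progress, the probability with which each of the two policies generates the control signal, and it introduces a curiosity-based intrinsic reward to strengthen exploration of the high-dimensional control space. Second, a two-stage training method for dexterous manipulation that incorporates muscle synergies is proposed: in the pre-training stage (the first stage), muscle synergies are extracted by learning simple tasks; in the complex-task training stage (the second stage), the extracted synergies are embedded in the policy space to learn the complex task, thereby guiding the generation of muscle control signals in the second stage. Simulation and comparison experiments on in-hand manipulation with a musculoskeletal system that has many redundant muscles and joints show that the method learns in-hand manipulation tasks that classical reinforcement learning methods cannot solve, and that it effectively improves the success rate and control accuracy of in-hand manipulation.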

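  Overall, drawing on the neural mechanisms by which living organisms control the musculoskeletal system, and inspired by mechanisms such as motor preparation in the brain and muscle synergies, this thesis proposes a series of biologically inspired motion learning methods. These methods improve the motion accuracy, learning efficiency, and generalization ability of musculoskeletal robots when large numbers of supervised samples and high-dimensional immediate state feedback are unavailable, and they further enable the learning of in-hand manipulation tasks on a complex musculoskeletal robot system. In addition, the research contributes positively to the cross-disciplinary integration of robotics, artificial intelligence, and neuroscience.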

English Abstract

  Traditional articulated robot technology is increasingly mature. Owing to its high precision and stability, it plays an important role in industrial manufacturing and other fields, where it can replace humans in tasks that are highly repetitive, dangerous, or heavily loaded. As robotics and society develop, people’s expectations for robots are gradually rising. Robots are expected to interact better with humans while ensuring safety, to adapt to unstructured dynamic environments, and to achieve human-like dexterous manipulation. However, traditional robots still have limitations in these respects. In contrast, the human musculoskeletal system, controlled by the central nervous system, demonstrates outstanding motion and manipulation abilities in complex and changing environments. Musculoskeletal robots mimic the human skeletal and joint structures, and they are actuated by redundant muscles with human-like muscle characteristics and arrangement patterns. They therefore have potential advantages similar to those of the human musculoskeletal system, such as compliance, safety, dexterity, and adaptability, and offer a possible way to overcome the current limitations of traditional robots. However, the muscle arrangement patterns and dynamic characteristics also introduce high redundancy, strong coupling, and strong nonlinearity, which makes stable and precise control of such systems challenging. Existing control methods for musculoskeletal robots still suffer from low motion accuracy and weak generalization when large numbers of supervised samples and high-dimensional real-time state feedback are unavailable. The high-dimensional control space also makes it difficult to solve for muscle control signals in complex tasks such as object manipulation. In the study of human and animal motor control, neuroscientists have discovered neural mechanisms such as motor preparation in the brain’s motor cortex and muscle synergies, which can provide new insights for improving the motion and manipulation skills of musculoskeletal robots. To address the problems above, and inspired by these neural mechanisms, this thesis proposes a series of biologically inspired motion learning methods to achieve precise control of musculoskeletal robots. Specifically, the main contents and contributions of this thesis are as follows:

(1) The first work aims to improve movement accuracy and generalization when immediate state feedback and dense rewards are unavailable. Inspired by dynamical systems theory and the motor preparation mechanism of the brain’s motor cortex, a method for goal-directed motion learning of musculoskeletal robots is proposed, based on recurrent neural networks modulated by the initial state. Firstly, a motion learning architecture based on a preparation network and an execution network is designed. Both are recurrent neural networks: the preparation network generates the initial state of the execution network, and the execution network generates time-varying muscle control signals. Then, a rapid generalization method for unlearned motion targets is proposed based on the latent space of initial states. It uses the initial states of a small number of learned motions to construct a low-dimensional latent space, and combines this with an evolutionary algorithm to quickly solve for the initial states corresponding to new motions. The proposed method is validated through simulations of a musculoskeletal system, demonstrating that it can achieve high-precision learning of various goal-directed motions of the musculoskeletal system without requiring immediate state feedback or dense rewards. Furthermore, it requires only a small amount of training to generalize precisely to new motion targets.

(2) To address the limitation of still requiring a small amount of training when facing unlearned goal-directed motion tasks in the first work, inspired by the motor preparation mechanisms of the brain’s motor cortex and the theory of motor primitives, a method for motion learning of musculoskeletal robots that can directly generalize to new motion targets is proposed. Firstly, a motion primitive based on the initial state of recurrent neural networks is designed to reduce the difficulty of solving initial states corresponding to different motions. Secondly, a modulation model of initial state primitives is constructed to compute the initial state of recurrent neural networks for different motions through the regulation and combination of primitives. Subsequently, for the modulation model of initial state primitives, a reinforcement learning algorithm based on proximal policy optimization is designed to achieve motion learning and direct generalization to new motions. The proposed method is validated through simulations of a musculoskeletal system, demonstrating that it effectively improves learning efficiency without requiring immediate state feedback or dense rewards. Furthermore, it can directly generalize to unlearned new motion targets, exhibiting strong generalization capability.

(3) The first two works were conducted without immediate state feedback or dense rewards. However, more complex motion tasks still require state feedback. When state feedback is available, limitations in sensor types often make the state information only partially observable, which hinders improvements in motion accuracy and learning efficiency. To address this problem and leverage the inherent advantages of musculoskeletal robots, and inspired by attention and muscle synergy mechanisms, the third work proposes a memory- and attention-based reinforcement learning method for musculoskeletal robots with prior knowledge of muscle synergies. Firstly, a state modulation module that integrates memory and attention is proposed and combined with the twin delayed deep deterministic policy gradient algorithm to infer more comprehensive and effective states. Additionally, prior knowledge of musculoskeletal robots is introduced by embedding muscle synergies into the last hidden layer of the policy network. The proposed method is validated through simulations of a musculoskeletal system, demonstrating that it resolves the control bottleneck that classical reinforcement learning methods encounter on musculoskeletal robots when only partial states are observable. Moreover, when the muscles of the musculoskeletal robot change, the algorithm can quickly adapt to the altered states, which improves adaptability.

(4) Considering the potential advantages of musculoskeletal robots in dexterous manipulation, the fourth work focuses on in-hand manipulation with a complex musculoskeletal robot. This research faces the challenges of a larger muscle control space and a more complex task. Inspired by the muscle synergy mechanism, a two-stage dual-policy learning method for dexterous in-hand manipulation is proposed. Firstly, a dual-policy learning algorithm is proposed within a model predictive control framework. Based on the learning progress, the probability with which each of the two policies generates the control signal is adjusted in real time, and a curiosity-based intrinsic reward is introduced to enhance exploration of the high-dimensional control space. Secondly, a two-stage training method for dexterous manipulation with muscle synergies is proposed. In the pre-training stage (the first stage), muscle synergies are extracted by learning simple tasks. In the complex-task training stage (the second stage), the extracted muscle synergies are embedded in the policy space to learn complex tasks, which effectively guides the generation of muscle control signals in the second stage. Simulation and comparison experiments on a musculoskeletal system with many redundant muscles and joints show that this method can learn in-hand manipulation tasks that classical reinforcement learning methods cannot solve, and that it effectively improves the success rate and control accuracy of in-hand manipulation.
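
The initial-state idea behind contributions (1) and (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the leaky-RNN dynamics, the random untrained weights, the network sizes, and the purely linear combination of primitives are all assumptions made for illustration, and the names `rollout` and `initial_state_from_primitives` are hypothetical. The sketch only shows that an autonomous recurrent network, run without any state feedback, produces a different time-varying muscle signal for each initial state.

```python
import numpy as np

def rollout(x0, W, W_out, steps=50, dt=0.1):
    """Run an autonomous rate RNN from x0 (no external input, no state feedback)."""
    x = x0.copy()
    signals = []
    for _ in range(steps):
        x = x + dt * (-x + W @ np.tanh(x))               # Euler step of leaky RNN dynamics
        u = 1.0 / (1.0 + np.exp(-(W_out @ np.tanh(x))))  # muscle signals squashed into (0, 1)
        signals.append(u)
    return np.array(signals)                             # shape: (steps, n_muscles)

def initial_state_from_primitives(primitives, weights):
    """Assumed linear combination of initial-state primitives (one primitive per row)."""
    return weights @ primitives

rng = np.random.default_rng(0)
n_units, n_muscles, n_primitives = 64, 6, 3              # illustrative sizes only
W = rng.normal(0, 1.2 / np.sqrt(n_units), (n_units, n_units))      # random, untrained
W_out = rng.normal(0, 1 / np.sqrt(n_units), (n_muscles, n_units))  # random, untrained
primitives = rng.normal(0, 1, (n_primitives, n_units))

# Two different primitive weightings give two different initial states,
# and therefore two different open-loop muscle signal trajectories.
x0_a = initial_state_from_primitives(primitives, np.array([1.0, 0.0, 0.0]))
x0_b = initial_state_from_primitives(primitives, np.array([0.2, 0.5, 0.3]))
u_a = rollout(x0_a, W, W_out)
u_b = rollout(x0_b, W, W_out)
print(u_a.shape, float(np.abs(u_a - u_b).max()))
```

In this reading, learning a motion amounts to finding a suitable initial state (or primitive weighting) rather than a feedback controller, which is why searching a low-dimensional space of initial states can be cheap.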

  In summary, inspired by the biological neural mechanisms of musculoskeletal system control, such as motor preparation in the brain and muscle synergies, this thesis designs a series of motion learning methods that improve the accuracy, efficiency, and generalization of motion learning for musculoskeletal robots without large amounts of supervised samples or high-dimensional real-time state feedback. It also realizes the learning of in-hand manipulation on a complex musculoskeletal robotic system. In addition, the research has positive significance for the integration of multidisciplinary knowledge from robotics, artificial intelligence, and neuroscience.
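
The muscle-synergy embedding mentioned in contributions (3) and (4) can be pictured as a low-dimensional bottleneck in front of the muscle outputs. The sketch below is one possible reading, not the thesis implementation: the fixed non-negative synergy matrix, its column normalization, the sigmoid activations, and the sizes are all assumptions, and `synergy_policy_output` is a hypothetical name. In the thesis the synergies are extracted from simple tasks; here they are random placeholders.

```python
import numpy as np

def synergy_policy_output(hidden, W_act, synergies):
    """Map policy features to muscle excitations through a fixed synergy matrix."""
    z = 1.0 / (1.0 + np.exp(-(W_act @ hidden)))  # synergy activations in (0, 1)
    u = synergies.T @ z                          # each muscle mixes the synergy activations
    return u, z

rng = np.random.default_rng(1)
n_hidden, n_synergies, n_muscles = 32, 4, 20     # illustrative sizes only
W_act = rng.normal(0, 0.3, (n_synergies, n_hidden))       # stands in for a learned layer
synergies = rng.uniform(0, 1, (n_synergies, n_muscles))   # placeholder synergy patterns
synergies /= synergies.sum(axis=0, keepdims=True)         # per-muscle weights sum to 1

# Many different policy features still yield muscle commands that lie in a
# subspace whose dimension is at most the number of synergies.
H = rng.normal(0, 1, (100, n_hidden))
U = np.array([synergy_policy_output(h, W_act, synergies)[0] for h in H])
print(U.shape, np.linalg.matrix_rank(U))
```

Because every muscle command is a mixture of a few synergy activations, exploration happens in the small synergy space instead of the full muscle space, which is the intuition behind using synergies to tame a highly redundant control space.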

 

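Keywords: musculoskeletal robots, biologically inspired motion learning, motor preparation, muscle synergy
Subject area: Electronics, Communication and Automatic Control Technology
Discipline category: Engineering
Language: Chinese
Representative paper:
Seven major directions, sub-direction classification: Intelligent robots
State Key Laboratory planned direction classification: Brain-inspired control and musculoskeletal-like system theory inspired by human mechanisms
Associated dataset requiring deposit:
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/57221
Collection: Graduates_Doctoral theses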
Recommended citation
GB/T 7714
王萧娜. 受大脑运动准备及肌肉协同机制启发的肌肉骨骼机器人运动学习研究[D],2024.
Files in this item
File name/size | Document type | Version type | Access type | License
论文系统提交_授权版.pdf (11669KB) | Thesis | | Restricted access | CC BY-NC-SA