Research on Cooperative Path Planning Methods for Multi-Robot Formations (多机器人编队协同路径规划方法研究)

眭泽智

2020-05

Pages

176

Degree type

Doctoral

Chinese abstract

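In recent years, robots of many kinds have been widely deployed in military and civilian fields and have shown great application value. As task scenarios diversify and operating environments grow more complex, multi-robot formations have become a research focus in robotics because of their advantages in operating range, safety, and task efficiency. Cooperative planning and control, one of the key technologies of multi-robot formations, has drawn sustained attention from researchers in many fields. Most existing work, however, concentrates on cooperative control and formation maintenance. Although some researchers have made progress on cooperative path planning for multi-robot formations, several typical scenarios and problems remain unsolved. In addition, deep reinforcement learning, which has emerged in recent years, offers a new approach to cooperative path planning for multi-robot formations, but related work is still scarce. This dissertation therefore studies the path planning problem in multi-robot formation cooperation in depth. It models and solves two typical classes of scenarios: formation generation and transformation, and formation maintenance with cooperative collision avoidance. The main work and contributions are summarized as follows: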

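(1) For formation generation and transformation in multi-robot formations, an optimal transformation strategy based on particle swarm optimization and the Hungarian algorithm is proposed. An inner loop solves the matching between individuals in the formation, and an outer loop optimizes the offset between formations; together they generate the desired paths, which are collision-free and have the shortest total length. On this basis, for non-point-mass models, a multi-robot cooperative path planning algorithm based on a limited artificial potential field method is designed, achieving safe, collision-free trajectory planning for the whole formation.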

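(2) For formation maintenance and cooperative collision avoidance, formation path planning methods based on deep reinforcement learning are studied. For image-type input, a parallel double-Q network structure is used and a cooperative reward mechanism is designed, enabling cooperative planning among the agents in the formation and completing the constrained multi-robot formation task. For entity-state input, a deep-reinforcement-learning-based algorithm for formation maintenance and cooperative collision avoidance is proposed: the problem is modeled as a Markov decision process with a composite reward-and-penalty function, and the robot's behavior policy is trained with a deep value network, achieving formation maintenance and cooperative collision avoidance in dynamic environments. Compared with existing methods, the proposed algorithm clearly improves both success rate and safety.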

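(3) To address the slow convergence and low exploration efficiency of model-free reinforcement learning during training, a formation maintenance and cooperative collision avoidance approach that fuses model knowledge with data-driven training is studied. A model-data-guided method is proposed in which a switching system, built on consensus theory and a multi-agent cooperative collision avoidance method, serves as the demonstrator; imitation learning on this system before reinforcement learning yields an effective initial policy and improves subsequent training efficiency. In addition, an action-space filter based on the concept of dynamic obstacles (termed the velocity obstacle, VO, in the English abstract) is designed, which reduces the exploration of useless actions in reinforcement learning and improves the original method in both safety and training efficiency. Finally, comparative experiments verify the effectiveness and superiority of the proposed method.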

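(4) To address practical problems of large-scale multi-robot formation experiments, such as the difficulty of testing and the demanding conditions required, a software-in-the-loop multi-UAV simulation system and a ground unmanned vehicle swarm system for typical multi-robot formation scenarios are built, enabling rapid demonstration and verification of algorithms. In the unmanned vehicle swarm system, a cross-platform planning and control architecture is designed, arbitrary communication networking within the swarm is realized, and a virtual GPS method based on ultra-wideband indoor positioning is designed. The platforms are used to verify the effectiveness of the algorithms proposed in this dissertation.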

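Overall, starting from the two typical scenarios of formation generation and transformation and of formation maintenance with cooperative collision avoidance, this dissertation studies in depth the optimal formation transformation strategy and the deep-reinforcement-learning-based formation maintenance and cooperative collision avoidance methods together with their optimization. On this basis, a software-in-the-loop simulation system and a hardware platform for these scenarios are built and used to verify the proposed methods, yielding research results of important theoretical and practical value.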

English abstract

In recent years, various robots have been widely used in military and civil fields and have shown great application value. With the increasing complexity of operating environments and diversity of mission scenarios, multi-robot cooperative formation has become a research hotspot in robotics due to its advantages in range, safety, and efficiency. As one of the essential technologies in multi-robot formation, cooperative formation planning and control has attracted continuous attention from researchers in many fields. However, most existing research focuses on cooperative control and formation maintenance. Although some researchers have achieved results in cooperative formation path planning, some typical scenarios and problems remain unsolved. Moreover, deep reinforcement learning (DRL) methods, which have emerged in recent years, provide an alternative approach to cooperative path planning for multi-robot formations, yet there is currently little related work. Therefore, this dissertation studies the path planning problem of multi-robot cooperative formation. Aiming at stable cooperative formation, two typical scenarios are modeled and solved: formation generation and transformation, and formation maintenance with cooperative collision avoidance. The main work and contributions of this dissertation are summarized as follows:

(1) To address formation generation and transformation for multi-robot systems, a novel optimal transformation strategy is proposed based on the particle swarm optimization (PSO) algorithm and the Hungarian algorithm. The strategy uses an inner loop to solve the matching of individuals in the formation and an outer loop to optimize the offset between formations. Together, the two loops generate the expected collision-free paths with the shortest total length. On this basis, a formation path planning algorithm based on a limited artificial potential field method is designed to realize safe, collision-free path planning for non-particle models.
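The inner/outer loop structure described in (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the cost (sum of straight-line distances), the 2D translation-only offset, the PSO coefficients, and the names `hungarian`, `formation_cost`, and `pso_offset` are all assumptions made for illustration, and the collision-free constraint is not modeled here.

```python
import math
import random

def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix (O(n^3) Hungarian method)."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (n + 1)      # column potentials
    match = [0] * (n + 1)    # match[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[match[j] - 1] = j - 1
    return assign, sum(cost[i][assign[i]] for i in range(n))

def formation_cost(robots, slots, offset):
    """Inner loop: best robot-to-slot matching for a target formation shifted by `offset`."""
    cost = [[math.hypot(rx - (sx + offset[0]), ry - (sy + offset[1]))
             for sx, sy in slots] for rx, ry in robots]
    return hungarian(cost)

def pso_offset(robots, slots, n_particles=20, iters=60, span=10.0, seed=0):
    """Outer loop: PSO over the 2D offset (hypothetical coefficients 0.7 / 1.5 / 1.5)."""
    rng = random.Random(seed)
    # One particle starts at zero offset, so the result is never worse than no shift.
    pos = [[0.0, 0.0]] + [[rng.uniform(-span, span), rng.uniform(-span, span)]
                          for _ in range(n_particles - 1)]
    vel = [[0.0, 0.0] for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_cost = [formation_cost(robots, slots, p)[1] for p in pos]
    g = min(range(len(pos)), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for k in range(len(pos)):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * r1 * (pbest[k][d] - pos[k][d])
                             + 1.5 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            c = formation_cost(robots, slots, pos[k])[1]
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
    assign, _ = formation_cost(robots, slots, gbest)
    return gbest, assign, gbest_cost
```

Calling `pso_offset(robots, slots)` with two equal-length lists of `(x, y)` tuples returns the chosen offset, the slot index assigned to each robot, and the total path length under that assignment.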

(2) To address multi-robot formation maintenance and cooperative collision avoidance, deep reinforcement learning based path planning methods are studied. For image input data, a parallel double-Q network structure is implemented and a cooperative reward mechanism is designed to realize cooperative path planning among multiple agents and complete the constrained formation maintenance task. For entity-state input data, a DRL-based formation maintenance and cooperative collision avoidance method is proposed, in which the problem is modeled as a Markov decision process (MDP) with a composite reward function. The behavior policy of each robot is trained through a deep value network to achieve formation maintenance and cooperative collision avoidance in dynamic environments. Compared with existing methods, the proposed method shows a significant improvement in success rate and safety.
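As a rough illustration of what a composite reward for such an MDP might look like, the sketch below combines goal progress, formation error, a proximity penalty, and a terminal collision penalty. The abstract does not give the actual reward terms, so every term, weight, and name here (`RewardWeights`, `composite_reward`) is an assumption chosen only to show the general shape.

```python
import math
from dataclasses import dataclass

@dataclass
class RewardWeights:
    # All values are hypothetical and would need tuning.
    progress: float = 1.0      # reward per metre of progress toward the goal
    formation: float = 0.5     # penalty per metre of distance from the formation slot
    proximity: float = 1.0     # penalty per metre of intrusion into the safety margin
    collision: float = -10.0   # terminal penalty on contact
    margin: float = 0.3        # extra clearance (m) the robot should keep

def composite_reward(prev_pos, pos, goal, slot, obstacles, radius, w=RewardWeights()):
    """Return (reward, done) for one transition of one robot.

    obstacles is a list of ((x, y), obstacle_radius) tuples.
    """
    # Smallest gap between the robot's body and any obstacle body.
    clearance = min(
        (math.dist(pos, centre) - radius - r for centre, r in obstacles),
        default=math.inf,
    )
    if clearance < 0.0:
        return w.collision, True  # bodies overlap: episode ends
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    formation_error = math.dist(pos, slot)
    intrusion = max(0.0, w.margin - clearance)
    reward = (w.progress * progress
              - w.formation * formation_error
              - w.proximity * intrusion)
    return reward, False
```

Keeping the collision penalty terminal and much larger in magnitude than the shaping terms is one common way to stop the agent from trading safety for formation accuracy.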

(3) Considering the slow convergence and low exploration efficiency of model-free reinforcement learning (RL) methods during training, formation maintenance and cooperative collision avoidance methods that fuse model knowledge with data training are studied. A model-guided formation maintenance and cooperative collision avoidance method is proposed, in which a switching system based on consensus theory and a multi-agent cooperative collision avoidance method is designed and used as a demonstrator for imitation learning before RL, so as to obtain an effective initial policy and improve the efficiency of subsequent training. In addition, an action space filter based on the concept of the velocity obstacle (VO) is designed, which reduces useless action exploration in reinforcement learning and improves the original method in terms of safety and training efficiency. Finally, the effectiveness of the proposed methods is verified by comparative experiments.
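The idea of a velocity-obstacle-based action filter can be sketched as follows. This is an illustrative assumption, not the filter from the dissertation: robots and neighbors are modeled as discs moving at constant velocity, actions are candidate 2D velocities, and the names `in_velocity_obstacle` and `filter_actions` and the 2-second horizon are hypothetical.

```python
import math

def in_velocity_obstacle(p_rel, v_rel, combined_radius, horizon):
    """True if relative velocity v_rel brings two discs into contact within `horizon` seconds.

    p_rel is the neighbor position minus the robot position;
    v_rel is the robot velocity minus the neighbor velocity.
    """
    speed_sq = v_rel[0] ** 2 + v_rel[1] ** 2
    if speed_sq == 0.0:
        t = 0.0
    else:
        # Time of closest approach, clamped to the look-ahead window.
        t = (p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1]) / speed_sq
        t = min(max(t, 0.0), horizon)
    dx = p_rel[0] - v_rel[0] * t
    dy = p_rel[1] - v_rel[1] * t
    return math.hypot(dx, dy) < combined_radius

def filter_actions(actions, pos, radius, neighbors, horizon=2.0):
    """Keep only candidate velocities that lie outside every neighbor's velocity obstacle.

    neighbors is a list of ((x, y), (vx, vy), neighbor_radius) tuples.
    The returned list may be empty; the caller must decide on a fallback.
    """
    safe = []
    for v in actions:
        blocked = any(
            in_velocity_obstacle(
                (npos[0] - pos[0], npos[1] - pos[1]),
                (v[0] - nvel[0], v[1] - nvel[1]),
                radius + nrad,
                horizon,
            )
            for npos, nvel, nrad in neighbors
        )
        if not blocked:
            safe.append(v)
    return safe
```

In an RL loop, a filter of this kind would be applied before action selection so that exploration is restricted to velocities not predicted to collide within the horizon.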

(4) To address practical problems of large-scale multi-robot formation, such as the difficulty of testing and the demanding experimental conditions, a software-in-the-loop (SITL) multi-UAV simulation platform and a ground unmanned vehicle swarm platform for typical multi-robot formation scenarios are built to achieve rapid demonstration and verification of algorithms. In the unmanned vehicle swarm platform, a cross-platform planning and control system is designed to support arbitrary communication networking within the swarm, and a virtual Global Positioning System (GPS) method based on ultra-wideband (UWB) indoor positioning is designed. Finally, the effectiveness of the algorithms proposed in this dissertation is verified on these platforms.
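The abstract does not describe how the virtual GPS method works. One plausible reading, offered purely as an assumption, is that local UWB coordinates are mapped to latitude and longitude around a chosen reference point so that software expecting GPS fixes can run indoors. The sketch below shows that mapping with a flat-earth approximation, which is adequate over room-scale distances; the function name and the reference origin are hypothetical.

```python
import math

WGS84_RADIUS_M = 6378137.0  # WGS-84 equatorial radius in metres

def uwb_to_virtual_gps(east_m, north_m, origin_lat_deg, origin_lon_deg):
    """Map a local east/north position (metres) to (lat, lon) in degrees.

    Assumes the UWB frame is aligned east/north and that offsets are small
    enough for a flat-earth approximation around the origin.
    """
    lat0 = math.radians(origin_lat_deg)
    dlat = north_m / WGS84_RADIUS_M
    dlon = east_m / (WGS84_RADIUS_M * math.cos(lat0))
    return (origin_lat_deg + math.degrees(dlat),
            origin_lon_deg + math.degrees(dlon))
```

For example, `uwb_to_virtual_gps(0.0, 10.0, 40.0, 116.0)` shifts the latitude by roughly 9e-5 degrees and leaves the longitude unchanged.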

In summary, starting from two typical scenarios, namely formation generation and transformation and formation maintenance with cooperative collision avoidance, this dissertation studies in depth the optimal formation transformation strategy and the DRL-based formation maintenance and cooperative collision avoidance methods together with their optimization. On this basis, a SITL simulation system and a hardware platform for the above scenarios are built and used to verify the proposed methods. The research results obtained have important theoretical and practical application value.

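Keywords

multi-robot formation; cooperative path planning; formation transformation; formation maintenance and cooperative collision avoidance; deep reinforcement learning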

Language

Chinese

Seven major research directions: sub-direction classification

Multi-agent systems

Document type

Dissertation

Item identifier

http://ir.ia.ac.cn/handle/173211/39694

Collection

Integrated Information System Research Center (综合信息系统研究中心)

Recommended citation
GB/T 7714

眭泽智. 多机器人编队协同路径规划方法研究[D]. 北京: 中国科学院大学, 2020.

Files in this item
File name/size	Document type	Version type	Access type	License
多机器人编队协同路径规划方法研究-上传版 (14824 KB)	Dissertation		Open access	CC BY-NC-SA