Research on Real-Time Multi-View 3D Hand Reconstruction Based on Multi-Layer Perceptrons

	Research on Real-Time Multi-View 3D Hand Reconstruction Based on Multi-Layer Perceptrons
	杨健 (Yang Jian)
	2024-05-16
Pages	80
Degree type	Master's
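Chinese abstract (translated)	Perceiving human motion and user intent through a vision system is the main form of intelligent human-computer interaction in virtual scenes. Vision-based 3D hand reconstruction aims to recover the geometry of the user's hand in real time with a camera system. This technology enables accurate collision interaction between virtual 3D objects and the user's hand, which makes the experience feel more realistic. Unlike previous 3D hand reconstruction methods, this work uses the stereo information provided by multiple cameras to alleviate depth ambiguity and hand self-occlusion and thereby achieve high-accuracy 3D hand reconstruction, and it uses an efficient 3D hand modeling method to raise the efficiency of multi-view hand reconstruction and thereby achieve real-time inference. The thesis studies two topics, efficient hand modeling and real-time multi-view 3D hand reconstruction. (1) Efficient hand modeling. The thesis generalizes the common framework of implicit neural geometric representations and proposes an explicit hand geometry modeling method built on a multi-layer perceptron architecture. The method follows a divide-and-conquer strategy: it decomposes the highly non-convex hand geometry into a set of convex components, yielding a geometrically decoupled, bone-by-bone reconstruction. For each local geometry, the thesis further proposes a three-axis modeling scheme that generalizes the single-value regression framework of implicit neural geometric representations into a per-coordinate, multi-value regression framework for explicit point clouds. With this framework, local point cloud geometry is modeled efficiently. Experiments on several datasets demonstrate the efficiency of the proposed 3D hand point cloud modeling method. Specifically, compared with hand modeling methods based on implicit geometric representations, the proposed method likewise reaches sub-millimeter accuracy while improving inference speed by three orders of magnitude. Compared with learning-based explicit mesh modeling methods, it leads by a large margin in both accuracy and inference cost: a single inference takes only 5 ms and modeling accuracy reaches the sub-millimeter level. (2) Real-time multi-view 3D hand reconstruction. The thesis further extends the proposed 3D hand point cloud representation into a 3D hand mesh representation that is geometrically denser and has a lighter network structure. Based on this mesh representation, it proposes a real-time multi-view hand reconstruction method. The method first estimates the 3D hand skeleton and extracts visual features from the multi-view input, then passes the visual information to the trained 3D hand mesh representation model through a noise-free information injection scheme, so that predictions are enhanced by visual information. Experiments on several public datasets demonstrate the efficiency of the proposed method. Compared with the state-of-the-art multi-view hand reconstruction method, the proposed method is comparable in accuracy but twice as fast at inference, reaching 60 FPS. This demonstrates that the proposed method is sound and efficient.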
English abstract	Perceiving human motion and user intent through a vision system is the primary means of intelligent human-computer interaction in virtual scenarios. Within this area, vision-based three-dimensional (3D) hand reconstruction aims to recover the geometric shape of the user's hands in real time using camera systems. This technology enables precise collision interaction between virtual 3D objects and the user's hands, enhancing the realism of the user experience. In contrast to previous 3D hand reconstruction methods, this research alleviates the depth ambiguity and self-occlusion of hands by leveraging the stereo vision information provided by multi-camera systems, achieving high-precision 3D hand reconstruction, and it improves the efficiency of multi-view hand reconstruction through efficient 3D hand modeling so that inference runs in real time. The study covers two aspects, efficient hand modeling and real-time multi-view 3D hand reconstruction. (1) Efficient hand modeling: This research extends the general framework of implicit neural geometric representation and proposes a Multi-Layer Perceptron (MLP) based method for explicit geometric modeling of hands. The method employs a divide-and-conquer strategy to decompose the highly non-convex hand geometry into a series of convex components, achieving a geometrically decoupled, bone-by-bone reconstruction. Furthermore, a three-axis modeling approach is proposed for each local geometry, extending the single-value regression framework of implicit neural geometric representation to a per-coordinate, multi-value regression framework for explicit point cloud representation. With this framework, local point cloud geometry is modeled efficiently. Experimental results on multiple datasets demonstrate the efficiency of the proposed 3D hand point cloud modeling method. Specifically, compared to hand modeling methods based on implicit geometric representation, the proposed method matches their sub-millimeter precision while improving inference speed by nearly a thousand times. Compared to learning-based explicit mesh modeling methods, the proposed method is significantly better in both accuracy and inference overhead, with a single inference taking only 5.6 milliseconds and modeling precision reaching the sub-millimeter level. (2) Real-time multi-view 3D hand reconstruction: This research further extends the proposed 3D hand point cloud representation and proposes a 3D hand mesh representation that is geometrically denser and has a more lightweight network. Based on this representation, a real-time multi-view hand reconstruction method is proposed. The method first estimates the 3D hand skeleton and extracts visual features from the multi-view input, then injects the visual information into the trained 3D hand mesh representation model in a noise-free manner, so that predictions are enhanced by visual information. Experiments on multiple public datasets demonstrate the efficiency of the proposed method. Compared to the state-of-the-art multi-view hand reconstruction baseline, the proposed method achieves comparable accuracy but doubles the inference speed, reaching 60 FPS. This demonstrates the rationality and efficiency of the proposed method.
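Keywords	multi-layer perceptron, hand geometry modeling, 3D hand reconstruction, real-time multi-view reconstruction
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/57065
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	杨健. 基于多层感知机的三维人手实时多目重建方法研究[D],2024.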
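
The abstract credits multi-camera stereo information with alleviating depth ambiguity. As a rough illustration of why a second view pins down depth, the sketch below applies standard linear (DLT) triangulation to one hand joint. It is not taken from the thesis: the camera matrices, the joint position, and the helper names `triangulate_point` and `project` are all assumptions made for this example, and the thesis may estimate the 3D skeleton in a different way.

```python
import numpy as np

def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more calibrated views.

    projections: list of 3x4 camera projection matrices (hypothetical inputs).
    pixels: list of (u, v) observations of the same joint, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector that belongs
    # to the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras (assumed values): shared intrinsics, second camera
# shifted 0.1 m along the x axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

joint = np.array([0.02, -0.01, 0.5])  # made-up fingertip position in metres
estimate = triangulate_point(
    [P1, P2], [project(P1, joint), project(P2, joint)]
)
```

With a single view, every point along the camera ray projects to the same pixel, so depth is unconstrained. Adding the second view's two equations makes the linear system solvable, which is the geometric basis of the multi-view advantage described in the abstract.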

Files in this item
File name/size	Document type	Version type	Access type	License
毕业论文yj_final_v2.pdf (8531 KB)	Thesis		Restricted access	CC BY-NC-SA