Monkey Behavior Recognition Based on Convolutional Neural Networks

	Monkey Behavior Recognition Based on Convolutional Neural Networks
	孙峥
	2022-05-21
Pages	64
Degree type	Master's
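Abstract (Chinese, translated)	In preclinical drug safety assessment, monkeys are indispensable experimental animals, yet prolonged human observation has non-negligible drawbacks in both cost and randomness. Feasible artificial intelligence methods are therefore needed for real-time, quantitative analysis of the monkey behaviors relevant to drug safety assessment. An important technical route for analyzing monkey behavior is to use pose information, in particular pose representations based on body keypoints. Human pose estimation and action recognition have developed extensively, but the corresponding methods for monkeys have progressed slowly. Automatically recognizing monkey poses and behaviors with artificial intelligence methods therefore has practical significance and application prospects for preclinical drug safety assessment. This thesis studies monkey pose estimation and behavior recognition methods based on convolutional neural networks:
1. Monkey pose estimation and behavior recognition datasets are established. Large-scale datasets play an important role in the development of artificial intelligence, and the lack of well-developed monkey datasets has slowed the application of artificial intelligence methods to monkeys. To address the small scale and low diversity of monkey datasets in the preclinical drug safety assessment scenario, this thesis builds datasets for monkey pose estimation and behavior recognition from raw data collected in an actual preclinical drug safety assessment setting. Cameras are fixed and protected with a purpose-built device and record video of the monkeys' daily life. The recorded video goes through manual screening, frame preprocessing, annotation of keypoint coordinates and behavior categories, and label file generation to form datasets suitable for monkey pose estimation and behavior recognition.
2. A pose estimation method based on a target region attention mechanism is proposed. Most deep learning work on animal pose estimation applies datasets and models from human pose estimation directly to animal scenes, without analyzing the problems specific to the application scenario. In the preclinical drug safety assessment scenario, monkey pose estimation faces three main difficulties: keypoints are occluded by fur and look very similar to one another; monkeys have flexible bodies and form more complex poses than humans; and monkeys move quickly, so local regions of some recorded video are blurred. To address these problems, the thesis first uses prior information about the position of the foreground target region to train an auxiliary convolutional neural network that generates attention feature maps of the target region. It then trains the backbone convolutional neural network on pose information, fusing the target region attention feature maps from the auxiliary network during training. Experiments show that the proposed method helps the convolutional neural network locate and distinguish different keypoints, and thus produce more accurate pose information.
3. A skeleton-based behavior recognition method based on a global spatiotemporal encoder is proposed. In the preclinical drug safety assessment task the monkeys' environment is uniform, with little background disturbance, illumination change, or appearance variation, so consecutive video frames and optical flow images contain redundant information. Behavior recognition methods based on skeleton sequences focus on limb movements and discard the redundant appearance and background information, which lowers the number of model parameters the data requires. However, existing skeleton-based methods usually use convolutional layers to extract local features along the spatial and temporal dimensions and neglect the behavior as a whole. The thesis proposes a skeleton-based behavior recognition method built on a global spatiotemporal encoder, which fuses global and local spatiotemporal features on top of a convolutional neural network. Experiments show that the global spatiotemporal encoder significantly improves monkey behavior recognition accuracy with essentially no increase in model parameters, and helps improve model robustness.
Overall, this thesis starts from the actual preclinical drug safety assessment scenario, uses deep learning methods to further study monkey pose estimation and behavior recognition, and explores the application of artificial intelligence methods in drug safety assessment.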
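The fusion step in contribution 2 can be illustrated with a minimal sketch. The abstract does not state which fusion operator is used, so the residual multiplication below, the function name `fuse_region_attention`, and the `[C][H][W]` list layout are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def sigmoid(x):
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_region_attention(features, region_logits):
    """Fuse a target-region attention map into backbone feature maps.

    features:      nested list [C][H][W], backbone feature maps (hypothetical layout)
    region_logits: nested list [H][W], raw output of an auxiliary network that
                   scores how likely each location belongs to the target region

    Assumed fusion rule: out = features * (1 + sigmoid(logits)).
    Background locations are kept roughly unchanged, while locations inside
    the target region are amplified by up to a factor of two.
    """
    attention = [[sigmoid(v) for v in row] for row in region_logits]
    return [
        [
            [f * (1.0 + a) for f, a in zip(feat_row, attn_row)]
            for feat_row, attn_row in zip(channel, attention)
        ]
        for channel in features
    ]


if __name__ == "__main__":
    # Two 2x2 feature channels; only the top-left location is confidently foreground.
    feats = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    logits = [[10.0, -10.0], [0.0, 0.0]]
    for channel in fuse_region_attention(feats, logits):
        print(channel)
```

The residual form (`1 + attention`) is one common design choice: it lets the backbone fall back on its own features when the auxiliary map is uninformative, instead of zeroing them out.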
English abstract	Monkeys are essential experimental animals in preclinical drug safety assessment, but long-term human observation has non-negligible drawbacks in cost and randomness. It is therefore worthwhile to develop feasible artificial intelligence methods for real-time, quantitative analysis of the behaviors monkeys exhibit during drug safety assessment. Pose information, especially posture representation based on skeleton points, is important for analyzing monkey behavior. Human pose estimation and action recognition have been widely developed, but related methods for monkeys are rare. Applying artificial intelligence methods to automatically identify the postures and behaviors of monkeys therefore has practical significance and application prospects for preclinical drug safety assessment. We study monkey pose estimation and action recognition methods based on convolutional neural networks:
1. Datasets for monkey pose estimation and action recognition. Large-scale datasets play an important role in the development of artificial intelligence, and related methods for monkeys are rare because complete monkey datasets are lacking. To address the small scale and low diversity of monkey datasets in preclinical drug safety assessment, we establish a monkey pose estimation dataset and a monkey action recognition dataset. The raw data are collected in an actual preclinical drug safety assessment scenario. The camera is protected and fixed with a special device to capture video of the monkeys' daily life. The acquired video undergoes manual screening, frame preprocessing, annotation of skeleton coordinates and action categories, and label generation to form datasets suitable for monkey pose estimation and action recognition.
2. A pose estimation method based on a target region attention mechanism. Most deep learning work on animal pose estimation directly applies datasets or models from human pose estimation to animal scenarios and lacks analysis of the problems specific to the application scenario. In preclinical drug safety assessment, monkey pose estimation faces the following difficulties. First, the skeleton points of monkeys are occluded by fur and are very similar to each other. Second, the body of a monkey is flexible and presents more complex poses than the human body. Third, monkeys move fast, and some of the collected videos are blurred in local areas. To address these problems, we propose a pose estimation method based on a target region attention mechanism. First, an auxiliary convolutional neural network is trained to generate attention feature maps of the target region using the prior information in the foreground area masks. Then, the backbone convolutional neural network is trained on the pose information, and the target region attention feature maps generated by the auxiliary network are fused in during training. Experiments show that the method enables the model to locate and distinguish different skeleton points and thus to generate more accurate pose information.
3. A skeleton-based action recognition method built on a global spatial temporal encoder. In the preclinical drug safety assessment task, the monkey is in a single cage. Background disturbance, illumination change, and appearance variation are negligible, so consecutive video frames and optical flow images contain redundant information. Action recognition methods based on skeleton temporal information focus on the skeleton movements of the target and discard the redundant information in appearance and background. Skeleton-based methods also reduce the number of model parameters required. However, existing skeleton-based action recognition methods usually use convolutional layers to extract local features in both the spatial and temporal dimensions, which ignores the integrity of the action. We propose a skeleton-based action recognition method built on a global spatial temporal encoder, which fuses global and local features in the spatiotemporal dimension on top of a convolutional neural network. Experiments show that the global spatial temporal encoder improves recognition accuracy significantly without increasing the number of model parameters, and helps improve the robustness of the model.
Overall, we start from the actual preclinical drug safety assessment scenario and use deep learning methods to further study monkey pose estimation and action recognition, actively exploring the application of artificial intelligence methods in preclinical drug safety assessment.
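The idea in contribution 3, combining global spatiotemporal context with local features, can be sketched as follows. The abstract does not describe the encoder's architecture, so this uses parameter-free scaled dot-product self-attention over all (frame, joint) positions as a stand-in. The function names, the `[T][V][D]` layout, and the additive fusion are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_context(tokens):
    """Scaled dot-product self-attention without learned weights.

    tokens: list of N feature vectors, one per (frame, joint) position.
    Each output vector is a similarity-weighted average of ALL tokens,
    so it carries information from the whole sequence.
    """
    dim = len(tokens[0])
    context = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        context.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        )
    return context


def fuse_global_local(local_feats):
    """Add global context to local features of a skeleton sequence.

    local_feats: nested list [T][V][D] (frames, joints, channels), e.g. the
    output of a convolutional layer (hypothetical layout).
    Returns a [T][V][D] list holding local + global features.
    """
    num_frames, num_joints = len(local_feats), len(local_feats[0])
    tokens = [local_feats[t][v] for t in range(num_frames) for v in range(num_joints)]
    context = global_context(tokens)
    return [
        [
            [x + c for x, c in zip(local_feats[t][v], context[t * num_joints + v])]
            for v in range(num_joints)
        ]
        for t in range(num_frames)
    ]


if __name__ == "__main__":
    # 2 frames x 2 joints x 2 channels of made-up local features.
    feats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]]
    for frame in fuse_global_local(feats):
        print(frame)
```

Attention without learned projections adds no parameters, which loosely matches the abstract's claim of little to no parameter increase. A real encoder would likely include learned projections and normalization.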
Keywords	action recognition; pose estimation; convolutional neural network; monkeys; preclinical drug safety assessment
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48503
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	孙峥. Monkey Behavior Recognition Based on Convolutional Neural Networks[D]. Institute of Automation, Chinese Academy of Sciences, 2022.

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (19917 KB)	Thesis		Restricted access	CC BY-NC-SA