面向隐私保护的深度学习研究 (Research on Privacy-Preserving Deep Learning)
程安达 (Anda Cheng)
2023-05-20
Pages: 128
Degree type: Doctoral
Abstract

The development and application of deep learning models rely on the collection and use of large-scale training data. However, such data usually contain a great deal of private information, so the resulting deep learning models may carry a severe risk of privacy leakage. In recent years, security incidents caused by the leakage of users' private information have occurred frequently, and governments have paid growing attention to data privacy protection; research on privacy-preserving deep learning has accordingly attracted wide attention from both academia and industry. In practice, however, although existing privacy-preserving deep learning methods can improve the privacy of a system to some extent, they also cause catastrophic damage to model accuracy, which severely hinders the application of deep learning in real-world scenarios. How to improve the accuracy of deep learning models while guaranteeing privacy has therefore become an urgent problem in current research on privacy-preserving deep learning. To address this problem, this thesis conducts research from multiple perspectives, including neural network architecture design, optimization algorithm design, privacy protection mechanism design, and protection of deployed models, in order to comprehensively improve the usability and practicality of privacy-preserving deep learning. The research content and contributions of this thesis are summarized as follows:
1. Neural architecture search for privacy-preserving deep learning. To address the poor learning performance that arises in privacy-preserving deep learning when the model architecture does not suit the differentially private learning algorithm, this thesis proposes a neural architecture search method for privacy-preserving deep learning. First, a search space tailored to privacy-preserving deep learning tasks is designed, in which the selection of activation functions and the construction of the network topology are the main targets of the search. Second, to make the searched architectures better fit the training process of differentially private optimization algorithms, a candidate model training method that is aware of the differential privacy operations is proposed. Experimental results show that the searched architectures achieve state-of-the-art test accuracy on multiple datasets for privacy-preserving image classification. Furthermore, by analyzing the searched architectures, this thesis offers new findings and guidance for designing deep neural network architectures for privacy-preserving learning.
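The abstract gives no pseudocode for the DP-aware candidate training, but the differentially private optimization it must interoperate with is the standard DP-SGD update (per-example gradient clipping plus calibrated Gaussian noise). The minimal NumPy sketch below shows that update; the function name and arguments are illustrative, not the thesis's API.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-example gradient to at most
    `clip_norm` in L2 norm, average the clipped gradients, then add
    Gaussian noise calibrated to the clipping threshold (the L2
    sensitivity of the averaged, clipped gradients)."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

# Toy usage with random "per-example gradients" of a 10-parameter model.
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(32)]
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Training candidate architectures with this noisy, clipped update rather than plain SGD is what would let the search rank candidates by how well they tolerate differentially private training.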
2. Privacy-preserving federated learning with local regularization and sparsification. To address the poor model accuracy of federated learning under user-level differential privacy, this thesis conducts a theoretical analysis and finds that the key to improving learning performance is to naturally reduce the norm of the original local model updates before the differential privacy operations are applied. Accordingly, two techniques are proposed to improve the local learning process of federated learning. First, a bounded local regularization technique is proposed, which adds a regularization term to the local learning objective to keep the norm of the local update below the clipping threshold of differential privacy. Second, a local update sparsification technique is proposed, which further reduces the norm of the local model update by reducing the number of parameters actually updated, without degrading the effect of the local update. The convergence and privacy guarantees of the proposed method are analyzed theoretically. Experimental results show that, under the same level of privacy protection, the proposed method achieves significant advantages in both convergence speed and model accuracy over contemporaneous methods.
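As a rough illustration of the two local-update techniques, the sketch below gives one plausible form of the bounded regularizer and a magnitude-based sparsifier. The exact regularization term, sparsification rule, and hyperparameters are not specified in the abstract, so every concrete choice here is an assumption.

```python
import numpy as np

def bounded_update_penalty(local_w, global_w, clip_threshold):
    """Regularization term that is zero while the local update stays within
    the DP clipping threshold C and grows once it exceeds C (one plausible
    form; the thesis's exact regularizer may differ)."""
    update_norm = np.linalg.norm(local_w - global_w)
    return max(0.0, update_norm - clip_threshold) ** 2

def sparsify_update(update, keep_ratio=0.1):
    """Keep only the top `keep_ratio` fraction of entries by magnitude and
    zero the rest; updating fewer parameters shrinks the update norm
    before clipping and noising."""
    k = max(1, int(keep_ratio * update.size))
    threshold = np.sort(np.abs(update).ravel())[-k]
    return np.where(np.abs(update) >= threshold, update, 0.0)

# Toy usage: a 100-dim local update, penalized and sparsified.
rng = np.random.default_rng(0)
w_global = rng.normal(size=100)
w_local = w_global + rng.normal(scale=0.1, size=100)
penalty = bounded_update_penalty(w_local, w_global, clip_threshold=1.0)
sparse_delta = sparsify_update(w_local - w_global, keep_ratio=0.1)
```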
3. Privacy-preserving deep learning with an adaptive elliptical Gaussian mechanism. To address the poor learning performance caused by adding isotropic Gaussian noise in differentially private optimization algorithms, this thesis first proposes an elliptical Gaussian mechanism with a differential privacy guarantee, which satisfies differential privacy by adding non-isotropic Gaussian noise to the output of high-dimensional algorithms. Second, for the parameter selection problem that arises when the elliptical Gaussian mechanism is applied to deep learning tasks, an optimization-based adaptive parameter selection method is proposed: taking as its objective the minimization of the expected error between the noise-perturbed gradient vector and the original gradient vector, it optimizes the projection matrix and the noise intensity matrix of the elliptical Gaussian mechanism. Three implementations with different degrees of parameter sharing are provided. Experimental results show that the proposed method achieves higher model accuracy than existing differentially private optimization algorithms.
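The sketch below shows only the sampling side of such a mechanism: noise drawn from N(0, P diag(s)^2 P^T) instead of the isotropic N(0, sigma^2 I). Here `proj` and `scales` stand in for the projection matrix and noise intensity matrix named in the abstract; the optimization that adapts them to minimize the expected gradient error under the privacy constraint is the thesis's contribution and is not reproduced.

```python
import numpy as np

def elliptical_gaussian_perturb(grad, proj, scales, rng):
    """Add non-isotropic Gaussian noise with covariance P diag(scales)^2 P^T
    to a (clipped) gradient vector. With proj = I and a constant scale this
    reduces to the ordinary isotropic Gaussian mechanism."""
    z = rng.normal(size=grad.shape)    # z ~ N(0, I)
    return grad + proj @ (scales * z)  # noise ~ N(0, P diag(s)^2 P^T)

# Toy usage: orthonormal basis P and per-direction noise scales s.
rng = np.random.default_rng(0)
g = rng.normal(size=4)
P = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # orthonormal columns
s = np.array([2.0, 1.0, 0.5, 0.25])           # more noise in some directions
noisy_g = elliptical_gaussian_perturb(g, P, s, rng)
```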
4. Accuracy-preserving generative protection for black-box model privacy. To address the accuracy damage caused by existing black-box model privacy protection techniques, this thesis proposes a generative protection method that preserves model accuracy. Although existing protection methods can reduce the success rate of attacks against black-box models to some extent, they change the predicted class of the model output and therefore hurt model accuracy. This thesis proposes a generator module that preserves the ranking of the predicted classes: its input is the prediction score vector output by the target model, and its output is a perturbed prediction vector. The perturbation objective is to maximize the difference between the output vector and the input vector while keeping the class ranking of the output score vector consistent with that of the input vector, thereby causing maximal interference to the training process of the adversary's model while leaving the predicted class of the target model unchanged. Experimental results show that the proposed method can effectively defend against model functionality stealing attacks and knowledge-distillation-based model cloning attacks while fully preserving the accuracy of the target model, and its defense performance is significantly better than contemporaneous methods.
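The abstract specifies the constraint (maximally perturb the score vector while preserving its class ranking) but not the generator's architecture. The sketch below shows only how an arbitrary perturbed vector can be projected back onto the ranking-preserving set by value reassignment; the learned generator and its maximization objective are not reproduced, and all names are illustrative.

```python
import numpy as np

def order_preserving_perturb(scores, raw_perturbed):
    """Map an arbitrary perturbed vector onto the set of vectors whose
    class ranking matches `scores`: sort the perturbed values and assign
    them to positions in the ranking order of the input, so the largest
    perturbed value lands on the input's top class, and so on."""
    order = np.argsort(-scores)                 # input ranking, best first
    sorted_vals = np.sort(raw_perturbed)[::-1]  # perturbed values, descending
    out = np.empty_like(scores)
    out[order] = sorted_vals
    return out

# Toy usage: the perturbed vector keeps the input's class ranking.
rng = np.random.default_rng(0)
scores = np.array([0.1, 0.6, 0.3])
perturbed = order_preserving_perturb(scores, rng.normal(size=3))
assert (np.argsort(-perturbed) == np.argsort(-scores)).all()
```

Because the full ranking, and in particular the top-1 class, is unchanged, the protected model's reported accuracy is exactly preserved, while a surrogate model trained on the perturbed scores sees heavily distorted soft labels.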


Keywords: privacy preservation, deep learning, differential privacy, neural architecture search, federated learning
Language: Chinese
Sub-direction classification (seven major directions): Machine Learning
State Key Laboratory planning direction classification: Intelligent Computing and Learning
Dataset associated with the thesis requiring deposit:
Document type: Degree thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/51668
Collection: 毕业生_博士学位论文 (Graduates / Doctoral Dissertations)
Recommended citation (GB/T 7714):
程安达. 面向隐私保护的深度学习研究[D], 2023.
Files in this item
File name/size | Document type | Version type | Access | License
毕业论文_面向隐私保护的深度学习研究.p (10596 KB) | Degree thesis | | Restricted access | CC BY-NC-SA