Uncertainty-Estimation-Based Data Debiasing for Recommender Systems

	Uncertainty-Estimation-Based Data Debiasing for Recommender Systems
	Su Chenyang (粟晨阳)
	2023-05
Pages	62
Degree type	Master's
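Chinese abstract (translated)	Recommender systems play an increasingly important role on the Internet because they recommend suitable content, goods, and services according to users' interests and needs. In practice, however, recommender systems are often affected by various biases, which leads to unsatisfactory recommendations, so debiasing research is of great importance in this field. This thesis analyzes existing debiasing methods for recommender systems and finds that their effectiveness is limited by the inaccuracy of the imputed errors and by the variance of the inverse propensity scores. We therefore propose an uncertainty-estimation-based debiasing framework (UEB) that reduces the inaccuracy of the imputed errors and optimizes the loss functions of the prediction model and the imputation model, ultimately improving recommendation performance. However, the uncertainty estimation module that UEB introduces raises the computational overhead and lengthens training, which reduces its usability in real-world scenarios. This thesis therefore also proposes using a neural stochastic differential equation model (SDE) to preprocess recommender-system data, thereby reducing the computational overhead of UEB.

UEB consists of a prediction model and an error imputation model. An uncertainty estimation module is added to the imputation model: by estimating the uncertainty of the pseudo-labels that the imputation model outputs, it reduces the inaccuracy of the imputed errors and, in turn, the bias of the predictions. To verify the effectiveness of UEB, a series of experiments was conducted on the Yahoo! R3 and Coat datasets, and the results were compared with existing methods under several metrics; the results confirm the effectiveness of UEB. Further experiments with different uncertainty estimation implementations and different parameter settings verify the stability of UEB.

Because the uncertainty estimation module increases the computational cost, this thesis proposes applying SDE to preprocess the data in the recommender system. Neural networks, especially residual networks (ResNet), are closely related to dynamical systems, so the relationship between network layers can be described by an ordinary differential equation. Adding a perturbation term turns the ordinary differential equation into an SDE that can measure uncertainty. The SDE has two components: (1) a drift network that controls the dynamical system so that it fits the prediction function, and (2) a diffusion network that captures uncertainty. After being trained only once, the SDE can output different results: the drift network is sampled multiple times, its outputs are used to measure confidence, and this confidence is used to estimate uncertainty and preprocess the data. Experiments on the Yahoo! R3 and Coat datasets compare the approach with existing methods in terms of running time, number of model parameters, and number of floating-point operations (MFLOPS). The results show that SDE reduces computation time and computational complexity to some extent. A comparison of recommendation performance against existing methods shows that SDE lowers the computational cost while maintaining excellent recommendation performance.

In summary, this thesis makes two main contributions. First, existing debiasing methods suffer from high error in the imputed errors and high variance in the inverse propensity scores; to address this we propose UEB, and experiments show that it improves recommendation performance effectively and stably. Second, because the uncertainty estimation module in UEB increases the computational cost, we propose applying SDE to preprocess the data in the recommender system, and experiments show that this reduces the computational cost while maintaining excellent recommendation performance.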
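The abstract attributes the weakness of existing methods to two quantities: the imputed errors and the inverse propensity scores. It does not state which estimator the thesis builds on, so the sketch below assumes the standard doubly robust form, which combines both quantities, purely as an illustration. All function names, variable names, and the synthetic data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def ips_estimate(errors, observed, propensities):
    """Inverse-propensity-scored estimate of the mean error over all user-item pairs."""
    return float(np.mean(observed * errors / propensities))

def doubly_robust_estimate(errors, imputed_errors, observed, propensities):
    """Imputed error on every pair, plus a propensity-weighted correction on observed pairs."""
    correction = observed * (errors - imputed_errors) / propensities
    return float(np.mean(imputed_errors + correction))

# Synthetic example (assumed data, not from the thesis).
rng = np.random.default_rng(0)
n = 10_000
true_errors = rng.uniform(0.0, 1.0, size=n)       # prediction error of each pair
propensities = rng.uniform(0.05, 0.9, size=n)     # probability that a pair is observed
observed = (rng.uniform(size=n) < propensities).astype(float)

# A noisy imputation model versus a perfect one.
noisy_imputation = np.clip(true_errors + rng.normal(0.0, 0.3, size=n), 0.0, None)
perfect_imputation = true_errors.copy()

print("true mean error     :", true_errors.mean())
print("IPS                 :", ips_estimate(true_errors, observed, propensities))
print("DR, noisy imputation:", doubly_robust_estimate(true_errors, noisy_imputation, observed, propensities))
print("DR, exact imputation:", doubly_robust_estimate(true_errors, perfect_imputation, observed, propensities))
```

In this form, small propensities inflate the correction term and noisy imputed errors feed directly into the estimate, which is the motivation the abstract gives for reducing the inaccuracy of the imputed errors.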
English abstract	Recommender systems play an increasingly important role on the Internet, as they recommend appropriate content, goods, and services based on users' interests and needs. In practical applications, however, recommender systems are often affected by various biases, resulting in unsatisfactory recommendations. Debiasing research is therefore of great importance in this field. In this thesis, we analyze existing debiasing methods for recommender systems and find that their effectiveness is limited by the inaccuracy of the imputed errors and the variance of the inverse propensity scores. We therefore propose an uncertainty-estimation-based debiasing framework (UEB), which improves recommendation performance by reducing the inaccuracy of the imputed errors and optimizing the loss functions of the prediction and imputation models. However, UEB introduces an uncertainty estimation module, which leads to higher computational overhead and longer training time, reducing its usability in real-world scenarios. This thesis therefore also proposes using a neural stochastic differential equation model (SDE) to preprocess the recommender-system data, thus reducing the computational overhead of UEB.

UEB contains a prediction model and an imputation model. An uncertainty estimation module is introduced into the imputation model: by estimating the uncertainty of the pseudo-labels output by the imputation model, it reduces the inaccuracy of the imputed errors and, ultimately, the bias of the predictions. To verify the effectiveness of UEB, a series of experiments is conducted on the Yahoo! R3 and Coat datasets, and the results are compared with existing methods under various metrics; the results confirm the effectiveness of UEB. In addition, experiments with different uncertainty estimation implementations and different parameter settings verify the stability of UEB.

Because the uncertainty estimation module increases the computational cost, this thesis proposes applying SDE to preprocess the data in the recommender system. Neural networks, especially residual networks (ResNet), are closely related to dynamical systems, so the relationship between network layers can be described by ordinary differential equations. By introducing a perturbation term, the ordinary differential equation can be rewritten as an SDE capable of measuring uncertainty. The SDE consists of two components: (1) a drift network that controls the dynamical system to fit the prediction function, and (2) a diffusion network that captures the uncertainty. Although it is trained only once, the SDE can output different results: the drift network is sampled multiple times, its output is used to measure the confidence level, and this confidence is used to estimate the uncertainty and preprocess the data. We conducted a series of experiments on the Yahoo! R3 and Coat datasets and compared the approach with existing methods in terms of running time, number of model parameters, and number of floating-point operations (MFLOPS). The results show that SDE reduces computation time and computational complexity to a certain extent. We also compare the recommendation performance of SDE with existing methods and demonstrate that SDE reduces the computational cost while maintaining excellent recommendation performance.

In summary, this thesis makes two main contributions. First, existing debiasing methods suffer from high error in the imputed errors and high variance in the inverse propensity scores. To address this, UEB is proposed, and experimental results demonstrate its effectiveness and stability in improving recommendation performance. Second, because UEB introduces an uncertainty estimation module, the computational cost increases. We propose applying SDE to the recommender system for data preprocessing, and experiments show that this method reduces the computational cost while maintaining excellent recommendation performance.
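The chain described in the abstract can be written as a residual layer x_{t+1} = x_t + f(x_t, t), its continuous limit dx = f(x, t) dt, and the perturbed form dx = f(x, t) dt + g(x, t) dW. The sketch below simulates the last equation with the Euler-Maruyama scheme and uses the spread of repeated forward passes as an uncertainty signal. It is a minimal illustration under stated assumptions: `toy_drift` and `toy_diffusion` are hypothetical closed-form stand-ins for the drift and diffusion networks, and redrawing the Brownian increments on every pass is an assumed sampling procedure, since the abstract does not spell out how samples are drawn.

```python
import numpy as np

def toy_drift(x, t):
    # Hypothetical stand-in for the drift network f(x, t).
    return -x

def toy_diffusion(x, t):
    # Hypothetical stand-in for the diffusion network g(x, t):
    # more noise for inputs far from the origin.
    return 0.1 + 0.5 * np.abs(x)

def euler_maruyama(x0, drift, diffusion, n_steps, dt, rng):
    """One stochastic forward pass: x <- x + f(x, t) dt + g(x, t) dW."""
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
    return x

def predict_with_uncertainty(x0, drift, diffusion, n_samples=200,
                             n_steps=50, dt=0.02, seed=0):
    """Run several passes and return the per-dimension mean and variance."""
    rng = np.random.default_rng(seed)
    samples = np.stack([
        euler_maruyama(x0, drift, diffusion, n_steps, dt, rng)
        for _ in range(n_samples)
    ])
    return samples.mean(axis=0), samples.var(axis=0)

x0 = np.array([1.0, -2.0])
mean, var = predict_with_uncertainty(x0, toy_drift, toy_diffusion)
print("mean:", mean)
print("variance (uncertainty proxy):", var)
```

A preprocessing step in the spirit of the abstract could, for example, down-weight or filter samples whose variance exceeds a threshold; the threshold and the filtering rule would be design choices that the abstract does not specify.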
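Keywords	recommender systems; uncertainty estimation; neural stochastic differential equations; debiased learning
Language	Chinese
Seven major research directions (sub-direction classification)	Other
State Key Laboratory planned direction classification	Other
Associated dataset requiring deposit	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/52249
Collection	Pattern Recognition Laboratory graduates_Master's theses
Recommended citation (GB/T 7714)	粟晨阳. 基于不确定度估计的推荐系统数据去偏[D],2023.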

Files in this item
File name/size	Document type	Version type	Access type	License
粟晨阳毕业论文.pdf (4997 KB)	Thesis		Open access	CC BY-NC-SA