Data Debiasing for Recommender Systems Based on Uncertainty Estimation
粟晨阳 (Su Chenyang)
2023-05
Pages: 62
Degree type: Master's
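Abstract (Chinese)

Recommender systems play an increasingly important role on the Internet because they can recommend appropriate content, products, and services according to users' interests and needs. In practice, however, recommender systems are often affected by various biases, which degrade the quality of their recommendations. Debiasing is therefore an important research topic in this field. This thesis analyzes existing debiasing methods for recommender systems and finds that their effectiveness is limited by the error of the imputed errors and by the variance of the inverse propensity scores. We therefore propose an uncertainty estimation-based debiasing framework for recommender systems (UEB). UEB improves recommendation performance by reducing the error of the imputed errors and by optimizing the loss functions of the prediction model and the imputation model. However, UEB introduces an uncertainty estimation module, which raises computational overhead, lengthens training time, and reduces its practicality in real-world scenarios. This thesis therefore also proposes using a neural stochastic differential equation model (SDE) to preprocess recommender-system data, thereby reducing the computational overhead of UEB.

UEB consists of a prediction model and an error-imputation model. An uncertainty estimation module is added to the imputation model. By estimating the uncertainty of the pseudo-labels output by the imputation model, UEB reduces the error of the imputed errors and, ultimately, the bias of the predictions. To verify the effectiveness of UEB, a series of experiments were conducted on the Yahoo! R3 and Coat datasets, and the results were compared with those of existing methods under multiple metrics. The experimental results demonstrate the effectiveness of UEB. Further experiments with different uncertainty estimation implementations and different parameter settings verify the stability of UEB.

Because its uncertainty estimation module increases computational cost, this thesis proposes applying an SDE to preprocess the data in the recommender system. Neural networks, especially residual networks (ResNet), are closely related to dynamical systems, so the relationship between network layers can be described by an ordinary differential equation (ODE). By adding a perturbation term, the ODE can be rewritten as an SDE capable of measuring uncertainty. The SDE has two components: (1) a drift network that steers the dynamical system to fit the prediction function, and (2) a diffusion network that captures uncertainty. After being trained only once, the SDE can produce different outputs. It samples the drift network multiple times and uses the drift network's outputs to measure confidence, then uses this confidence to estimate uncertainty and preprocess the data. Experiments on the Yahoo! R3 and Coat datasets compare the SDE with existing methods in terms of running time, number of model parameters, and floating-point operations (MFLOPs). The results show that the SDE reduces computation time and computational complexity to some extent. We also compare the recommendation performance of the SDE with that of existing methods on the recommendation task. This comparison shows that the SDE lowers computational cost while maintaining strong recommendation performance.

In summary, this thesis makes two main contributions. First, existing debiasing methods suffer from high error in the imputed errors and high variance in the inverse propensity scores. To address this, we propose UEB, and experiments show that UEB improves recommender-system performance effectively and stably. Second, because UEB's uncertainty estimation module increases computational cost, this thesis proposes applying an SDE to preprocess the data in the recommender system. Experiments show that this approach lowers computational cost while maintaining strong recommendation performance.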
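The abstract attributes the limitations of existing debiasing methods to two quantities: the error of the imputed errors and the variance of the inverse propensity scores. The abstract does not name the estimator the thesis builds on. Assuming the standard doubly robust (DR) estimator from the debiasing literature, both effects can be read off its bias and variance. Here $\mathcal{D}$ is the set of all user-item pairs and $o_{u,i}\in\{0,1\}$ indicates whether the rating of pair $(u,i)$ is observed. The quantities $p_{u,i}$ and $\hat p_{u,i}$ are the true and estimated propensities, $e_{u,i}$ is the prediction error, $\hat e_{u,i}$ is the imputed error, and $\delta_{u,i}=e_{u,i}-\hat e_{u,i}$. The expectation and variance are taken over the observation indicators, which are assumed to be independent Bernoulli($p_{u,i}$) variables.

```latex
\mathcal{E}_{\mathrm{DR}}
  = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}}
    \left(\hat e_{u,i} + \frac{o_{u,i}\,\delta_{u,i}}{\hat p_{u,i}}\right)

\operatorname{Bias}\big[\mathcal{E}_{\mathrm{DR}}\big]
  = \frac{1}{|\mathcal{D}|}\left|\sum_{(u,i)\in\mathcal{D}}
    \frac{(p_{u,i}-\hat p_{u,i})\,\delta_{u,i}}{\hat p_{u,i}}\right|,
\qquad
\operatorname{Var}\big[\mathcal{E}_{\mathrm{DR}}\big]
  = \frac{1}{|\mathcal{D}|^{2}}\sum_{(u,i)\in\mathcal{D}}
    \frac{p_{u,i}(1-p_{u,i})\,\delta_{u,i}^{2}}{\hat p_{u,i}^{2}}
```

Both terms shrink as the imputation error $\delta_{u,i}$ shrinks. The variance also grows when small estimated propensities make $1/\hat p_{u,i}$ large. This is consistent with the abstract's motivation for making the imputed errors more accurate.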

Abstract (English)

Recommender systems play an increasingly important role on the Internet, as they can recommend appropriate content, goods, and services based on users' interests and needs. In practice, however, recommender systems are often affected by various biases, which lead to unsatisfactory recommendations. Debiasing is therefore an important research topic in the field. In this paper, we analyze existing debiasing methods for recommender systems. We find that their effectiveness is limited by the inaccuracy of the imputed errors and by the variance of the inverse propensity scores. We therefore propose an uncertainty estimation-based debiasing framework for recommender systems (UEB). UEB improves recommendation performance by reducing the inaccuracy of the imputed errors and by optimizing the loss functions of the prediction and imputation models. However, UEB introduces an uncertainty estimation module, which increases computational overhead and training time and reduces its usability in real-world scenarios. This paper therefore also proposes using a neural stochastic differential equation (SDE) model to preprocess recommender-system datasets, thereby reducing the computational overhead of UEB.

UEB contains a prediction model and an imputation model, and adds an uncertainty estimation module to the imputation model. By estimating the uncertainty of the pseudo-labels output by the imputation model, UEB reduces the inaccuracy of the imputed errors and ultimately the bias of the predictions. To verify the effectiveness of UEB, we conducted a series of experiments on the Yahoo! R3 and Coat datasets and compared the results with those of existing methods under various metrics. The results demonstrate the effectiveness of UEB.
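The abstract does not say how UEB turns the uncertainty of a pseudo-label into a change in the loss. Below is a minimal sketch under stated assumptions. It uses a doubly robust estimate of the average prediction error and a hypothetical hard threshold `max_std` (not from the thesis). Each imputed error is taken as the mean of K sampled pseudo-labels, which could come from an ensemble or MC dropout. Pairs whose samples disagree too much fall back to the plain inverse-propensity term. The function name `uncertainty_weighted_dr` and the dictionary keys are illustrative only.

```python
from statistics import mean, pstdev


def uncertainty_weighted_dr(pairs, max_std=0.5):
    """Doubly robust (DR) estimate of the average prediction error over all pairs.

    Each element of `pairs` is a dict with the (illustrative) keys:
      "observed"   -> bool, whether the rating of (u, i) was observed (o_ui)
      "error"      -> float, prediction error e_ui; only read when observed
      "imputed"    -> list of floats, K sampled pseudo-labels for the error
      "propensity" -> float in (0, 1], estimated propensity p_hat_ui
    `max_std` is a hypothetical threshold on the spread of the pseudo-labels.
    """
    total = 0.0
    for pair in pairs:
        samples = pair["imputed"]
        e_hat = mean(samples)                        # point estimate of the imputed error
        uncertainty = pstdev(samples)                # spread of the K pseudo-labels
        w = 1.0 if uncertainty <= max_std else 0.0   # drop unreliable imputations
        term = w * e_hat                             # imputation part of DR
        if pair["observed"]:
            # inverse-propensity correction of the (possibly dropped) imputation
            term += (pair["error"] - w * e_hat) / pair["propensity"]
        total += term
    return total / len(pairs)


pairs = [
    {"observed": True, "error": 1.0, "imputed": [0.8, 0.9, 1.0], "propensity": 0.5},
    {"observed": False, "error": None, "imputed": [0.4, 0.5, 0.6], "propensity": 0.25},
    {"observed": False, "error": None, "imputed": [0.0, 2.0, 4.0], "propensity": 0.25},
]
print(uncertainty_weighted_dr(pairs))                # about 0.533: third pair is dropped
print(uncertainty_weighted_dr(pairs, max_std=10.0))  # about 1.2: every imputation trusted
```

With exact propensities, the estimate stays unbiased for any weight `w`, because the expectation of `o * (e - w * e_hat) / p` is `e - w * e_hat`. The weight therefore only decides how much the estimate relies on the imputation model versus the inverse-propensity correction. A soft weight such as `1 / (1 + std**2)` would be an equally plausible choice.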
In addition, experiments with different uncertainty estimation implementations and different parameter settings verify the stability of UEB.

Because the uncertainty estimation module of UEB increases computational cost, this paper proposes applying an SDE to preprocess the data in the recommender system. Neural networks, especially residual networks (ResNet), are closely related to dynamical systems, so the relationship between network layers can be described by ordinary differential equations (ODEs). By introducing a perturbation term, an ODE can be rewritten as an SDE capable of measuring uncertainty. The SDE consists of two components: (1) a drift network that controls the dynamical system to fit the prediction function, and (2) a diffusion network that captures uncertainty. Although it is trained only once, the SDE can produce different outputs. The drift network is sampled multiple times, and its outputs are used to measure the confidence level. This confidence is then used to estimate uncertainty and preprocess the data. We conducted a series of experiments on the Yahoo! R3 and Coat datasets and compared the SDE with existing methods in terms of running time, number of model parameters, and floating-point operations (MFLOPs). The results show that the SDE reduces computation time and computational complexity to a certain extent. We also compared the recommendation performance of the SDE with that of existing methods. The SDE reduces computational cost while maintaining strong recommendation performance.

In summary, this paper makes two main contributions. First, existing debiasing methods suffer from high error in the imputed errors and high variance in the inverse propensity scores. To address this, we propose UEB, and experimental results demonstrate its effectiveness and stability in improving recommender-system performance. Second, because UEB introduces an uncertainty estimation module that increases computational cost, we propose applying an SDE to preprocess the data in the recommender system. Experiments show that this method reduces computational cost while maintaining strong recommendation performance.
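The drift/diffusion description above can be made concrete with a small scalar sketch in the spirit of SDE-Net. It assumes Euler-Maruyama discretization of dx_t = f(x_t, t) dt + g(x_0) dW_t. All function names (`drift`, `diffusion`, `sde_forward`, `predict_with_uncertainty`, `filter_confident`) are hypothetical. The toy drift pulls the state toward 1.0. The toy diffusion returns small noise for inputs inside an assumed training range |x0| < 1 and large noise outside it. The abstract does not specify how the uncertainty estimate is used to preprocess data, so the preprocessing rule here (keep inputs whose predictive variance is below a threshold) is also an assumption. Setting the diffusion to zero recovers a deterministic ODE. With dt = 1, each Euler step x + f(x) has the form of a residual block, which is the ResNet connection the text mentions.

```python
import math
import random
from statistics import mean, pvariance


def drift(x, t):
    # Hypothetical drift network f(x, t): steers the state toward 1.0.
    return 1.0 - x


def diffusion(x0):
    # Hypothetical diffusion network g(x0): low noise inside the assumed
    # training range |x0| < 1, high noise (high uncertainty) outside it.
    return 0.1 if abs(x0) < 1.0 else 1.0


def sde_forward(x0, rng, steps=10, horizon=1.0):
    """One stochastic forward pass using Euler-Maruyama steps."""
    dt = horizon / steps
    g = diffusion(x0)
    x = x0
    for k in range(steps):
        t = k * dt
        # residual-style drift update plus a Brownian increment with std sqrt(dt)
        x = x + drift(x, t) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def predict_with_uncertainty(x0, n_samples=100, seed=0):
    """Repeat the stochastic pass; return (mean prediction, variance)."""
    rng = random.Random(seed)  # fixed seed per input for reproducibility
    outputs = [sde_forward(x0, rng) for _ in range(n_samples)]
    return mean(outputs), pvariance(outputs)


def filter_confident(inputs, max_var, n_samples=100, seed=0):
    """Hypothetical preprocessing: keep only inputs with low predictive variance."""
    kept = []
    for x0 in inputs:
        _, var = predict_with_uncertainty(x0, n_samples, seed)
        if var <= max_var:
            kept.append(x0)
    return kept


print(predict_with_uncertainty(0.5))  # small variance: inside the assumed training range
print(predict_with_uncertainty(3.0))  # large variance: outside it
print(filter_confident([0.5, 3.0, -0.2], max_var=0.05))  # [0.5, -0.2]
```

The same seed is reused for every input, so all inputs share the same noise paths. Differences in variance therefore come only from the diffusion term. In a real model, the drift and diffusion would be trained networks rather than fixed functions.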

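Keywords: recommender systems; uncertainty estimation; neural stochastic differential equations; debiased learning
Language: Chinese
Seven major directions, sub-direction classification: Other
State Key Laboratory planned direction classification: Other
Associated dataset requiring deposit:
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/52249
Collections: Pattern Recognition Laboratory; Graduates_Master's Theses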
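Recommended citation
GB/T 7714
粟晨阳. 基于不确定度估计的推荐系统数据去偏[D]. 2023. (Su Chenyang. Data Debiasing for Recommender Systems Based on Uncertainty Estimation [Master's thesis]. 2023.)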
Files in this item
File name/size: 粟晨阳 毕业论文.pdf (4997 KB)
Document type: Thesis
Access: Open access
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.