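Research on Personalized Recommendation Technology for Resources on Public Digital Cultural Platforms (公共数字文化平台资源个性化推荐技术研究)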
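Author: 叶墅锋 (affiliations 1, 2)
Subtype: Master of Engineering
Thesis Advisor: 王健
2018-05-30
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Keyword: public digital cultural resources; LDA; personalized recommendation; collaborative filtering recommendation; tag fusion; time weighting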
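Abstract: Public digital cultural resources are large in volume, complex in classification, and highly homogeneous, so users struggle to efficiently find the resources that genuinely interest them. Personalized recommendation captures user interests and proactively recommends resources that users are likely to enjoy, which makes it a key technology for solving this problem. Traditional collaborative filtering runs into two problems in public digital cultural sharing services: users' cultural behavior data is high-dimensional and sparse, and users' cultural interests change quickly. To address them, this thesis draws on the characteristics of semantic analysis of public digital cultural resources and of recommendation algorithms, proposes two optimizations of collaborative filtering recommendation, and verifies their effectiveness experimentally. The main work and results are as follows:
  1. Analyzed the characteristics of public digital cultural resources; compared the strengths and weaknesses of different personalized recommendation algorithms; and, in light of those characteristics, chose collaborative filtering recommendation as the object of study for personalized recommendation of public digital culture.
  2. Proposed a semantic annotation technique for public digital cultural resources that fuses a topic model with a word-vector model and extracts semantic tags from the small amount of metadata attached to each resource. The technique comprises an LDA-based topic analysis algorithm for cultural resource content, used to extract semantic tags, and a method based on the Word2Vec neural network model, used to expand them. The resulting tag library for public cultural resources provides data support for the later optimization of the recommendation algorithms.
  3. Proposed two optimizations for personalized recommendation. The first is collaborative filtering based on tag fusion, which builds user-tag ratings to obtain a low-dimensional representation of the data and thereby mitigates data sparsity. The second is time-weighted collaborative filtering, which introduces an exponential time-decay function to determine each user's time weight coefficient and then adjusts the user-resource ratings, helping to address the drift of user interests over time.
  4. Developed a personalized recommendation system for public digital cultural resources, designed its input/output, data analysis, recommendation engine, and model evaluation modules, and exposed personalized recommendation analysis to the public digital cultural sharing service platform through an API, supporting the platform in recommending digital cultural resources of interest to users.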
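The tag-fusion idea in item 3 can be illustrated with a minimal sketch. The abstract does not say how user-tag ratings are computed, so the mean aggregation below, the function names, and the toy data are all assumptions made for illustration rather than the thesis's actual method. The point it shows is that two users with no resource in common can still be compared once ratings are projected onto a smaller tag space.

```python
from collections import defaultdict


def user_tag_scores(ratings, resource_tags):
    """Aggregate user-resource ratings into user-tag scores.

    ratings: {user: {resource: rating}}
    resource_tags: {resource: [tag, ...]}
    Returns {user: {tag: mean rating over the user's resources with that tag}}.
    Averaging is an assumption; the source does not specify the aggregation.
    """
    scores = {}
    for user, rated in ratings.items():
        totals = defaultdict(float)
        counts = defaultdict(int)
        for resource, rating in rated.items():
            for tag in resource_tags.get(resource, []):
                totals[tag] += rating
                counts[tag] += 1
        scores[user] = {tag: totals[tag] / counts[tag] for tag in totals}
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sum(value * value for value in a.values()) ** 0.5
    norm_b = sum(value * value for value in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data invented for illustration: the two users share no resource.
ratings = {
    "u1": {"r1": 5, "r2": 3},
    "u2": {"r3": 4},
}
resource_tags = {
    "r1": ["opera"],
    "r2": ["opera", "folk"],
    "r3": ["opera"],
}

tag_scores = user_tag_scores(ratings, resource_tags)
sim_by_resource = cosine(ratings["u1"], ratings["u2"])    # no overlap
sim_by_tag = cosine(tag_scores["u1"], tag_scores["u2"])   # overlap on "opera"
```

On the raw user-resource ratings the two users have zero similarity, while in tag space they overlap on the shared tag, which is the sense in which a user-tag representation can ease sparsity.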
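This thesis uses the LDA and Word2Vec models from natural language processing to analyze public digital cultural resources semantically and integrates the results into the recommendation algorithms. It proposes a dimensionality reduction method for public digital cultural user behavior data based on fusing semantic tags of the resources, and a rating weighting method based on time awareness of user behavior. Experiments verify the accuracy and effectiveness of both methods, which offer an effective technical approach to analyzing public digital cultural resources and optimizing the platform and are of significance for advancing the construction of public cultural resources.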
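The innovations of the thesis are as follows:
  1. A semantic annotation technique for public digital cultural resources that fuses a topic model with a word-vector model. It extracts semantic tags from the small amount of metadata attached to each resource and builds a tag library of cultural resources, providing the basis for optimizing the personalized recommendation algorithms;
  2. A dimensionality reduction method for public digital cultural user behavior data based on semantic aggregation of the resources. By building a user-tag rating matrix, the method obtains a low-dimensional representation and effectively reduces the impact of high data sparsity on collaborative filtering recommendation;
  3. A rating weighting method based on time awareness of public digital cultural user behavior. The method determines each user's time weight coefficient with an exponential function of time and adjusts the user-resource rating matrix accordingly, helping to address changing user interests in collaborative filtering recommendation.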
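A rough sketch of exponential time weighting follows. The abstract states only that an exponential time function determines the weight coefficient used to adjust user-resource ratings; the specific form exp(-decay_rate * age), the `decay_rate` parameter, the day-based timestamps, and the function names are assumptions introduced here for illustration.

```python
import math


def time_weight(age_days, decay_rate=0.01):
    """Exponential decay weight in (0, 1]: newer interactions weigh more.

    decay_rate is a hypothetical tuning parameter, not a value from the thesis.
    """
    return math.exp(-decay_rate * age_days)


def weighted_ratings(events, now, decay_rate=0.01):
    """Scale each rating by the decay weight of its age.

    events: iterable of (user, resource, rating, timestamp_in_days)
    Returns {user: {resource: rating * weight}}.
    """
    adjusted = {}
    for user, resource, rating, timestamp in events:
        weight = time_weight(now - timestamp, decay_rate)
        adjusted.setdefault(user, {})[resource] = rating * weight
    return adjusted


# Toy events invented for illustration: same rating, different ages.
events = [
    ("u1", "r1", 5, 0),    # rated 100 days before `now`
    ("u1", "r2", 5, 100),  # rated at `now`
]
adjusted = weighted_ratings(events, now=100)
```

With these settings the older rating is scaled by exp(-1) while the rating given at `now` keeps its full value, so recent behavior dominates any similarity computed from the adjusted matrix.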
Other Abstract: Public digital cultural resources are multi-source, heterogeneous data, and users can hardly find the resources they are interested in. Personalized recommendation aims to capture user interest in real time and actively recommend suitable resources to the user. This thesis analyzes public digital cultural resources with semantic methods, uses the results to optimize personalized recommendation algorithms, and applies them to public digital cultural resource platforms. Based on the characteristics of public digital cultural resource tags and of different recommendation algorithms, it proposes two new collaborative filtering recommendation algorithms that help address changes in user interest over time and data sparsity. The main work and achievements are as follows:
Firstly, the characteristics of public cultural resources were studied, and the advantages and disadvantages of commonly applied recommendation algorithms were compared. Considering the characteristics of public cultural resources, collaborative filtering was selected as the base algorithm for research on personalized recommendation of public digital cultural resources.
Secondly, this work extracts semantic tags from the small amount of metadata available for public digital cultural resources. The technique comprises an LDA-based topic analysis algorithm over the resources' meta-information, which extracts semantic tags, and the neural-network-based Word2Vec algorithm, which expands them. Together they build a tag library of cultural resources that provides the basis for optimizing the personalized recommendation algorithms.
Thirdly, two new recommendation methods are proposed for the public digital cultural platform. One is collaborative filtering recommendation combined with tags, which obtains low-dimensional data by constructing user-tag ratings and helps solve the problem of data sparsity. The other is a time-weighted collaborative filtering recommendation algorithm, which determines each user's time weight coefficient with an exponential function that decreases over time and then adjusts the user-resource rating matrix before recommending, helping to address changes in user interest over time.
Finally, this work developed a personalized recommendation system for public digital cultural resources and implemented the system's input and output modules, data analysis module, recommendation engine module, and model evaluation module. An API was designed for the public digital cultural sharing service platform to provide personalized recommendation analysis, supporting the platform in recommending digital cultural resources of interest to users.
In this thesis, the LDA and Word2Vec models from natural language processing are used to analyze public digital cultural resources semantically. The analysis results are integrated into the recommendation algorithms, yielding personalized recommendation methods based on tag fusion and on time-based weighting. Experiments verify their accuracy and effectiveness, providing an effective technical approach to the analysis of public cultural resources and to platform optimization, which is of significance for promoting the construction of public cultural resources.
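The tag-expansion step (seed tags extended with semantically close words) can be sketched as a nearest-neighbor lookup in an embedding space. The hand-written three-dimensional vectors below merely stand in for trained Word2Vec embeddings, and the `top_n` and `min_sim` parameters, the function names, and the example vocabulary are hypothetical; the abstract does not describe how expansion candidates are selected.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_tags(seed_tags, vectors, top_n=1, min_sim=0.8):
    """Extend seed tags with their nearest neighbors in the embedding space.

    vectors: {word: embedding}; a toy stand-in for trained word vectors.
    top_n and min_sim are hypothetical selection parameters.
    """
    expanded = list(seed_tags)
    for tag in seed_tags:
        if tag not in vectors:
            continue  # no embedding available, so nothing to expand
        candidates = [
            (cosine(vectors[tag], vec), word)
            for word, vec in vectors.items()
            if word not in expanded
        ]
        candidates.sort(reverse=True)
        for similarity, word in candidates[:top_n]:
            if similarity >= min_sim:
                expanded.append(word)
    return expanded


# Toy embeddings invented for illustration only.
vectors = {
    "opera": [0.9, 0.1, 0.0],
    "theatre": [0.8, 0.2, 0.1],
    "calligraphy": [0.0, 0.1, 0.9],
    "painting": [0.1, 0.2, 0.8],
}
opera_tags = expand_tags(["opera"], vectors)
```

In practice the seed tags would come from a topic model such as LDA and the vectors from a trained embedding model; the similarity threshold keeps unrelated words out of the tag library.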
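Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/20946
Collection: Graduates_Master's Theses
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences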
Recommended Citation
GB/T 7714
叶墅锋. 公共数字文化平台资源个性化推荐技术研究[D]. 北京. 中国科学院研究生院,2018.
Files in This Item:
File Name/Size | DocType | Version | Access | License
公共数字文化平台资源个性化推荐技术研究. (2970KB) | Thesis | | Not yet open | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.