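Research on Corporate Credit Rating Techniques Based on Hierarchical Heterogeneous Graph Neural Networks
Author: 冯博靖 (Feng Bojing)
Subtype: Master's thesis
Thesis Advisor: 薛文芳 (Xue Wenfang)
Date: 2022-05-21
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Computer Technology
Keywords: graph neural networks; heterogeneous graph networks; corporate credit rating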
Abstract

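At present, China's independent credit rating agencies still rely mainly on scorecards. Scorecards struggle with the "small-sample, multi-attribute, incomplete, and imbalanced" data that arise when rating small and medium-sized enterprises, and this severely limits the accuracy, credibility, and influence of domestic agencies' ratings. As the Belt and Road Initiative advances rapidly, Chinese companies' cross-border investment is growing in both volume and risk, so it is necessary and urgent to solve the chokepoint technologies in intelligent rating model design as early as possible.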
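This thesis first surveys mainstream credit rating methods at three levels: traditional statistical models, rating models based on classical machine learning, and recent rating models based on neural networks. It then addresses three problems met in corporate credit rating research: incomplete modeling levels, reliance on a single type of graph relation, and the inability to exploit massive unlabeled data. To solve them, it proposes a novel hierarchical heterogeneous graph network model built from four modules: a data processing module, a feature graph network module, a corporate heterogeneous graph network module, and a downstream task module. The feature graph network module and the corporate heterogeneous graph network module capture interactions among features and among companies, respectively, which alleviates the incomplete-modeling-level problem and gives the model its hierarchical character. The corporate heterogeneous graph network module uses an attention mechanism to combine multiple heterogeneous relations, which alleviates the single-relation problem and gives the model its heterogeneous character. To exploit the massive unlabeled data, the data processing module first assigns pseudo-labels to unlabeled samples; the downstream task module then learns from them with a pseudo-label classifier, and an adversarial discriminator reduces the gap between labeled and pseudo-labeled data. This makes the fullest use of the collected data and gives the model its adversarial character.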
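The attention-based fusion of several heterogeneous company relations can be illustrated with a small sketch. The abstract does not give the exact formulation, so the code below assumes a HAN-style semantic attention: each relation type first yields its own company embeddings by mean-aggregating graph neighbors, a shared query vector scores each relation, and a softmax over those scores weights the relation-specific embeddings. The relation names `shareholding` and `supply_chain`, and all weights, are hypothetical toy values rather than trained parameters.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def mean_aggregate(features: Matrix, adjacency: List[List[int]]) -> Matrix:
    """Relation-specific embedding: average each company's features with its neighbors'."""
    out = []
    for i, neighbors in enumerate(adjacency):
        group = [features[i]] + [features[j] for j in neighbors]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(per_relation: Dict[str, Matrix], w: Matrix, b: Vector, q: Vector):
    """Score each relation by mean_i q^T tanh(W z_i + b), then fuse with softmax weights."""
    names = list(per_relation)
    scores = []
    for name in names:
        node_scores = [
            sum(qk * math.tanh(hk + bk) for qk, hk, bk in zip(q, matvec(w, zi), b))
            for zi in per_relation[name]
        ]
        scores.append(sum(node_scores) / len(node_scores))
    beta = softmax(scores)
    n, d = len(per_relation[names[0]]), len(per_relation[names[0]][0])
    fused = [
        [sum(beta[r] * per_relation[names[r]][i][k] for r in range(len(names))) for k in range(d)]
        for i in range(n)
    ]
    return dict(zip(names, beta)), fused


# Three companies with two financial features each (toy values).
x = [[0.2, 0.8], [0.5, 0.1], [0.9, 0.4]]
relations = {
    "shareholding": [[1], [0], []],      # hypothetical relation type
    "supply_chain": [[2], [2], [0, 1]],  # hypothetical relation type
}
per_relation = {name: mean_aggregate(x, adj) for name, adj in relations.items()}
weights, fused = semantic_attention(
    per_relation, w=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], q=[0.5, -0.5]
)
print(weights)  # one attention weight per relation type; the weights sum to 1
print(fused)    # one fused embedding per company
```

Because the fused embedding is a convex combination of the relation-specific ones, a relation that carries little signal can be down-weighted instead of being dropped by hand.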
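Building on this model design, we collected 39 corporate feature indicators from multiple sources, including the China Economic and Financial Database, covering six aspects: profitability, operating capability, growth capability, solvency, cash-flow capability, and the DuPont index. We also gathered the ratings that well-known rating agencies assigned to Chinese listed companies and built our own corporate credit rating dataset of Chinese listed companies. We then used four evaluation metrics (model size, accuracy, recall, and F1-score) to compare, on this dataset, the baseline rating models (logistic regression, support vector machine, multilayer perceptron, a gradient-boosted tree model (XGBoost), convolutional neural network, graph neural network, and an adversarial semi-supervised model) with our hierarchical heterogeneous graph network model. The results show that the proposed model outperforms all seven baselines in recall, accuracy, and F1-score, demonstrating its strong feature extraction and representation capability. As a further contribution, we ran quantitative experiments on four aspects (the pseudo-label rating model, the amount of unlabeled data, the feature graph, and the corporate graph network) to measure the gain each module contributes, which improves the interpretability of the model.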
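Accuracy, recall, and F1-score for a multi-class rating task can be computed as in the sketch below. The abstract does not state how recall and F1-score are averaged across rating grades, so this assumes macro averaging (an unweighted mean over classes); the function name `rating_metrics` and the grade labels are hypothetical.

```python
from typing import Dict, Sequence


def rating_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged recall and F1 over all grades seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "macro_recall": sum(recalls) / len(recalls),
        "macro_f1": sum(f1s) / len(f1s),
    }


y_true = ["AAA", "AA", "AA", "A", "A", "A"]
y_pred = ["AAA", "AA", "A", "A", "A", "AA"]
metrics = rating_metrics(y_true, y_pred)
print(metrics)  # accuracy = 4/6; macro recall = macro F1 = 13/18
```

Macro averaging gives rare grades the same weight as common ones, which matters for imbalanced rating data. Model size, the fourth metric, is a parameter count rather than a prediction score, so it is not computed here.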

Other Abstract

At present, credit rating agencies still rely mainly on scorecard-based methods, which struggle with "small-sample, multi-attribute, incomplete, and imbalanced" data. This seriously limits the accuracy, credibility, and influence of the agencies' ratings. With the rapid advance of the Belt and Road Initiative, the volume of cross-border investment by Chinese companies keeps growing, and so do the risks. It is therefore necessary and urgent to overcome the bottleneck technologies in intelligent credit rating model design as soon as possible.
This thesis first surveys current mainstream credit rating models at three levels: traditional statistical learning models, classical machine learning models, and recent neural network models. To address problems encountered in corporate credit rating research, such as incomplete modeling levels, reliance on a single type of graph relation, and the inability to use massive unlabeled data, it proposes a novel hierarchical heterogeneous graph network model composed of a data processing module, a feature graph network module, a corporate heterogeneous graph network module, and a downstream task module. The feature graph network module and the corporate heterogeneous graph network module capture the interactions among features and among corporations, respectively, which alleviates the problem of incomplete modeling levels and reflects the hierarchical character of the model. The corporate heterogeneous graph network module uses an attention mechanism to combine multiple heterogeneous relations, which alleviates the reliance on single-relation graph data and reflects the heterogeneous character of the model. To utilize the massive unlabeled data, the data processing module first assigns pseudo-labels to the unlabeled data, and the downstream task module learns from them with a pseudo-label classifier. An adversarial discriminator then reduces the gap between the labeled and the pseudo-labeled data. The collected data is thus used to the fullest, reflecting the adversarial character of the model.
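The pseudo-labeling step and the adversarial objective can be sketched as follows. The abstract does not specify how pseudo-labels are selected or how the discriminator loss is defined, so this sketch assumes (1) a confidence threshold, where an unlabeled sample receives the arg-max class of a pre-trained rating model only if that class probability reaches `threshold`, and (2) a standard GAN-style binary cross-entropy, where the discriminator outputs the probability that a representation comes from labeled data and the encoder is trained so that pseudo-labeled representations look labeled. The function names and the 0.9 threshold are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def assign_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Optional[int]]:
    """Arg-max class for confident predictions; None means the sample stays unlabeled."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        labels.append(best if p[best] >= threshold else None)
    return labels


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy with clamping to avoid log(0)."""
    prob = min(max(prob, eps), 1 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def adversarial_losses(
    d_labeled: Sequence[float], d_pseudo: Sequence[float]
) -> Tuple[float, float]:
    """Inputs are discriminator outputs, read as P(representation is from labeled data).

    Discriminator loss: target 1 for labeled samples, 0 for pseudo-labeled ones.
    Encoder loss (non-saturating form): target 1 for pseudo-labeled samples,
    so the encoder is rewarded when the discriminator cannot tell them apart.
    """
    disc = (sum(bce(p, 1.0) for p in d_labeled) + sum(bce(p, 0.0) for p in d_pseudo)) / (
        len(d_labeled) + len(d_pseudo)
    )
    enc = sum(bce(p, 1.0) for p in d_pseudo) / len(d_pseudo)
    return disc, enc


# Class probabilities from a hypothetical pre-trained rating model for 3 unlabeled firms.
probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25], [0.05, 0.91, 0.04]]
print(assign_pseudo_labels(probs))  # [0, None, 1]
print(adversarial_losses([0.9, 0.8], [0.1, 0.2]))
```

In training, the two losses would be minimized alternately (a discriminator step, then an encoder step) or combined through a gradient reversal layer; the sketch only computes their values.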
Building on this theoretical work, we collected 39 features from multiple sources, such as the China Economic and Financial Database, covering six aspects: profitability, operating capability, growth capability, solvency, cash-flow capability, and the DuPont index. We collected the ratings of Chinese listed companies from well-known rating agencies and built our own corporate credit rating dataset for Chinese listed companies. We then selected four evaluation metrics (model size, accuracy, recall, and F1-score) and compared the baseline rating models (logistic regression, support vector machine, multilayer perceptron, XGBoost, convolutional neural network, graph neural network, and an adversarial semi-supervised model) with our hierarchical heterogeneous graph network model on this dataset. The experimental results show that the proposed model surpasses all seven baselines in recall, accuracy, and F1-score, demonstrating its strong feature extraction and representation capabilities. In addition, we conducted quantitative experiments on four aspects (the pseudo-label rating model, the volume of unlabeled data, the feature graph network, and the corporate graph network) to measure the gain contributed by each module, which increases the interpretability of the model.

Pages: 114
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48459
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
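冯博靖. 基于层次化异质图神经网络的企业信用评级技术研究[D]. 中国科学院自动化研究所, 2022. (English: Feng Bojing. Research on Corporate Credit Rating Techniques Based on Hierarchical Heterogeneous Graph Neural Networks [D]. Institute of Automation, Chinese Academy of Sciences, 2022.)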
Files in This Item:
File: 毕业论文最终版(已签名).pdf (final thesis, signed), 10612 KB
Document Type: Thesis
Access: Restricted
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.