At present, credit rating agencies still base their rating methods on scorecards, which cope poorly with data that are "small-sample, multi-attribute, incomplete and unbalanced". This seriously limits the accuracy, credibility and influence of the agencies' rating results. With the rapid development of the "Belt and Road" Initiative, the volume of cross-border investment by Chinese companies is growing, and so are the associated risks. It is therefore both urgent and meaningful to resolve the technical bottlenecks in intelligent credit rating model design as soon as possible.
This thesis first surveys the current mainstream credit rating models at three levels: traditional statistical learning models, classic machine learning models and the latest neural network rating models. To address the problems encountered in corporate credit rating research, namely incomplete levels of modeling, graph data limited to a single relation type, and the inability to exploit massive unlabeled data, a novel hierarchical heterogeneous graph network model is proposed. The model consists of four main modules: a data processing module, a feature graph network module, a corporate heterogeneous graph network module and a downstream task module. The feature graph network module models the interactions among features, and the corporate heterogeneous graph network module models the interactions among corporations; together they alleviate the problem of incomplete modeling levels and give the model its hierarchical character. The corporate heterogeneous graph network module uses an attention mechanism to jointly model multiple heterogeneous relationships, which alleviates the problem of single-relation graph data and gives the model its heterogeneous character. To make use of the massive unlabeled data, the data processing module first assigns pseudo-labels to the unlabeled data, and the downstream task module then learns from them through a pseudo-label classifier. An adversarial discriminator is then designed to reduce the gap between the labeled data and the pseudo-labeled data. In this way the collected data are used to the fullest extent, which gives the model its adversarial character.
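The two mechanisms described above, attention over heterogeneous relations and pseudo-labeling of unlabeled data, can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the function names, the relation names, the dot-product scoring with a fixed query vector (learned in a real model) and the confidence threshold are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_relations(relation_embeddings, query):
    """Fuse one corporation's relation-specific embeddings with attention.

    relation_embeddings: dict mapping a relation name to an embedding
        (list of floats). The relation names are hypothetical.
    query: attention vector of the same dimension. Assumption: a real
        model would learn this vector; here it is supplied by the caller.
    Returns (weights per relation, fused embedding).
    """
    names = list(relation_embeddings)
    # Score each relation by the dot product of its embedding and the query.
    scores = [
        sum(q * x for q, x in zip(query, relation_embeddings[n])) for n in names
    ]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the relation-specific embeddings.
    fused = [
        sum(w * relation_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


def pseudo_label(probabilities, threshold=0.8):
    """Assign pseudo-labels to unlabeled samples.

    probabilities: per-sample class probabilities from some base rating
        model (hypothetical input).
    threshold: assumption, not from the thesis; samples whose top
        probability is below it are skipped.
    Returns a list of (sample index, predicted class index).
    """
    labeled = []
    for i, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            labeled.append((i, probs.index(confidence)))
    return labeled


# Hypothetical relation names and toy embeddings for a single corporation.
example_weights, example_fused = fuse_relations(
    {"supply_chain": [1.0, 0.0], "shared_shareholder": [0.0, 1.0]},
    query=[0.0, 0.0],
)
```

With a zero query vector every relation receives the same score, so the attention weights are uniform and the fused embedding is the plain average; a learned query would instead emphasize the relations most informative for rating.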
Building on this theoretical work, we collected 39 features from multiple sources such as the China Economic and Financial Database, covering six aspects: profitability, operational capability, growth capability, solvency, cash flow capability and DuPont index. We also collected the rating results of Chinese listed companies from well-known rating agencies, and built our own corporate credit rating dataset for Chinese listed companies. We then selected four evaluation indicators (model size, accuracy, recall and F1-Score) and compared seven benchmark rating models (logistic regression, support vector machine, multi-layer perceptron, XGBoost, a convolutional neural network model, a graph neural network model and an adversarial semi-supervised model) with our hierarchical heterogeneous graph network model on this dataset. The experimental results show that the proposed hierarchical heterogeneous graph network model surpasses the seven benchmark models in terms of recall, accuracy and F1-Score, which demonstrates the strong feature extraction and representation capabilities of our model. In addition, we conduct quantitative analysis experiments from four aspects (the pseudo-label rating model, the volume of unlabeled data, the feature graph network and the corporate graph network) to explore the gain contributed by each module, which increases the interpretability of the hierarchical heterogeneous graph network model.
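For reference, three of the four evaluation indicators (accuracy, recall and F1-Score) can be computed as in the following minimal Python sketch. The abstract does not state how per-class scores are averaged for the multi-class rating task, so macro averaging is an assumption here, and the rating labels in the example are purely illustrative.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, macro-averaged recall and macro-averaged F1.

    Assumption: macro averaging (unweighted mean over classes); the
    thesis may use a different averaging scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    recalls, f1_scores = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Illustrative rating labels only; not data from the thesis dataset.
metrics = evaluate(
    y_true=["AAA", "AA", "AA", "A"],
    y_pred=["AAA", "AA", "A", "A"],
)
```

Macro averaging weights every rating class equally, which is one common choice when the class distribution is unbalanced, as it is for credit ratings.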