4.7 Article

A graph-based semi-supervised reject inference framework considering imbalanced data distribution for consumer credit scoring

Journal

APPLIED SOFT COMPUTING
Volume 105

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2021.107259

Keywords

Financial technology; Credit scoring; Reject inference; Imbalanced learning; Semi-supervised learning; Label spreading

Funding

  1. National Key Technologies R&D Program of China [2018YFC0213600]


This paper proposes a novel reject inference framework for consumer credit scoring in the Chinese fintech industry that accounts for the imbalanced distribution of good and bad applicants. By combining an advanced graph-based semi-supervised learning algorithm with imbalanced learning techniques, the proposed framework outperforms traditional scoring models across different evaluation metrics, advancing credit scoring research and improving fintech practice.
Credit scoring has been attracting increasing attention in the Chinese consumer financial industry. Traditional approaches are easily influenced by sample selection bias because they use accepted applicant samples only, while the applicant population also includes rejected applicants. Reject inference is a technique for inferring good/bad labels for rejected applicants, which can overcome this bias in credit scoring. However, previously proposed reject inference methods usually ignore the imbalanced distribution in accepted data: in most practical consumer loan applications, good applicants far outnumber bad ones. Both the neglect of rejected data and the imbalanced distribution in accepted data weaken the performance of current credit scoring models. In this paper, we propose a novel reject inference framework that takes into account the imbalanced data distribution for consumer credit scoring. First, we use an advanced graph-based semi-supervised learning algorithm called label spreading to solve the reject inference problem. Second, faced with an imbalanced distribution of good and bad samples among accepted applicants, we conduct imbalanced learning using a modified Synthetic Minority Over-sampling Technique (SMOTE) before reject inference. Then, six binary classifiers are studied in our proposed framework for credit scoring modeling. Finally, we present the results of four exact experiments as well as online A/B tests for performance evaluation using data provided by a leading Chinese fintech company. Empirical results indicate that the proposed framework performs better than traditional scoring models across different evaluation metrics, representing a progressive method that promotes credit scoring research and improves fintech practices. © 2021 Elsevier B.V. All rights reserved.
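
The order of steps in the abstract (oversample the minority class among accepted applicants, then infer labels for rejected applicants with label spreading) can be illustrated with a small sketch. This is not the authors' implementation: it uses plain SMOTE-style interpolation in place of the paper's modified SMOTE, a textbook iterative form of label spreading, and invented toy data. The function names `smote_like` and `label_spreading`, the parameters `alpha` and `gamma`, and all coordinates are assumptions made for illustration. For real data, scikit-learn's `sklearn.semi_supervised.LabelSpreading` is a more practical starting point.

```python
import math
import random


def smote_like(minority, n_new, k=1, seed=0):
    """Plain SMOTE-style oversampling (assumption: NOT the paper's modified
    variant). Each synthetic point is a random interpolation between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic


def label_spreading(points, labels, alpha=0.8, gamma=1.0, n_iter=200):
    """Iterative label spreading: F <- alpha * S @ F + (1 - alpha) * Y.
    labels: 0 = good, 1 = bad, -1 = unlabeled (a rejected applicant)."""
    n = len(points)
    # RBF affinity matrix with a zero diagonal.
    w = [[0.0 if i == j
          else math.exp(-gamma * math.dist(points[i], points[j]) ** 2)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # One-hot initial labels; unlabeled rows are all zeros.
    y = [[1.0 if labels[i] == c else 0.0 for c in (0, 1)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n))
              + (1 - alpha) * y[i][c] for c in (0, 1)] for i in range(n)]
    return [0 if row[0] >= row[1] else 1 for row in f]


# Toy data (invented): many accepted good applicants, few accepted bad ones.
good = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0)]
bad = [(3.0, 3.0), (3.2, 2.9)]
rejected = [(0.2, 0.2), (2.9, 3.1), (3.1, 3.2)]

# Step 1: balance the accepted data by oversampling the bad class.
bad_balanced = bad + smote_like(bad, n_new=len(good) - len(bad))

# Step 2: spread labels from accepted to rejected applicants.
points = good + bad_balanced + rejected
labels = [0] * len(good) + [1] * len(bad_balanced) + [-1] * len(rejected)
inferred = label_spreading(points, labels)[-len(rejected):]
print(inferred)  # inferred good/bad labels for the rejected applicants
```

In a full pipeline along the lines the abstract describes, the accepted samples plus the rejected samples with their inferred labels would then be used to train a binary classifier. That step is omitted here.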

