
A graph-based semi-supervised reject inference framework considering imbalanced data distribution for consumer credit scoring

APPLIED SOFT COMPUTING (2021)

Journal

APPLIED SOFT COMPUTING

Volume 105

Publisher

ELSEVIER

DOI: 10.1016/j.asoc.2021.107259

Keywords

Financial technology; Credit scoring; Reject inference; Imbalanced learning; Semi-supervised learning; Label spreading

Funding

National Key Technologies R&D Program of China [2018YFC0213600]

Automated Summary

This paper proposes a novel reject inference framework for consumer credit scoring in the Chinese fintech industry that takes the imbalanced data distribution into account. By combining an advanced graph-based semi-supervised learning algorithm with imbalanced learning techniques, the proposed framework outperforms traditional scoring models across different evaluation metrics, advancing credit scoring research and improving fintech practices.

Abstract

Credit scoring has been attracting increasing attention in the Chinese consumer financial industry. Traditional approaches are easily influenced by sample selection bias because they use accepted applicant samples only, while the applicant population also includes rejected applicants. Reject inference is a technique to infer good/bad labels for rejected applicants, which can overcome biases in credit scoring. However, previously proposed reject inference methods usually ignore the imbalanced distribution in accepted data: in most practical consumer loan applications, good applicants far outnumber bad ones. Both the neglect of rejected data and the imbalanced distribution in accepted data weaken the performance of current credit scoring models. In this paper, we propose a novel reject inference framework that takes into account the imbalanced data distribution for consumer credit scoring. First, we use an advanced graph-based semi-supervised learning algorithm, called label spreading, to solve the reject inference problem. Second, faced with an imbalanced distribution of good and bad samples in accepted applicants, we conduct imbalanced learning using a modified Synthetic Minority Over-sampling Technique before reject inference. Then, six binary classifiers are studied in our proposed framework for credit scoring modeling. Finally, we present the results of four exact experiments as well as online A/B tests for performance evaluation using data provided by a leading Chinese fintech company. Empirical results indicate that the proposed framework performs better than traditional scoring models across different evaluation metrics, representing a progressive method that promotes credit scoring research and improves fintech practices. (C) 2021 Elsevier B.V. All rights reserved.
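The pipeline in the abstract (over-sample the minority class among accepted applicants, then infer labels for rejected applicants by label spreading) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain SMOTE rather than the modified variant the paper mentions, the closed-form label spreading solution with an RBF affinity, and synthetic toy data. All function names, parameter values (`k`, `alpha`, `gamma`) and the data are assumptions made for illustration.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Plain SMOTE (assumption: not the paper's modified variant).
    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(dist, axis=1)[:, :k]     # k nearest neighbours per sample
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def label_spreading(X, y, alpha=0.2, gamma=1.0):
    """Closed-form label spreading: F = (I - alpha * S)^-1 Y, where
    S = D^-1/2 W D^-1/2 and W is an RBF affinity matrix.
    y holds 0/1 for labelled rows and -1 for unlabelled rows."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))      # symmetric normalisation
    Y = np.zeros((len(X), 2))
    idx = np.flatnonzero(y >= 0)
    Y[idx, y[idx]] = 1.0                     # one-hot rows for labelled samples
    return np.linalg.solve(np.eye(len(X)) - alpha * S, Y)

# Toy data (hypothetical): label 0 = good, label 1 = bad, -1 = rejected.
rng = np.random.default_rng(0)
X_good = rng.normal(0.0, 0.3, size=(20, 2))  # accepted, majority class
X_bad = rng.normal(3.0, 0.3, size=(4, 2))    # accepted, minority class
X_rej = np.array([[0.0, 0.0], [3.0, 3.0], [0.2, -0.1], [2.9, 3.2]])

# Step 1: balance the accepted sample before reject inference.
X_syn = smote(X_bad, n_new=len(X_good) - len(X_bad), k=3, rng=1)

# Step 2: spread labels from accepted (plus synthetic) rows to rejected rows.
X_all = np.vstack([X_good, X_bad, X_syn, X_rej])
y_all = np.concatenate([
    np.zeros(len(X_good), int),
    np.ones(len(X_bad) + len(X_syn), int),
    -np.ones(len(X_rej), int),
])
scores = label_spreading(X_all, y_all)
inferred = scores[-len(X_rej):].argmax(axis=1)
print(inferred)
```

In a fuller pipeline, the accepted rows together with the rejected rows and their inferred labels would then be used to train a binary classifier; the abstract reports that six such classifiers were studied but does not name them here.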
