4.7 Article

CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection

Journal

APPLIED SOFT COMPUTING
Volume 97, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2020.106758

Keywords

Financial distress prediction; CUS-GBDT; XGBoost; Heterogeneous ensemble; Feature selection

Funding

  1. Key Project of Philosophy and Social Sciences Planning in Anhui Province [AHSKZ2018D14]
  2. Key Projects of Natural Science Research of Universities in Anhui Province [KJ2019A0651, KJ2020A0008]
  3. Natural Science Foundation of Anhui Province [2008085MG234]

Since the 2008 global financial crisis left a large number of companies in financial distress, machine learning-based prediction of financial distress has proven highly practical for economic stakeholders. Most previous machine learning studies focus only on improving sampling methods for imbalanced datasets or on introducing multiple classifiers when constructing the prediction model. In view of this, this paper attempts to extend the scope and depth of ensembling to achieve better prediction performance on a severely imbalanced dataset of financial data from Chinese listed companies. For the first time, this paper combines clustering-based under-sampling (CUS) with the gradient boosting decision tree (GBDT) to construct a CUS-GBDT model, which is used together with the widely used extreme gradient boosting (XGBoost) as heterogeneous classifiers to build a heterogeneous ensemble for financial distress prediction. In addition, following the ensemble idea, this paper selects features with five feature selection methods that rest on different theoretical foundations, so that ensembling is applied throughout the whole process of feature selection, data preprocessing, and model construction. In the comparative experiments, the proposed method achieves the best performance on the test set. This study demonstrates the broad applicability of CUS to financial data processing and the superior generalization performance of the ensemble model relative to individual learners. (C) 2020 Elsevier B.V. All rights reserved.
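The abstract does not spell out how the clustering-based under-sampling step works. As a rough illustration only, the sketch below shows one common variant: run k-means on the majority class with k equal to the number of minority samples, then keep the real majority sample nearest each centroid. The function names `kmeans` and `cluster_undersample`, the toy data, and the choice of k are assumptions made for this example, not the authors' procedure.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Hypothetical CUS variant: shrink the majority class to len(minority) samples."""
    k = len(minority)
    if k == 0 or k > len(majority):
        raise ValueError("need 0 < len(minority) <= len(majority)")
    centroids = kmeans(majority, k, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        # Keep a real sample (not the synthetic centroid) and never reuse it.
        nearest = min(remaining, key=lambda p: dist(p, c))
        kept.append(nearest)
        remaining.remove(nearest)
    features = kept + list(minority)
    labels = [0] * len(kept) + [1] * len(minority)  # 1 marks the minority class
    return features, labels


# Toy data: 12 majority points and 3 minority points (made up for illustration).
majority = [(float(i), float(i % 4)) for i in range(12)]
minority = [(20.0, 1.0), (21.0, 2.0), (22.0, 0.0)]
X_bal, y_bal = cluster_undersample(majority, minority)
print(len(X_bal), y_bal)
```

In a pipeline like the one the abstract describes, a balanced set of this kind would then be passed to a gradient boosting learner; that step is left out here because the abstract gives no training details.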
