Article

Handling class imbalance in customer churn prediction

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 36, Issue 3, Pages 4626-4636

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2008.05.027

Keywords

Rare events; Class imbalance; Under-sampling; Oversampling; Boosting; Random forests; CUBE; Customer churn; Classifier

Customer churn is often a rare event in service industries, but one of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6 (1), 7-19]. In this study, we investigate how class imbalance can be better handled in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance from sampling (both random and advanced under-sampling) and from two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric than accuracy. Lift is closely related to accuracy, but has the advantage of being widely used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li (1998), we find that there is no need to under-sample so that there are as many churners in the training set as non-churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with the findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better than random forests, and are therefore advised. They should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique. © 2008 Elsevier Ltd. All rights reserved.
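The abstract relies on three ingredients that are easy to state concretely: the threshold-free AUC metric, lift, and random under-sampling with a class ratio that need not be 1:1. The following is a minimal Python sketch of each, not the authors' implementation. The function names (`auc`, `top_fraction_lift`, `under_sample`), the `neg_per_pos` parameter, and the choice of the top 10% as the default lift cut-off are assumptions made for illustration only.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_fraction_lift(labels, scores, fraction=0.1):
    """Churn rate in the top-scored fraction divided by the overall churn rate."""
    n_top = max(1, int(len(labels) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def under_sample(rows, labels, neg_per_pos=2.0, seed=0):
    """Keep every churner (label 1) and a random subset of non-churners.

    neg_per_pos is a hypothetical knob: the number of non-churners kept per
    churner. A value of 1.0 gives a balanced set; larger values keep more
    of the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(neg_per_pos * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]
```

In a typical workflow, `under_sample` would be applied to the training data only, while `auc` and `top_fraction_lift` would be computed on an untouched test set that retains the original churn rate, so that the reported metrics reflect the real class distribution.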

Recommended

Article Operations Research & Management Science

Evaluating the importance of different communication types in romantic tie prediction on social media

Matthias Bogaert, Michel Ballings, Dirk Van den Poel

ANNALS OF OPERATIONS RESEARCH (2018)

Article Computer Science, Interdisciplinary Applications

Machine learning refinery sensor data to predict catalyst saturation levels

Bram Steurtewagen, Dirk Van den Poel

COMPUTERS & CHEMICAL ENGINEERING (2020)

Article Management

Evaluating multi-label classifiers and recommender systems in the financial service sector

Matthias Bogaert, Justine Lootens, Dirk Van den Poel, Michel Ballings

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH (2019)

Article Agriculture, Multidisciplinary

Leveraging latent representations for milk yield prediction and interpolation using deep learning

Arno Liseune, Matthieu Salamone, Dirk Van den Poel, Bonifacius Van Ranst, Miel Hostens

COMPUTERS AND ELECTRONICS IN AGRICULTURE (2020)

Article Computer Science, Theory & Methods

Evaluation of Stream Processing Frameworks

Giselle van Dongen, Dirk Van den Poel

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2020)

Article Agriculture, Multidisciplinary

Predicting the milk yield curve of dairy cows in the subsequent lactation period using deep learning

Arno Liseune, Matthieu Salamone, Dirk Van den Poel, Bonifacius van Ranst, Miel Hostens

Summary: This study introduces a deep learning model that can predict the entire lactation curve of dairy cows, outperforming baseline models and improving predictions during the first 26 days of lactation. The framework allows farmers to improve total production forecasts and herd management, and can assist in detecting diseases early and improving animal monitoring systems. By incorporating health, reproduction events, and herd management, the model enables more accurate estimation of future earnings and costs.

COMPUTERS AND ELECTRONICS IN AGRICULTURE (2021)

Article Computer Science, Artificial Intelligence

Predicting donation behavior: Acquisition modeling in the nonprofit sector using Facebook data

Lisa Schetgen, Matthias Bogaert, Dirk Van den Poel

Summary: This study demonstrates the value of Facebook data in predicting first-time donation behavior and acquiring new donors for nonprofit organizations. The combination of singular value decomposition and logistic regression outperformed other analytical methodologies, with Facebook pages and categories being the most important data types. Factors related to age, education, residence, and other dimensions played a significant role in predicting donation behavior.

DECISION SUPPORT SYSTEMS (2021)

Article Computer Science, Interdisciplinary Applications

Adding interpretability to predictive maintenance by machine learning on sensor data

Bram Steurtewagen, Dirk Van den Poel

Summary: This study utilizes a supervised machine learning approach, combining sensor and report data, to achieve prediction and diagnosis of equipment failures, highlighting the importance of diagnosis. The combination of statistical methods with proper data treatment can greatly enhance the diagnostic value of machine learning approaches.

COMPUTERS & CHEMICAL ENGINEERING (2021)

Article Computer Science, Information Systems

Influencing Factors in the Scalability of Distributed Stream Processing Jobs

Giselle van Dongen, Dirk Van den Poel

Summary: This study evaluates the scalability of stream processing jobs in four popular frameworks and finds that scaling efficiency is influenced by factors such as cluster layout, scaling direction, framework design, and data characteristics. Recommendations are provided on how to scale clusters effectively.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

A Performance Analysis of Fault Recovery in Stream Processing Frameworks

Giselle van Dongen, Dirk Van den Poel

Summary: This study delves into the critical feature of built-in fault tolerance of four leading frameworks and tests their performance in various fault scenarios. Results show the significant impact of framework design on fault recovery speed.

IEEE ACCESS (2021)

Article Management

Predicting Self-declared Movie Watching Behavior Using Facebook Data and Information-Fusion Sensitivity Analysis

Matthias Bogaert, Michel Ballings, Rob Bergmans, Dirk Van den Poel

Summary: This study evaluates the feasibility of predicting whether a Facebook user has watched a certain movie genre, builds predictive models, and assesses variable importance. The results show that the adaptive boosting algorithm outperforms the others, with time- and frequency-based variables related to media consumption being the most important.

DECISION SCIENCES (2021)

Article Business

The Role of Marketer-Generated Content in Customer Engagement Marketing

Matthijs Meire, Kelly Hewett, Michel Ballings, V. Kumar, Dirk Van den Poel

JOURNAL OF MARKETING (2019)

Proceedings Paper Computer Science, Information Systems

Latency Measurement of Fine-Grained Operations in Benchmarking Distributed Stream Processing Frameworks

Giselle van Dongen, Bram Steurtewagen, Dirk Van den Poel

2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS) (2018)

Article Management

Dynamics between social media engagement, firm-generated content, and live and time-shifted TV viewing

Vijay Viswanathan, Edward C. Malthouse, Ewa Maslowska, Steven Hoornaert, Dirk Van den Poel

JOURNAL OF SERVICE MANAGEMENT (2018)

Meeting Abstract Agriculture, Dairy & Animal Science

Predicting the next life event including disease by applying deep learning on sequential and pictorial data

A. Liseune, D. Van den Poel, B. Van Ranst, M. Hostens

JOURNAL OF DAIRY SCIENCE (2019)

Review Computer Science, Artificial Intelligence

A comprehensive review of slope stability analysis based on artificial intelligence methods

Wei Gao, Shuangshuang Ge

Summary: This study provides a comprehensive review of slope stability research based on artificial intelligence methods, focusing on slope stability computation and evaluation. The review covers studies using quasi-physical intelligence methods, simulated evolutionary methods, swarm intelligence methods, hybrid intelligence methods, artificial neural network methods, vector machine methods, and other intelligence methods. The merits, demerits, and state-of-the-art research advancement of these studies are analyzed, and possible research directions for slope stability investigation based on artificial intelligence methods are suggested.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Machine learning approaches for lateral strength estimation in squat shear walls: A comparative study and practical implications

Khuong Le Nguyen, Hoa Thi Trinh, Saeed Banihashemi, Thong M. Pham

Summary: This study investigated the influence of input parameters on the shear strength of RC squat walls and found that ensemble learning models, particularly XGBoost, can effectively predict the shear strength. The axial load had a greater influence than the reinforcement ratio, and longitudinal reinforcement had a more significant impact than horizontal and vertical reinforcement. The XGBoost model outperforms traditional design models, and reducing the input features still yields reliable predictions.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

DHESN: A deep hierarchical echo state network approach for algal bloom prediction

Bo Hu, Huiyan Zhang, Xiaoyi Wang, Li Wang, Jiping Xu, Qian Sun, Zhiyao Zhao, Lei Zhang

Summary: A deep hierarchical echo state network (DHESN) is proposed to address the limitations of shallow coupled structures. By using transfer entropy, candidate variables with strong causal relationships are selected and a hierarchical reservoir structure is established to improve prediction accuracy. Simulation results demonstrate that DHESN performs well in predicting algal bloom.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Learning high-dependence Bayesian network classifier with robust topology

Limin Wang, Lingling Li, Qilong Li, Kuo Li

Summary: This paper discusses the urgency of learning complex multivariate probability distributions due to the increase in data variability and quantity. It introduces a highly scalable classifier called TAN, which utilizes maximum weighted spanning tree (MWST) for graphical modeling. The paper theoretically proves the feasibility of extending one-dependence MWST to model high-dependence relationships and proposes a heuristic search strategy to improve the fitness of the extended topology to data. Experimental results demonstrate that this algorithm achieves a good bias-variance tradeoff and competitive classification performance compared to other high-dependence or ensemble learning algorithms.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Make a song curative: A spatio-temporal therapeutic music transfer model for anxiety reduction

Zhejing Hu, Gong Chen, Yan Liu, Xiao Ma, Nianhong Guan, Xiaoying Wang

Summary: Anxiety is a prevalent issue and music therapy has been found effective in reducing anxiety. To meet the diverse needs of individuals, a novel model called the spatio-temporal therapeutic music transfer model (StTMTM) is proposed.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A modified reverse-based analysis logic mining model with Weighted Random 2 Satisfiability logic in Discrete Hopfield Neural Network and multi-objective training of Modified Niched Genetic Algorithm

Nur Ezlin Zamri, Mohd. Asyraf Mansor, Mohd Shareduwan Mohd Kasihmuddin, Siti Syatirah Sidik, Alyaa Alway, Nurul Atiqah Romli, Yueling Guo, Siti Zulaikha Mohd Jamaludin

Summary: In this study, a hybrid logic mining model was proposed by combining the logic mining approach with the Modified Niche Genetic Algorithm. This model improves the generalizability and storage capacity of the retrieved induced logic. Various modifications were made to address other issues. Experimental results demonstrate that the proposed model outperforms baseline methods in terms of accuracy, precision, specificity, and correlation coefficient.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

On taking advantage of opportunistic meta-knowledge to reduce configuration spaces for automated machine learning

David Jacob Kedziora, Tien-Dung Nguyen, Katarzyna Musial, Bogdan Gabrys

Summary: The paper addresses the problem of efficiently optimizing machine learning solutions by reducing the configuration space of ML pipelines and leveraging historical performance. The experiments conducted show that opportunistic/systematic meta-knowledge can improve ML outcomes, and configuration-space culling is optimal when balanced. The utility and impact of meta-knowledge depend on various factors and are crucial for generating informative meta-knowledge bases.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Optimal location for an EVPL and capacitors in grid for voltage profile and power loss: FHO-SNN approach

G. Sophia Jasmine, Rajasekaran Stanislaus, N. Manoj Kumar, Thangamuthu Logeswaran

Summary: In the context of a rapidly expanding electric vehicle market, this research investigates the ideal locations for EV charging stations and capacitors in power grids to enhance voltage stability and reduce power losses. A hybrid approach combining the Fire Hawk Optimizer and Spiking Neural Network is proposed, which shows promising results in improving system performance. The optimization approach has the potential to enhance the stability and efficiency of electric grids.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

NLP-based approach for automated safety requirements information retrieval from project documents

Zhijiang Wu, Guofeng Ma

Summary: This study proposes a natural language processing-based framework for requirement retrieval and document association, which can help to mine and retrieve documents related to project managers' requirements. The framework analyzes the ontology relevance and emotional preference of requirements. The results show that the framework performs well in terms of iterations and threshold, and there is a significant matching between the retrieved documents and the requirements, which has significant managerial implications for construction safety management.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Dog nose-print recognition based on the shape and spatial features of scales

Yung-Kuan Chan, Chuen-Horng Lin, Yuan-Rong Ben, Ching-Lin Wang, Shu-Chun Yang, Meng-Hsiun Tsai, Shyr-Shen Yu

Summary: This study proposes a novel method for dog identification using nose-print recognition, which can be applied to controlling stray dogs, locating lost pets, and pet insurance verification. The method achieves high recognition accuracy through two-stage segmentation and feature extraction using a genetic algorithm.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Fostering supply chain resilience for omni-channel retailers: A two-phase approach for supplier selection and demand allocation under disruption risks

Shaohua Song, Elena Tappia, Guang Song, Xianliang Shi, T. C. E. Cheng

Summary: This study aims to optimize supplier selection and demand allocation decisions for omni-channel retailers in order to achieve supply chain resilience. It proposes a two-phase approach that takes into account various factors such as supplier evaluation and demand allocation.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Accelerating Benders decomposition approach for shared parking spaces allocation considering parking unpunctuality and no-shows

Jinyan Hu, Yanping Jiang

Summary: This paper examines the allocation problem of shared parking spaces considering parking unpunctuality and no-shows. It proposes an effective approach using sample average approximation (SAA) combined with an accelerating Benders decomposition (ABD) algorithm to solve the problem. The numerical experiments demonstrate the significance of supply-demand balance for the operation and user satisfaction of the shared parking system.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Financial fraud detection using graph neural networks: A systematic review

Soroor Motie, Bijan Raahemi

Summary: Financial fraud is a persistent problem in the finance industry, but Graph Neural Networks (GNNs) have emerged as a powerful tool for detecting fraudulent activities. This systematic review provides a comprehensive overview of the current state-of-the-art technologies in using GNNs for financial fraud detection, identifies gaps and limitations in existing research, and suggests potential directions for future research.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Occluded person re-identification with deep learning: A survey and perspectives

Enhao Ning, Changshuo Wang, Huang Zhang, Xin Ning, Prayag Tiwari

Summary: This review provides a detailed overview of occluded person re-identification methods and conducts a systematic analysis and comparison of existing deep learning-based approaches. It offers important theoretical and practical references for future research in the field.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A hierarchical attention detector for bearing surface defect detection

Jiajun Ma, Songyu Hu, Jianzhong Fu, Gui Chen

Summary: The article presents a novel visual hierarchical attention detector for multi-scale defect location and classification, utilizing texture, semantic, and instance features of defects through a hierarchical attention mechanism, achieving multi-scale defect detection in bearing images with complex backgrounds.

EXPERT SYSTEMS WITH APPLICATIONS (2024)