A survey on machine learning methods for churn prediction
Louis Geiler, Séverine Affeldt and Mohamed Nadif
Additional contact information 
Louis Geiler: Centre Borelli (CB, UMR 9010); Service de Santé des Armées; Institut National de la Santé et de la Recherche Médicale (INSERM); Université Paris-Saclay; Centre National de la Recherche Scientifique (CNRS); Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay); Université Paris Cité (UPCité)
Séverine Affeldt: Centre Borelli (CB, UMR 9010); Service de Santé des Armées; Institut National de la Santé et de la Recherche Médicale (INSERM); Université Paris-Saclay; Centre National de la Recherche Scientifique (CNRS); Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay); Université Paris Cité (UPCité)
Mohamed Nadif: Centre Borelli (CB, UMR 9010); Service de Santé des Armées; Institut National de la Santé et de la Recherche Médicale (INSERM); Université Paris-Saclay; Centre National de la Recherche Scientifique (CNRS); Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay); Université Paris Cité (UPCité)
Post-Print from HAL
Abstract:
The diversity and specificities of today's businesses have given rise to a wide range of prediction techniques. In particular, churn prediction is a major economic concern for many companies. The purpose of this study is to draw general guidelines from a benchmark of supervised machine learning techniques, combined with widely used data sampling approaches, on publicly available datasets in the context of churn prediction. Choosing a priori the most appropriate sampling method and the most suitable classification model is not trivial, as the choice strongly depends on the intrinsic characteristics of the data. In this paper, we study the behavior of eleven supervised and semi-supervised learning methods and seven sampling approaches on sixteen diverse, publicly available churn-like datasets. Our evaluations, reported in terms of the Area Under the Curve (AUC) metric, explore the influence of sampling approaches and data characteristics on the performance of the studied learning methods. In addition, we propose the Nemenyi test and Correspondence Analysis as means of comparing and visualizing the associations between classification algorithms, sampling methods and datasets. Most importantly, our experiments lead to a practical recommendation for a prediction pipeline based on an ensemble approach. Our proposal can be successfully applied to a wide range of churn-like datasets.
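As a rough illustration of two ingredients the abstract names, the AUC metric and data sampling, the following minimal sketch computes AUC as a rank statistic and balances classes by random oversampling. It is not the authors' pipeline: the abstract does not list the seven sampling approaches, so random oversampling is an assumption here, and the function names `auc` and `random_oversample` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same size (assumed sampling strategy, binary labels 0/1)."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    if not by_class[minority]:
        raise ValueError("cannot oversample an empty class")
    n_extra = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority] * n_extra


# Toy usage: four customers, one of them a churner (label 1).
rows = [[1.0], [2.0], [3.0], [9.0]]
labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In a benchmark of the kind described, resampling would normally be applied to the training split only, with AUC computed on untouched test data.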
Keywords: churn prediction; machine learning; ensemble technique
Date: 2022
New Economics Papers: this item is included in nep-big and nep-cmp
Note: View the original document on HAL open archive server: https://hal.science/hal-03824873v1
Published in International Journal of Data Science and Analytics, 2022, 14 (3), pp.217-242. ⟨10.1007/s41060-022-00312-5⟩
Downloads: (external link)
https://hal.science/hal-03824873v1/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-03824873
DOI: 10.1007/s41060-022-00312-5