Post-Calibration Techniques: Balancing Calibration and Score Distribution Alignment
Agathe Fernandes Machado (fernandes_machado.agathe@courrier.uqam.ca),
Arthur Charpentier (charpentier.arthur@uqam.ca),
Emmanuel Flachaire (emmanuel.flachaire@univ-amu.fr),
Ewen Gallic (ewen.gallic@gmail.com) and
François Hu
Additional contact information
Agathe Fernandes Machado: UQAM - Université du Québec à Montréal
Arthur Charpentier: CREST - Centre de Recherche en Économie et Statistique - ENSAI - École Nationale de la Statistique et de l'Analyse de l'Information [Bruz] - X - École polytechnique - ENSAE Paris - École Nationale de la Statistique et de l'Administration Économique - IP Paris - Institut Polytechnique de Paris - CNRS - Centre National de la Recherche Scientifique
Emmanuel Flachaire: AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique
Ewen Gallic: AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique
François Hu: ENSAE Paris - École Nationale de la Statistique et de l'Administration Économique - IP Paris - Institut Polytechnique de Paris
Post-Print from HAL
Abstract:
A binary scoring classifier can appear well-calibrated according to standard calibration metrics even when the distribution of its scores does not align with the distribution of the true events. In this paper, we investigate the impact of post-processing calibration (sometimes called "recalibration") on the score distribution. Using simulated data, where the true probability is known, followed by real-world datasets with prior knowledge of event distributions, we compare the performance of an XGBoost model before and after applying calibration techniques. The results show that while methods such as Platt scaling, Beta calibration, or isotonic regression can improve the model's calibration, they may also increase the divergence between the score distribution and the underlying event probability distribution.
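The isotonic-regression recalibration mentioned in the abstract can be illustrated with a minimal pure-Python sketch of the pool-adjacent-violators (PAV) algorithm. The function name and toy data below are illustrative assumptions, not taken from the paper; in practice one would use a library implementation (e.g. scikit-learn's IsotonicRegression) on held-out calibration data.

```python
def isotonic_regression(scores, labels):
    """Fit a non-decreasing map from classifier scores to calibrated
    probabilities via pool-adjacent-violators (PAV).

    Returns the fitted probability for each training point, in the
    original order of `scores`.
    """
    n = len(scores)
    # Process points in increasing score order.
    order = sorted(range(n), key=lambda i: scores[i])
    y = [float(labels[i]) for i in order]

    # Each block stores [sum of labels, count]; its mean is the fitted value.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        # Merge adjacent blocks while monotonicity is violated
        # (previous block mean >= current block mean).
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    # Expand block means back to one fitted value per point.
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)

    # Restore the original ordering of the inputs.
    out = [0.0] * n
    for pos, i in enumerate(order):
        out[i] = fitted[pos]
    return out
```

Note that PAV produces a piecewise-constant, non-decreasing map with ties on merged blocks, which is one way a recalibrated score distribution can end up more concentrated than the underlying event probability distribution — the effect the paper quantifies.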
Date: 2024-05-10
Note: View the original document on HAL open archive server: https://hal.science/hal-04916151v1
Published in 38th Conference on Neural Information Processing Systems (NeurIPS 2024), May 2024, San Diego, United States
Downloads:
https://hal.science/hal-04916151v1/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-04916151
Bibliographic data for series maintained by CCSD (hal@ccsd.cnrs.fr).