Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data

Chan, Jennifer S. K.; Choy, S. T. Boris; Makov, Udi; Shamir, Ariel; Shapovalov, Vered

Jennifer S. K. Chan, S. T. Boris Choy, Udi Makov, Ariel Shamir and Vered Shapovalov
Additional contact information
Jennifer S. K. Chan: School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
S. T. Boris Choy: Discipline of Business Analytics, The University of Sydney, Sydney, NSW 2006, Australia
Udi Makov: Actuarial Research Center, University of Haifa, Haifa 3498838, Israel
Ariel Shamir: Efi Arazi School of Computer Science, Reichman University, Herzliya 4610101, Israel
Vered Shapovalov: Actuarial Research Center, University of Haifa, Haifa 3498838, Israel

Risks, 2022, vol. 10, issue 4, 1-10

Abstract: In automobile insurance, it is common to adopt a Poisson regression model to predict the number of claims as part of the actuarial pricing process. The Poisson assumption can rarely be justified, often because of overdispersion, and alternative models are commonly considered, typically zero-inflated models, which are special cases of finite mixture distributions. Finite mixture regression modeling of telematics data is challenging to implement because the huge number of covariates makes the essential variable selection, needed to attain a model with good predictive power without overfitting, computationally prohibitive. This paper devises an algorithm that can carry out variable selection in the presence of a large number of covariates. This is achieved by generating sub-samples of the data corresponding to each component of the Poisson mixture; within each sub-sample, variable selection is applied after the Poisson assumption has been enhanced by controlling the number of zero claims. The resulting algorithm is assessed by measuring the out-of-sample AUC (Area Under the Curve), a machine learning measure of predictive power. Finally, the algorithm is demonstrated on claim history data and telematics data describing driving behavior. It transpires that, unlike alternative algorithms related to Poisson regression, the proposed algorithm is both implementable and achieves an improved AUC (0.71). The proposed algorithm allows more accurate pricing in an era where telematics data is used for automobile insurance.
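
The abstract's two technical ingredients, overdispersion under a zero-inflated Poisson mixture and AUC as a measure of predictive power, can be illustrated with a minimal sketch. This is not the paper's variable selection algorithm; the function names and the parameter values (`pi_zero`, `lam`, `k_max`) are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(N = k) under a zero-inflated Poisson: a two-component mixture of a
    point mass at zero (weight pi_zero) and a Poisson(lam) (weight 1 - pi_zero)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi_zero * (1.0 if k == 0 else 0.0) + (1.0 - pi_zero) * poisson


def zip_mean_var(pi_zero, lam, k_max=60):
    """Mean and variance of the mixture, summed numerically over k = 0..k_max
    (the truncation at k_max is an assumption that suits small lam)."""
    ks = range(k_max + 1)
    mean = sum(k * zip_pmf(k, pi_zero, lam) for k in ks)
    second_moment = sum(k * k * zip_pmf(k, pi_zero, lam) for k in ks)
    return mean, second_moment - mean ** 2


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0);
    ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the illustrative values `pi_zero = 0.8` and `lam = 0.5`, `zip_mean_var` gives a mean of about 0.10 and a variance of about 0.14, so the variance exceeds the mean, which a single Poisson distribution cannot reproduce. For the `auc` helper, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation of claim and no-claim cases.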

Keywords: mixture Poisson regression; variable selection; telematics
JEL-codes: C G0 G1 G2 G3 K2 M2 M4
Date: 2022

Downloads:
https://www.mdpi.com/2227-9091/10/4/83/pdf (application/pdf)
https://www.mdpi.com/2227-9091/10/4/83/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jrisks:v:10:y:2022:i:4:p:83-:d:791726

Risks is currently edited by Mr. Claude Zhang

Bibliographic data for series maintained by MDPI Indexing Manager.