Random Forest Adaptation for High-Dimensional Count Regression

Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi and Asma Ahmad Alzahrani
Additional contact information
Oyebayo Ridwan Olaniran: Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria
Saidat Fehintola Olaniran: Department of Statistics and Mathematical Sciences, Faculty of Pure and Applied Sciences, Kwara State University, Malete 1530, Nigeria
Ali Rashash R. Alzahrani: Mathematics Department, Faculty of Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia
Nada MohammedSaeed Alharbi: Department of Mathematics, Faculty of Science, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia
Asma Ahmad Alzahrani: Department of Mathematics, Faculty of Science, Al-Baha University, Al-Baha 65779, Saudi Arabia

Mathematics, 2025, vol. 13, issue 18, 1-32

Abstract: The analysis of high-dimensional count data presents a unique set of challenges, including overdispersion, zero-inflation, and complex nonlinear relationships that traditional generalized linear models and standard machine learning approaches often fail to address adequately. This study introduces and validates a novel Random Forest framework specifically developed for high-dimensional Poisson and Negative Binomial regression, designed to overcome the limitations of existing methods. Through comprehensive simulations and a real-world genomic application to the Norwegian Mother and Child Cohort Study, we demonstrate that the proposed methods achieve superior predictive accuracy, quantified by lower root mean squared error and deviance, and, critically, produce exceptionally stable and interpretable feature selections. Our theoretical and empirical results show that these distribution-optimized ensembles significantly outperform both penalized-likelihood techniques and naive-transformation-based ensembles in balancing statistical robustness with biological interpretability. The study concludes that the proposed framework provides a crucial methodological advancement, offering a powerful and reliable tool for extracting meaningful insights from complex count data in fields ranging from genomics to public health.
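
The abstract names root mean squared error and deviance as its accuracy measures. As a rough illustration only, the sketch below computes RMSE and the standard mean Poisson deviance for a vector of observed counts and predicted means. The function names `rmse` and `mean_poisson_deviance` are hypothetical, and the exact deviance definition used in the paper (including the Negative Binomial variant) is not stated on this page, so this should be read as an assumption rather than the authors' implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed counts and predicted means."""
    n = len(y_true)
    return math.sqrt(sum((y - mu) ** 2 for y, mu in zip(y_true, y_pred)) / n)


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: average of 2 * (y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken as 0 when y == 0, which is the usual
    convention and what makes zero counts well defined.
    """
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predicted means must be strictly positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(y_true)


# Illustrative (made-up) counts and predictions, not data from the study.
observed = [0, 2, 5, 1]
predicted = [0.5, 2.0, 4.0, 1.5]
print(rmse(observed, predicted))
print(mean_poisson_deviance(observed, predicted))
```

Both measures are zero for a perfect fit. Unlike RMSE, the deviance penalizes a given absolute error more heavily when the predicted mean is small, which is one reason it is commonly reported alongside RMSE for count outcomes.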

Keywords: high-dimensional analysis; count regression; Random Forest; overdispersion; zero-inflation; genomic analysis
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/18/3041/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/18/3041/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:18:p:3041-:d:1754357

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-09-22
Handle: RePEc:gam:jmathe:v:13:y:2025:i:18:p:3041-:d:1754357