Prediction of Daily Mean PM 10 Concentrations Using Random Forest, CART Ensemble and Bagging Stacked by MARS
Snezhana Gocheva-Ilieva,
Atanas Ivanov and
Maya Stoimenova-Minova
Additional contact information
Snezhana Gocheva-Ilieva: Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria
Atanas Ivanov: Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria
Maya Stoimenova-Minova: Department of Mathematical Analysis, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 4000 Plovdiv, Bulgaria
Sustainability, 2022, vol. 14, issue 2, 1-26
Abstract:
A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM 10 ), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag or with 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
Keywords: air pollution; machine learning; stacking; rotation ensemble; bagging; selective ensemble; diversity strategy (search for similar items in EconPapers)
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)
Downloads: (external link)
https://www.mdpi.com/2071-1050/14/2/798/pdf (application/pdf)
https://www.mdpi.com/2071-1050/14/2/798/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:14:y:2022:i:2:p:798-:d:722504
Access Statistics for this article
Sustainability is currently edited by Ms. Alexandra Wu
More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().