Improving Air Quality Data Reliability through Bi-Directional Univariate Imputation with the Random Forest Algorithm

Arnaut, Filip; Đurđević, Vladimir; Kolarski, Aleksandra; Srećković, Vladimir A.; Jevremović, Sreten

Additional contact information
Filip Arnaut: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Vladimir Đurđević: Faculty of Physics, University of Belgrade, Cara Dušana 13, 11000 Belgrade, Serbia
Aleksandra Kolarski: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Vladimir A. Srećković: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Sreten Jevremović: Scientific Society “Isaac Newton”, Volgina 7, 11160 Belgrade, Serbia

Sustainability, 2024, vol. 16, issue 17, 1-17

Abstract: Forecasts of future air pollution levels provide valuable information for the general public, vulnerable populations, and policymakers. Precise and reliable forecasts and investigations of air pollution require high-quality data. Missing observations arise when the sensors that measure air quality parameters malfunction, producing erroneous measurements or gaps in the dataset and degrading data quality. This research paper presents a novel univariate approach for imputing missing values in air quality data. The approach employs the random forest (RF) algorithm to impute missing observations bi-directionally (forward and reverse in time) in air quality data (particulate matter smaller than 2.5 μm (PM 2.5)) from the Republic of Serbia. The algorithm was evaluated against simple methods, such as mean and median imputation, for missing observations over durations of 24, 48, and 72 h. The results indicate that the algorithm yielded error rates comparable to those of median imputation for all periods when imputing the PM 2.5 data. Ultimately, the algorithm's higher computational complexity was not justified by the minimal error decrease it achieved compared with the simpler methods. Further research is needed to improve the approach, for example by utilizing low-code machine learning libraries and time-series forecasting techniques.
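
The evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the plain averaging of the forward and reverse passes, and the synthetic hourly series are all assumptions made for illustration, since the abstract does not state how the two directions are combined. The forecaster shown (`last_three_mean`) is a hypothetical stand-in and not a random forest. It only marks the place where a trained model would be plugged in.

```python
from statistics import mean, median
from typing import Callable, List

# A one-step forecaster: takes the known history (oldest first), returns the next value.
Forecaster = Callable[[List[float]], float]


def recursive_forecast(history: List[float], steps: int, forecaster: Forecaster) -> List[float]:
    """Forecast `steps` values, feeding each prediction back in as history."""
    window = list(history)
    predictions = []
    for _ in range(steps):
        nxt = forecaster(window)
        predictions.append(nxt)
        window.append(nxt)
    return predictions


def bidirectional_impute(series: List[float], start: int, length: int,
                         forecaster: Forecaster) -> List[float]:
    """Fill series[start:start+length] using data on both sides of the gap.

    Assumption: forward and reverse estimates are combined by a plain average.
    """
    before = list(series[:start])
    after = list(series[start + length:])
    forward = recursive_forecast(before, length, forecaster)
    # Reverse pass: forecast on the time-reversed tail, then flip back into time order.
    backward = recursive_forecast(after[::-1], length, forecaster)[::-1]
    filled = list(series)
    for i in range(length):
        filled[start + i] = (forward[i] + backward[i]) / 2
    return filled


def constant_impute(series: List[float], start: int, length: int,
                    stat: Callable[[List[float]], float]) -> List[float]:
    """Baseline: fill the gap with one statistic (mean or median) of the observed values."""
    observed = list(series[:start]) + list(series[start + length:])
    filled = list(series)
    filled[start:start + length] = [stat(observed)] * length
    return filled


def mae(truth: List[float], estimate: List[float]) -> float:
    """Mean absolute error between true and imputed gap values."""
    return mean([abs(t - e) for t, e in zip(truth, estimate)])


def last_three_mean(history: List[float]) -> float:
    """Hypothetical stand-in forecaster (NOT a random forest)."""
    return mean(history[-3:])


if __name__ == "__main__":
    # Synthetic hourly series with a daily cycle; not data from the paper.
    series = [20.0 + 5.0 * ((h % 24) / 23.0) for h in range(240)]
    start = 96
    for gap in (24, 48, 72):  # gap durations in hours, as in the abstract
        truth = series[start:start + gap]
        for name, filled in (
            ("mean", constant_impute(series, start, gap, mean)),
            ("median", constant_impute(series, start, gap, median)),
            ("bi-directional", bidirectional_impute(series, start, gap, last_three_mean)),
        ):
            print(gap, name, round(mae(truth, filled[start:start + gap]), 3))
```

To approximate the approach in the abstract more closely, `last_three_mean` could be replaced by a function that wraps a model trained on lagged values, for example scikit-learn's `RandomForestRegressor`. The number of lags and the training procedure would be further assumptions.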

Keywords: data imputation; air quality; PM 2.5; air pollution; missing observations; machine learning
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2024

Downloads:
https://www.mdpi.com/2071-1050/16/17/7629/pdf (application/pdf)
https://www.mdpi.com/2071-1050/16/17/7629/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:16:y:2024:i:17:p:7629-:d:1470256

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.