Improving Air Quality Data Reliability through Bi-Directional Univariate Imputation with the Random Forest Algorithm
Filip Arnaut,
Vladimir Đurđević,
Aleksandra Kolarski,
Vladimir A. Srećković and
Sreten Jevremović
Additional contact information
Filip Arnaut: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Vladimir Đurđević: Faculty of Physics, University of Belgrade, Cara Dušana 13, 11000 Belgrade, Serbia
Aleksandra Kolarski: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Vladimir A. Srećković: Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
Sreten Jevremović: Scientific Society “Isaac Newton”, Volgina 7, 11160 Belgrade, Serbia
Sustainability, 2024, vol. 16, issue 17, 1-17
Abstract:
Forecasts of future air pollution levels provide valuable information for the general public, vulnerable populations, and policymakers. High-quality data are essential for precise and reliable forecasts and investigations of air pollution. Missing observations arise when the sensors used to assess air quality parameters malfunction, producing erroneous measurements or gaps in the dataset and degrading data quality. This research paper presents a novel univariate approach for imputing missing values in air quality data. The approach employs the random forest (RF) algorithm to impute missing observations in a bi-directional (forward and reverse in time) manner for air quality (particulate matter less than 2.5 μm (PM2.5)) data from the Republic of Serbia. The algorithm was evaluated against simple methods, such as mean and median imputation, for missing observations over durations of 24, 48, and 72 h. The results indicate that our algorithm yielded error rates comparable to those of the median imputation method for all periods when imputing the PM2.5 data. Ultimately, the algorithm’s higher computational complexity was not justified given the minimal error decrease it achieved compared with the simpler methods. Further research is needed to improve the approach, for example by using low-code machine learning libraries and time-series forecasting techniques.
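The bi-directional idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lag-window length (`n_lags`), the forest size, the recursive one-step forecasting scheme, and the simple averaging of the forward and reverse predictions are all assumptions made here for illustration, and the function names are hypothetical. It assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values that precede y."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


def recursive_forecast(history, steps, n_lags, seed=0):
    """Fit an RF on lagged values of `history`, then predict `steps` values ahead."""
    X, y = make_lagged(history, n_lags)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)  # assumed settings
    model.fit(X, y)
    window = list(history[-n_lags:])
    preds = []
    for _ in range(steps):
        features = np.array(window[-n_lags:]).reshape(1, -1)
        nxt = model.predict(features)[0]
        preds.append(nxt)
        window.append(nxt)  # feed the prediction back in as the newest lag
    return np.array(preds)


def bidirectional_impute(series, start, end, n_lags=24):
    """Fill series[start:end] from a forward and a time-reversed RF forecast."""
    series = np.asarray(series, dtype=float)
    gap = end - start
    # Forward pass: train on data before the gap, forecast into it.
    forward = recursive_forecast(series[:start], gap, n_lags)
    # Reverse pass: flip the data after the gap so forecasting runs backward in time,
    # then flip the predictions back into chronological order.
    backward = recursive_forecast(series[end:][::-1], gap, n_lags)[::-1]
    filled = series.copy()
    filled[start:end] = (forward + backward) / 2.0  # assumed combination rule
    return filled


# Synthetic hourly series with a daily cycle (illustrative data, not from the paper).
rng = np.random.default_rng(0)
t = np.arange(24 * 20)
truth = 20 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
observed = truth.copy()
observed[200:224] = np.nan  # hide a 24 h gap

filled = bidirectional_impute(observed, 200, 224, n_lags=24)
median_fill = np.full(24, np.nanmedian(observed))  # simple baseline

print("MAE, bi-directional RF:", np.mean(np.abs(filled[200:224] - truth[200:224])))
print("MAE, median imputation:", np.mean(np.abs(median_fill - truth[200:224])))
```

Comparing the two printed errors mirrors the kind of evaluation the abstract describes, where the RF-based method is judged against mean and median imputation over gaps of fixed length. Results on this synthetic series say nothing about the paper's findings on real PM2.5 data.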
Keywords: data imputation; air quality; PM2.5; air pollution; missing observations; machine learning
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2024
Downloads:
https://www.mdpi.com/2071-1050/16/17/7629/pdf (application/pdf)
https://www.mdpi.com/2071-1050/16/17/7629/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:16:y:2024:i:17:p:7629-:d:1470256
Sustainability is currently edited by Ms. Alexandra Wu
Bibliographic data for series maintained by MDPI Indexing Manager.