Assessing and Validating the Ability of Machine Learning to Handle Unrefined Particle Air Pollution Mobile Monitoring Data Randomly, Spatially, and Spatiotemporally

Alazmi, Asmaa; Rakha, Hesham

Asmaa Alazmi: Department of Construction Project, Ministry of Public Work of Kuwait, Kuwait City 12011, Kuwait
Hesham Rakha: Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA

IJERPH, 2022, vol. 19, issue 16, 1-17

Abstract: Many epidemiological studies have evaluated the accuracy of machine learning models in predicting particulate number (PN) and black carbon (BC) concentrations. However, few studies have investigated the ability of machine learning to predict pollutant concentrations using unrefined mobile measurement data or explored the reliability of the resulting prediction models. Additionally, researchers are moving away from fixed-site data in favor of mobile monitoring data collected at a variety of locations to develop hourly empirical models of particulate air pollution. This study compared long-term (daily average) and short-term (hourly average and 1 s unrefined data) model performance under three classes of cross-validation: random, spatial, and spatiotemporal. The study used secondary data describing BC and PN levels in the rural location of Blacksburg, VA. Our results show that the model based on unrefined data detected pollutant hot spots with accuracy similar to that of the aggregated model. Moreover, performance improved when temporal data were added to the model: the 10-fold MAE values for BC and PN were 0.44 μg/m³ and 3391 pt/cm³, respectively, for the unrefined (1 s) data model. These findings add to the literature on the correlation between data (pre)processing and the efficacy of machine learning models in predicting pollution levels, while also enhancing our understanding of more reliable validation strategies.
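
The three validation classes named in the abstract differ mainly in how records are assigned to folds. The sketch below is illustrative only and is not the authors' code: the record fields (`segment`, `hour`, `value`), the synthetic data, and the mean-value baseline predictor are all assumptions made for the example. The abstract does not say how the spatiotemporal split was built, so grouping by (segment, hour) is just one possible reading.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def assign_folds(records, k, key=None, seed=0):
    """Give every record a fold index in [0, k).

    key=None: random CV, records are shuffled and dealt into folds one by one.
    key=callable: grouped CV, records sharing key(record) share a fold.
    """
    if key is None:
        group_of = list(range(len(records)))  # every record is its own group
    else:
        group_of = [key(r) for r in records]
    groups = sorted(set(group_of))
    random.Random(seed).shuffle(groups)
    fold_of_group = {g: i % k for i, g in enumerate(groups)}
    return [fold_of_group[g] for g in group_of]


def kfold_mae(records, folds, k):
    """Mean of per-fold MAE using a placeholder model (the training mean)."""
    fold_errors = []
    for f in range(k):
        train = [r["value"] for r, g in zip(records, folds) if g != f]
        test = [r["value"] for r, g in zip(records, folds) if g == f]
        if not train or not test:
            continue  # skip empty folds rather than divide by zero
        prediction = sum(train) / len(train)
        fold_errors.append(mean_absolute_error(test, [prediction] * len(test)))
    return sum(fold_errors) / len(fold_errors)


def make_synthetic_records(n_segments=20, n_hours=6, per_cell=5, seed=1):
    """Made-up readings: a per-segment baseline plus Gaussian noise."""
    rng = random.Random(seed)
    records = []
    for segment in range(n_segments):
        baseline = rng.uniform(0.5, 3.0)  # hypothetical segment-level level
        for hour in range(n_hours):
            for _ in range(per_cell):
                records.append({"segment": segment, "hour": hour,
                                "value": baseline + rng.gauss(0, 0.2)})
    return records


records = make_synthetic_records()
k = 10
schemes = {
    "random": None,
    "spatial": lambda r: r["segment"],
    # Assumption: one space-time cell per group; the paper may differ.
    "spatiotemporal": lambda r: (r["segment"], r["hour"]),
}
for name, key in schemes.items():
    folds = assign_folds(records, k, key=key)
    print(name, round(kfold_mae(records, folds, k), 3))
```

The mean baseline ignores location, so it will not show much difference between schemes. A model that uses location features would be the natural thing to substitute for the placeholder, since random splits can place near-identical neighboring readings in both training and test folds.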

Keywords: machine learning; land use regression; black carbon; particulate number; spatial and temporal variation; air pollution
JEL-codes: I I1 I3 Q Q5
Date: 2022

Downloads: (external link)
https://www.mdpi.com/1660-4601/19/16/10098/pdf (application/pdf)
https://www.mdpi.com/1660-4601/19/16/10098/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:19:y:2022:i:16:p:10098-:d:888977

IJERPH is currently edited by Ms. Jenna Liu

Bibliographic data for series maintained by MDPI Indexing Manager.