A machine learning approach to modelling the spatial variations in the daily fine particulate matter (PM2.5) and nitrogen dioxide (NO2) of Shanghai, China

Song, Xin-Yi; Gao, Ya; Peng, Yubo; Huang, Sen; Liu, Chao; Peng, Zhong-Ren

Xin-Yi Song, Ya Gao, Yubo Peng, Sen Huang, Chao Liu and Zhong-Ren Peng
Additional contact information
Xin-Yi Song: Shanghai Jiao Tong University, China
Ya Gao: Shanghai Jiao Tong University, China
Yubo Peng: University of Florida, USA
Sen Huang: University of Miami, USA
Chao Liu: Tongji University, China

Environment and Planning B, 2021, vol. 48, issue 3, 467-483

Abstract: Forecasting high-resolution spatial-temporal patterns of intra-urban air pollution and identifying the factors that influence them at the regional scale is challenging. Studies have attempted to capture features of air pollutants such as fine particulate matter (PM2.5) and nitrogen dioxide (NO2) using land use regression models, but this method overlooks multi-collinearity among factors and non-linear relationships between factors and air pollutants, and it performs poorly on daily data. Machine learning, by contrast, is a feasible approach for establishing persuasive models of daily intra-urban air pollution variation. In this article, random forest is utilised to establish intra-urban PM2.5 and NO2 spatial-temporal variation models and is compared to the traditional land use regression method. Taking the city of Shanghai, China as the case area, 36 station-measured daily records of PM2.5 and NO2 concentrations spanning two and a half years were collected. More than 80 predictors associated with meteorological and geographical conditions, transportation, community population density, land use and points of interest are used to construct the land use regression and random forest models. Results from the two methods are compared and impacting factors identified. Explained variance (R²) is used to quantify and compare model performance. The final land use regression model explains 49.3% and 42.2% of the spatial variation in ambient PM2.5 and NO2, respectively, whereas the random forest model explains 78.1% and 60.5% of the variance. Regression mappings for unsampled sites on a 1 km × 1 km grid are also implemented. The random forest model is shown to perform much better than the land use regression model. In general, the findings suggest that the random forest approach offers a robust improvement in predictive performance over the land use regression model in estimating daily spatial variations in ambient PM2.5 and NO2.
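
Illustrative note: the abstract compares the two models by explained variance (R²). The following is a minimal sketch of how such a score is commonly computed, using the standard coefficient of determination, 1 - SS_res / SS_tot. It is not taken from the article: the function name `r_squared`, the concentration values, and the two sets of predictions are made up for illustration, and the authors' exact evaluation procedure (for example, any cross-validation scheme) is not stated in this record.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    `observed` and `predicted` are equal-length sequences of numbers.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily PM2.5 concentrations (made-up numbers, not study data).
observed = [35.0, 42.0, 58.0, 61.0, 47.0]
# Hypothetical predictions from two models of differing accuracy.
coarse_model = [45.0, 45.0, 50.0, 55.0, 50.0]
close_model = [36.0, 41.0, 57.0, 62.0, 48.0]

print("coarse model R^2:", round(r_squared(observed, coarse_model), 3))
print("close model R^2:", round(r_squared(observed, close_model), 3))
```

Under this definition, a model that always predicts the observed mean scores 0 and a perfect model scores 1, so a higher value means a larger share of the variation in the observations is accounted for.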

Keywords: Land use regression (LUR); machine learning; random forest; intra-urban air pollution; PM2.5; NO2; spatial-temporal
Date: 2021

Downloads: (external link)
https://journals.sagepub.com/doi/10.1177/2399808320975031 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:sae:envirb:v:48:y:2021:i:3:p:467-483

DOI: 10.1177/2399808320975031
