Enhancing PM2.5 prediction by mitigating annual data drift using wrapped loss and neural networks

Md Khalid Hossen, Yan-Tsung Peng and Meng Chang Chen

PLOS ONE, 2025, vol. 20, issue 2, 1-25

Abstract: Many deep learning tasks assume that the data used in training are sampled from a single distribution. This assumption may not hold for data collected in different contexts or during different periods. For instance, temperatures in a city can vary from year to year for reasons that are not well understood. In this paper, we use three distinct statistical techniques to analyze annual data drift at various stations. Each technique computes a P value for every station by comparing data from five years (2014-2018), which measures the drift at specific locations and identifies the stations with the highest drift relative to previous years' datasets. To identify data drift and highlight areas with significant drift, we use meteorological air quality and weather data in this study. We propose two models that account for the characteristics of data drift in PM2.5 prediction and compare them with various deep learning models, such as Long Short-Term Memory (LSTM) and its variants, for predictions from the next hour to the 64th hour. The proposed models significantly outperform traditional neural networks. We also introduce a wrapped loss function that, when incorporated into a model, yields more accurate results than the original loss function alone; predictions are evaluated with the RMSE, MAE, and MAPE metrics. The proposed Front-loaded connection (FLC) model and Back-loaded connection (BLC) model address the data drift issue, and the wrapped loss function further alleviates data drift during model training, helping the neural network models achieve more accurate results.
Experimental results for hourly PM2.5 prediction show that the proposed model improves on the baseline BiLSTM model by 24.1%-16% at 1h-24h and by 12%-8.3% at 32h-64h, and on the CNN model by 24.6%-11.8% at 1h-24h and by 10%-10.2% at 32h-64h.
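
The abstract mentions per-station P values for year-to-year drift and evaluation by RMSE, MAE, and MAPE, so a small illustration of both ideas may help. The sketch below is not the authors' method: the abstract does not name the three statistical techniques, so a two-sided permutation test on the difference in means is used here purely as a stand-in, and the wrapped loss is not reproduced because its definition is not given. All function names (`permutation_p_value`, `rmse`, `mae`, `mape`) and the sample values are hypothetical.

```python
import math
import random
from statistics import mean


def permutation_p_value(year_a, year_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Assumption: this permutation test stands in for the (unnamed)
    statistical techniques in the abstract. A small p-value suggests
    the two years were not drawn from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(year_a) - mean(year_b))
    pooled = list(year_a) + list(year_b)
    n_a = len(year_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign observations to the two "years" at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no true value is zero.
    """
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical PM2.5 readings for one station in two different years.
    readings_2014 = [18.0, 22.5, 19.3, 25.1, 21.0, 20.4, 23.8, 19.9]
    readings_2018 = [30.2, 28.7, 33.1, 29.5, 31.8, 27.9, 32.4, 30.0]
    print("drift p-value:", permutation_p_value(readings_2014, readings_2018))

    # Hypothetical observed and predicted values for the metric functions.
    observed = [10.0, 20.0, 40.0]
    predicted = [12.0, 18.0, 40.0]
    print("RMSE:", rmse(observed, predicted))
    print("MAE:", mae(observed, predicted))
    print("MAPE (%):", mape(observed, predicted))
```

Under this sketch, stations could be ranked by their p-values across year pairs to flag the strongest drift, and the three metric functions could be applied at each forecast horizon to compare models.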

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314327 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 14327&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0314327

DOI: 10.1371/journal.pone.0314327

More articles in PLOS ONE from Public Library of Science
Page updated 2025-05-05
Handle: RePEc:plo:pone00:0314327