Missing Data Imputation in the Internet of Things Sensor Networks
Benjamin Agbo,
Hussain Al-Aqrabi,
Richard Hill and
Tariq Alsboui
Additional contact information
Benjamin Agbo: Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK
Hussain Al-Aqrabi: Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK
Richard Hill: Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK
Tariq Alsboui: Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK
Future Internet, 2022, vol. 14, issue 5, 1-16
Abstract:
The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted with electronics, sensors, and network connectivity. IoT sensor networks have become integral aspects of environmental monitoring systems. However, data collected from IoT sensor devices are usually incomplete due to various reasons such as sensor failures, drifts, network faults and various other operational issues. The presence of incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that will ensure accurate calibration of sensors. To achieve this, we propose an efficient and robust imputation technique based on k -means clustering that is capable of selecting the best imputation technique for missing data imputation. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from on-field low-cost environmental sensors in urban air pollution monitoring stations. To test the efficiency of the imputation techniques, we simulated missing data rates at 10–40% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed BFMVI model recorded the best imputation accuracy (0.011758 RMSE for 10% missing data and 0.169418 RMSE at 40% missing data) compared to the other techniques ( k Nearest-Neighbour ( k NN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest techniques) when evaluated using different performance indicators. Moreover, the results show a trade-off between imputation accuracy and computational complexity with benchmark techniques showing a low computational complexity at the expense of accuracy when compared with our proposed technique.
Keywords: IoT; low cost sensor; missing data; imputation (search for similar items in EconPapers)
JEL-codes: O3 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/1999-5903/14/5/143/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/5/143/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:5:p:143-:d:809518
Access Statistics for this article
Future Internet is currently edited by Ms. Grace You
More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().