Research on the Construction and Realization of Data Pipeline in Machine Learning Regression Prediction
Hua Zhang, Guoxun Zheng, Jun Xu, Xuekun Yao and Naeem Jan
Mathematical Problems in Engineering, 2022, vol. 2022, 1-5
Abstract:
The data sets used in machine learning usually contain missing values and text-type data, and it is sometimes necessary to combine attributes in the data set. The data must therefore be cleaned and converted before a machine learning model can be built. These operations usually form a chain of processing steps, and carrying out the entire procedure by hand is time-consuming and inconvenient. This article examines the data pipeline and recommends using it to perform all of the data processing. We automate the processing and use k-fold cross-validation to evaluate the performance of the model. Experiments demonstrate that the approach can lower the root mean square error of the regression prediction model and improve prediction accuracy.
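The abstract describes a chain of processing steps (filling missing values, converting text attributes, combining attributes) followed by k-fold cross-validation scored by root mean square error. The sketch below illustrates that idea in plain Python under stated assumptions; it is not the authors' implementation. The class and function names (`MedianImputer`, `OneHotEncoder`, `RatioAdder`, `Pipeline`, `fit_linear`, `kfold_rmse`, `build_pipeline`), median imputation, one-hot encoding, the `rooms_per_household` ratio, the column names `rooms`, `households`, `area` and `price`, and the least-squares model are all hypothetical choices. In practice, scikit-learn offers library versions of these pieces (`Pipeline`, `SimpleImputer`, `OneHotEncoder`, `cross_val_score`).

```python
import math
import random
import statistics

class MedianImputer:
    """Fill missing (None) numeric values with the column median seen in fit()."""
    def __init__(self, columns):
        self.columns = columns
    def fit(self, rows):
        self.medians = {c: statistics.median(r[c] for r in rows if r[c] is not None)
                        for c in self.columns}
        return self
    def transform(self, rows):
        return [{**r, **{c: self.medians[c] for c in self.columns if r[c] is None}}
                for r in rows]

class OneHotEncoder:
    """Replace one text column with 0/1 indicator columns (categories seen in fit())."""
    def __init__(self, column):
        self.column = column
    def fit(self, rows):
        self.categories = sorted({r[self.column] for r in rows})
        return self
    def transform(self, rows):
        out = []
        for r in rows:
            new = {k: v for k, v in r.items() if k != self.column}
            for cat in self.categories:  # a category unseen in fit() gets all zeros
                new[f"{self.column}={cat}"] = 1.0 if r[self.column] == cat else 0.0
            out.append(new)
        return out

class RatioAdder:
    """Hypothetical attribute combination: append numerator / denominator as a column."""
    def __init__(self, numerator, denominator, name):
        self.num, self.den, self.name = numerator, denominator, name
    def fit(self, rows):
        return self  # stateless: nothing to learn
    def transform(self, rows):
        return [{**r, self.name: r[self.num] / r[self.den]} for r in rows]

class Pipeline:
    """Chain steps: fit each one on training data, then reuse the fitted chain."""
    def __init__(self, steps):
        self.steps = steps
    def fit_transform(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return rows
    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows

def fit_linear(X, y, ridge=1e-6):
    """Least squares with an intercept: solve (X^T X + ridge*I) w = X^T y."""
    X = [[1.0] + x for x in X]
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for k in range(n):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * c for a, c in zip(A[k], A[col])]
                b[k] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

def kfold_rmse(rows, target, make_pipeline, k=5, seed=0):
    """k-fold CV: refit pipeline and model on k-1 folds, return RMSE per held-out fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    def features(rs):  # every column except the target
        return [{c: v for c, v in r.items() if c != target} for r in rs]
    scores = []
    for i in range(k):
        held = set(idx[i::k])
        train = [rows[j] for j in idx if j not in held]
        test = [rows[j] for j in idx[i::k]]
        pipe = make_pipeline()  # fresh pipeline: medians/categories come from train only
        tr, te = pipe.fit_transform(features(train)), pipe.transform(features(test))
        cols = sorted(tr[0])
        w = fit_linear([[r[c] for c in cols] for r in tr], [r[target] for r in train])
        preds = [w[0] + sum(wi * r[c] for wi, c in zip(w[1:], cols)) for r in te]
        mse = statistics.fmean((p - r[target]) ** 2 for p, r in zip(preds, test))
        scores.append(math.sqrt(mse))
    return scores

def build_pipeline(with_ratio=True):
    """Hypothetical housing-style pipeline: impute, optionally combine, encode text."""
    steps = [MedianImputer(["rooms", "households"])]
    if with_ratio:
        steps.append(RatioAdder("rooms", "households", "rooms_per_household"))
    return Pipeline(steps + [OneHotEncoder("area")])
```

The key design choice is that each fold builds a fresh pipeline and fits it on the training folds only, so the imputation medians and the category list are never learned from the held-out fold. Given a list of row dictionaries such as `{"rooms": 6.0, "households": 2.0, "area": "coast", "price": 180.0}` (with `None` marking a missing value), `kfold_rmse(rows, "price", lambda: build_pipeline(True))` returns one RMSE per fold; comparing it with `build_pipeline(False)` indicates whether the combined attribute helps on that data.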
Date: 2022
Downloads:
http://downloads.hindawi.com/journals/mpe/2022/7924335.pdf (application/pdf)
http://downloads.hindawi.com/journals/mpe/2022/7924335.xml (application/xml)
Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:7924335
DOI: 10.1155/2022/7924335
Publisher: Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem.