Research on the Construction and Realization of Data Pipeline in Machine Learning Regression Prediction

Zhang, Hua; Zheng, Guoxun; Xu, Jun; Yao, Xuekun; Jan, Naeem

Mathematical Problems in Engineering, 2022, vol. 2022, 1-5

Abstract: Data sets used in machine learning often contain missing values and text-type data, and sometimes attributes in the data set need to be combined. The data set must therefore be cleaned and converted before a machine learning model can be trained. This usually involves a chain of processing steps, and carrying them out one at a time is time-consuming and inconvenient. This article examines the data pipeline and recommends using it to handle all of the data processing. We automate the processing and use k-fold cross-validation to evaluate the performance of the model. Experiments demonstrate that this approach can lower the regression prediction model's root mean square error and enhance prediction accuracy.
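The abstract's workflow (fill missing values, encode text attributes, combine attributes, then score a regression model with k-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the article's implementation: the class names, the toy records, the `rooms_per_household` attribute, and the nearest-neighbour model are all hypothetical stand-ins chosen to keep the example short. The point it shows is that every step is re-fitted on the training folds only, which is what chaining the steps into one pipeline makes automatic.

```python
import math
import statistics

class MedianImputer:
    """Fill missing (None) numeric values with the median of the training rows."""
    def __init__(self, columns):
        self.columns = columns
    def fit(self, rows):
        self.medians = {c: statistics.median([r[c] for r in rows if r[c] is not None])
                        for c in self.columns}
        return self
    def transform(self, rows):
        return [{**r, **{c: self.medians[c] if r[c] is None else r[c]
                         for c in self.columns}} for r in rows]

class RatioFeature:
    """Combine two attributes into a new one (numerator / denominator)."""
    def __init__(self, numerator, denominator, name):
        self.numerator, self.denominator, self.name = numerator, denominator, name
    def fit(self, rows):
        return self
    def transform(self, rows):
        return [{**r, self.name: r[self.numerator] / r[self.denominator]} for r in rows]

class OneHotEncoder:
    """Replace a text attribute with 0/1 columns for categories seen in training."""
    def __init__(self, column):
        self.column = column
    def fit(self, rows):
        self.categories = sorted({r[self.column] for r in rows})
        return self
    def transform(self, rows):
        out = []
        for r in rows:
            new = {k: v for k, v in r.items() if k != self.column}
            for cat in self.categories:  # unseen categories become all zeros
                new[f"{self.column}={cat}"] = 1.0 if r[self.column] == cat else 0.0
            out.append(new)
        return out

class KNNRegressor:
    """Toy model (an assumption): average the targets of the k nearest rows."""
    def __init__(self, k=2):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        preds = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            preds.append(statistics.fmean([self.y[i] for i in order[:self.k]]))
        return preds

class Pipeline:
    """Run the transform steps in order, then fit or query the final model."""
    def __init__(self, steps, model, target):
        self.steps, self.model, self.target = steps, model, target
    def _matrix(self, rows):
        return [[r[k] for k in sorted(r) if k != self.target] for r in rows]
    def fit(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        self.model.fit(self._matrix(rows), [r[self.target] for r in rows])
        return self
    def predict(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return self.model.predict(self._matrix(rows))

def k_fold_rmse(make_pipeline, rows, target, k):
    """Return one RMSE per fold; a fresh pipeline is fitted on each training split."""
    scores = []
    for fold in range(k):
        val = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        preds = make_pipeline().fit(train).predict(val)
        mse = statistics.fmean([(p - r[target]) ** 2 for p, r in zip(preds, val)])
        scores.append(math.sqrt(mse))
    return scores

# Hypothetical toy records: None marks a missing value, "area" is a text attribute.
ROWS = [
    {"rooms": 6.0, "households": 2.0, "area": "inland", "price": 30.0},
    {"rooms": 8.0, "households": 2.0, "area": "coast", "price": 52.0},
    {"rooms": None, "households": 3.0, "area": "inland", "price": 28.0},
    {"rooms": 9.0, "households": 3.0, "area": "coast", "price": 47.0},
    {"rooms": 5.0, "households": None, "area": "inland", "price": 25.0},
    {"rooms": 12.0, "households": 3.0, "area": "coast", "price": 58.0},
    {"rooms": 7.0, "households": 2.0, "area": "island", "price": 60.0},
    {"rooms": 10.0, "households": 4.0, "area": "inland", "price": 33.0},
    {"rooms": 4.0, "households": 1.0, "area": "coast", "price": 50.0},
]

def make_pipeline():
    steps = [MedianImputer(["rooms", "households"]),
             RatioFeature("rooms", "households", "rooms_per_household"),
             OneHotEncoder("area")]
    return Pipeline(steps, KNNRegressor(k=2), target="price")

scores = k_fold_rmse(make_pipeline, ROWS, "price", k=3)
print("RMSE per fold:", scores, "mean:", statistics.fmean(scores))
```

In practice a library would normally supply these pieces; scikit-learn, for example, provides `Pipeline` and `cross_val_score`. The hand-rolled version is used here only so the sketch runs without third-party packages.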

Date: 2022

Downloads: (external link)
http://downloads.hindawi.com/journals/mpe/2022/7924335.pdf (application/pdf)
http://downloads.hindawi.com/journals/mpe/2022/7924335.xml (application/xml)

Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:7924335

DOI: 10.1155/2022/7924335
