Improving imbalanced industrial datasets to enhance the accuracy of mechanical property prediction and process optimization for strip steel

Li, Feifei; He, Anrui; Song, Yong; Shen, Chengzhe; Wang, Fenjia; Yuan, Tieheng; Zhang, Shiwei; Xu, Xiaoqing; Qiang, Yi; Liu, Chao; Liu, Pengfei; Zhao, Qiangguo

Affiliations:
Feifei Li: University of Science and Technology Beijing
Anrui He: University of Science and Technology Beijing
Yong Song: University of Science and Technology Beijing
Chengzhe Shen: University of Science and Technology Beijing
Fenjia Wang: University of Science and Technology Beijing
Tieheng Yuan: University of Science and Technology Beijing
Shiwei Zhang: University of Science and Technology Beijing
Xiaoqing Xu: University of Science and Technology Beijing
Yi Qiang: China Academy of Machinery Science and Technology
Chao Liu: University of Science and Technology Beijing
Qiangguo Zhao: Shihezi Zhonghe New Material Co., Ltd

Journal of Intelligent Manufacturing, 2025, vol. 36, issue 2, No 12, 1003-1020

Abstract: Imbalanced regression is widespread in intelligent manufacturing systems and significantly constrains the industrial application of machine learning models. Existing research has overlooked the impact of redundant data and has failed to exploit the valuable information contained in unlabeled data, which limits model effectiveness. To address this, we propose a novel framework, sNN-ST (similarity-based nearest neighbor and self-training fusion), for imbalanced regression in industrial big data. The approach comprises two main steps: first, redundant samples are identified and removed by analyzing the redundancy relationships among samples; then, unlabeled data are pseudo-labeled, and only reliable, non-redundant samples are added to the labeled dataset. We validate the proposed method on two imbalanced regression datasets. Removing redundant data and making effective use of unlabeled data optimize the distribution of the dataset and increase its information entropy; consequently, models trained on the processed dataset perform significantly better overall. We used this model to conduct a Multi-Parameter Global Relative Sensitivity Analysis within a production system, which optimized the existing process parameters and improved the consistency of product quality. This research presents a promising approach to imbalanced regression problems.
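The abstract describes the two sNN-ST steps only at a high level. The following is a minimal, hypothetical sketch of that kind of pipeline, not the authors' implementation. It assumes that a sample is redundant when an already-kept sample lies within `feat_eps` in Euclidean feature space and its target differs by at most `target_eps`, and it uses a plain k-nearest-neighbor regressor whose neighbor-target spread (`max_std`) stands in for pseudo-label reliability. All function names and thresholds (`prune_redundant`, `self_train`, `feat_eps`, `target_eps`, `max_std`) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_redundant(X: Sequence[Vector], y: Sequence[float], feat_eps: float,
                    target_eps: float) -> Tuple[List[List[float]], List[float]]:
    """Step 1 (hypothetical): greedily drop labeled samples that add little information.

    Assumption, not from the paper: a sample is redundant if an already-kept
    sample lies within feat_eps in feature space AND its target differs by at
    most target_eps.
    """
    kept_X: List[List[float]] = []
    kept_y: List[float] = []
    for xi, yi in zip(X, y):
        redundant = any(
            euclidean(xi, xk) <= feat_eps and abs(yi - yk) <= target_eps
            for xk, yk in zip(kept_X, kept_y)
        )
        if not redundant:
            kept_X.append(list(xi))
            kept_y.append(yi)
    return kept_X, kept_y


def knn_predict(X: Sequence[Vector], y: Sequence[float], query: Vector,
                k: int) -> Tuple[float, float]:
    """Mean and population std of the targets of the k nearest labeled samples.

    Assumes X is non-empty.
    """
    nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], query))[:k]
    vals = [y[i] for i in nearest]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return mean, std


def self_train(X: Sequence[Vector], y: Sequence[float], X_unlabeled: Sequence[Vector],
               k: int = 3, max_std: float = 0.5, feat_eps: float = 0.1,
               rounds: int = 3) -> Tuple[List[List[float]], List[float]]:
    """Step 2 (hypothetical): add reliable, non-redundant pseudo-labeled samples.

    Assumptions: reliability = low spread (std <= max_std) among the k nearest
    labeled targets; a reliable candidate within feat_eps of an existing
    labeled sample is treated as redundant and discarded.
    """
    X_out = [list(x) for x in X]      # copy so the caller's data is untouched
    y_out = list(y)
    pool = [list(x) for x in X_unlabeled]
    for _ in range(rounds):
        # Pseudo-label every candidate with the labeled set as of this round.
        preds = [knn_predict(X_out, y_out, xu, k) for xu in pool]
        remaining: List[List[float]] = []
        added = 0
        for xu, (mean, std) in zip(pool, preds):
            if std > max_std:
                remaining.append(xu)      # not reliable yet; retry next round
            elif any(euclidean(xu, xl) <= feat_eps for xl in X_out):
                continue                  # reliable but redundant; discard
            else:
                X_out.append(xu)          # reliable and non-redundant; accept
                y_out.append(mean)
                added += 1
        pool = remaining
        if added == 0:
            break                         # labeled set unchanged; further rounds would repeat
    return X_out, y_out
```

For example, with one-dimensional features, `prune_redundant` drops a sample at 0.05 whose target is within 0.02 of a kept sample at 0.0, and `self_train` then pseudo-labels a candidate lying between two consistent neighbors while discarding a confident candidate that sits almost on top of an existing labeled point. Filtering on neighbor spread is only one simple proxy for reliability; the abstract does not specify the paper's actual similarity measure or reliability criterion.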

Keywords: Imbalanced regression; Process optimization; Sample pruning; Self-training; Semi-supervised learning
Date: 2025

Downloads: http://link.springer.com/10.1007/s10845-023-02275-1 (abstract, text/html). Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:joinma:v:36:y:2025:i:2:d:10.1007_s10845-023-02275-1

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10845

DOI: 10.1007/s10845-023-02275-1

Journal of Intelligent Manufacturing is currently edited by Andrew Kusiak

Publisher: Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.