Unsupervised reward engineering for reinforcement learning controlled manufacturing
Thomas Hirtz,
He Tian,
Yi Yang and
Tian-Ling Ren
Additional contact information
Thomas Hirtz: Tsinghua University
He Tian: Tsinghua University
Yi Yang: Tsinghua University
Tian-Ling Ren: Tsinghua University
Journal of Intelligent Manufacturing, 2025, vol. 36, issue 8, No 30, 5875-5888
Abstract:
Reward engineering is a key challenge in reinforcement learning (RL) that can significantly affect the performance and applicability of RL algorithms. In the field of manufacturing, shaping the reward function for RL algorithms can be particularly difficult due to the complex and multi-objective nature of the manufacturing process. To address these challenges, we propose an unsupervised reward engineering method based on a variational autoencoder (VAE) that uses the latent representation of the product to compute the environment's reward. Our approach optimizes the underlying distribution of the fabricated product directly by leveraging the latent space distance or divergence between the manufactured and ideal products. This strategy circumvents issues commonly associated with conventional reward engineering, such as misaligned and hacked rewards. Our technique enables convenient multi-objective optimization and reward value bounding. Through a $\beta$-VAE architecture, we can adjust the weight of the Kullback–Leibler divergence term, prioritizing ideal characteristics or latent distribution based on the desired outcome. Applying our approach to semiconductor manufacturing, we demonstrate its benefits, including effective multi-objective optimization, stable rewards, and meaningful data representations. Our method shows promise for optimizing complex manufacturing processes with RL and can be extended to various manufacturing-related fields. It can enhance product quality and offers opportunities for cross-facility manufacturing matching.
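The reward idea described in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation. It assumes the VAE encoder returns a diagonal Gaussian (mean and log-variance) for each product, and it maps the latent distance or Kullback–Leibler divergence to a bounded reward with an exponential, which is one possible bounding choice that the abstract does not specify. All function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) between two diagonal Gaussians given means and log-variances."""
    var_p = np.exp(logvar_p)
    var_q = np.exp(logvar_q)
    return 0.5 * float(
        np.sum(logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    )


def latent_distance_reward(mu_made, mu_ideal, scale=1.0):
    """Bounded reward from the Euclidean distance between latent means.

    Assumption: exp(-scale * distance) is used to bound the reward;
    identical latents give the maximum reward of 1.
    """
    distance = np.linalg.norm(mu_made - mu_ideal)
    return float(np.exp(-scale * distance))


def latent_divergence_reward(mu_made, logvar_made, mu_ideal, logvar_ideal, scale=1.0):
    """Bounded reward from the KL divergence between latent distributions."""
    kl = kl_diag_gaussians(mu_made, logvar_made, mu_ideal, logvar_ideal)
    return float(np.exp(-scale * kl))


def beta_vae_loss(reconstruction_error, kl_to_prior, beta=1.0):
    """Standard beta-VAE objective: reconstruction term plus beta-weighted KL term."""
    return reconstruction_error + beta * kl_to_prior


if __name__ == "__main__":
    # Hypothetical 3-dimensional latent codes for an ideal and a fabricated product.
    mu_ideal, logvar_ideal = np.zeros(3), np.zeros(3)
    mu_made, logvar_made = np.array([0.2, -0.1, 0.05]), np.full(3, -0.1)
    print(latent_distance_reward(mu_made, mu_ideal))
    print(latent_divergence_reward(mu_made, logvar_made, mu_ideal, logvar_ideal))
```

In this sketch a larger `beta` during VAE training weights the Kullback–Leibler term more heavily, and `scale` controls how quickly the reward decays as the fabricated product moves away from the ideal one in latent space.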
Keywords: Artificial intelligence; Semiconductor manufacturing; Reinforcement learning; Reward engineering
Date: 2025
Downloads:
http://link.springer.com/10.1007/s10845-024-02491-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joinma:v:36:y:2025:i:8:d:10.1007_s10845-024-02491-3
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10845
DOI: 10.1007/s10845-024-02491-3
Journal of Intelligent Manufacturing is currently edited by Andrew Kusiak
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.