Benchmarking Machine Learning Models for ESG Prediction in South Korea Using News-Derived Time Series
Kim Yunwoo and Junhyuk Hwang
No v2738_v1, SocArXiv from Center for Open Science
Abstract:
Existing ESG ratings have limitations such as disclosure delays, inconsistencies, and uneven coverage, particularly in non-English markets. This paper addresses these issues by establishing the first machine learning benchmark for ESG prediction in the Korean market using news-derived time-series features. A standardized dataset of 278 Korean firms was constructed, and monthly sentiment and ESG-relevance features were generated from news using Korean-specific language models. A mask-aware CNN explicitly handles missing data by distinguishing observed months from imputed ones. The model achieved a Mean Absolute Error (MAE) of 17.9, a Root Mean Squared Error (RMSE) of 22.0, an R² of 0.12, and a Spearman's ρ of 0.38, demonstrating that temporal modeling and explicit handling of missing data are crucial for improving predictive accuracy.
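As an illustration of what "distinguishing observed months from imputed ones" can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes missing months are marked with NaN, imputes them with the mean of the observed months (an assumption; the abstract does not state the imputation rule), and stacks a binary observation mask as a second input channel that a 1-D CNN could consume. The function name `build_mask_aware_input` is hypothetical.

```python
import numpy as np


def build_mask_aware_input(series):
    """Return a (2, T) array: channel 0 = imputed values, channel 1 = mask.

    `series` is a 1-D monthly feature where NaN marks a month with no news.
    Hypothetical helper for illustration only.
    """
    values = np.asarray(series, dtype=float)
    mask = ~np.isnan(values)  # True where the month was actually observed
    # Assumption: fill gaps with the mean of observed months (0.0 if none).
    fill = values[mask].mean() if mask.any() else 0.0
    imputed = np.where(mask, values, fill)
    # The mask channel lets a downstream model tell real values from fills.
    return np.stack([imputed, mask.astype(float)])


# Four months of a sentiment feature, two of them without news coverage.
monthly_sentiment = [0.25, np.nan, 0.75, np.nan]
x = build_mask_aware_input(monthly_sentiment)
print(x.shape)  # (2, 4)
```

Passing the mask alongside the values means an imputed 0.5 and an observed 0.5 are different inputs to the network, which is the distinction the abstract emphasizes.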
Date: 2025-09-12
New Economics Papers: this item is included in nep-cmp
Downloads: (external link)
https://osf.io/download/68c3c1a9e33eca3b0feff8de/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:v2738_v1
DOI: 10.31219/osf.io/v2738_v1