
A Context-Preserving Tokenization Mismatch Resolution Method for Korean Word Sense Disambiguation Based on the Sejong Corpus and BERT

Hanjo Jeong
Hanjo Jeong: Department of Software Convergence Engineering, Mokpo National University, Muan 58554, Republic of Korea

Mathematics, 2025, vol. 13, issue 5, 1-14

Abstract: Word sense disambiguation (WSD) plays a crucial role in natural language processing (NLP) tasks such as machine translation, sentiment analysis, and information retrieval. Because of the complex morphological structure and polysemy of Korean, the meaning of a word can change depending on its context, which makes WSD challenging. Since a single word can have multiple meanings, accurately distinguishing between them is essential for improving the performance of NLP models. Recently, large-scale pre-trained models such as BERT and GPT, applied through transfer learning, have shown promising results on this problem. However, for morphologically complex languages such as Korean, the tokenization mismatch between pre-trained models and fine-tuning data prevents the rich contextual and lexical information learned during pre-training from being fully utilized in downstream tasks. This paper proposes a novel method that addresses the tokenization mismatch during fine-tuning for Korean WSD, leveraging BERT-based pre-trained models and the Sejong corpus, which has been annotated by language experts. Experimental results using various BERT-based pre-trained models and datasets from the Sejong corpus show that the proposed method improves performance by approximately 3–5% compared to existing approaches.

Keywords: word sense disambiguation; BERT; Sejong corpus; tokenization; transfer learning
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/5/864/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/5/864/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:5:p:864-:d:1605879


Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-04-05
Handle: RePEc:gam:jmathe:v:13:y:2025:i:5:p:864-:d:1605879