Improving post-editing of Kazakh translations with fine-tuned large language models: Dataset and evaluation

Diana Rakhimova, Aliya Zhiger, Madina Mansurova, Valentin Malykh and Magzhan Kairanbay

International Journal of Innovative Research and Scientific Studies, 2025, vol. 8, issue 8, 220-233

Abstract: Machine translation for low-resource languages like Kazakh faces significant challenges due to limited training data, complex morphology, and cultural-linguistic nuances. This paper presents the first comprehensive study on fine-tuning large language models for automated post-editing of Kazakh translations. We introduce KazPE, a systematically annotated dataset containing 10,010 training sentences and 315 test sentences across six domains (medical, scientific, journalistic, oral, fiction, and legal) with detailed error categorization covering 11 linguistic dimensions. Our approach fine-tunes GPT-4.1-mini using supervised learning to improve translation quality through targeted error correction. Human evaluation demonstrates that our fine-tuned model achieves a mean quality score of 0.84 compared to 0.80 for the baseline, representing a 4% relative improvement. The most significant gains occur in morphological-lexical error handling and domain-specific contexts, with legal and medical texts showing improvements of +2.8% and +1.6% respectively. Error analysis reveals that fine-tuning effectively addresses Kazakh’s agglutinative morphology and specialized terminology while maintaining performance on error-free sentences. This work establishes the first systematic evaluation framework for Kazakh translation post-editing, providing valuable insights for improving machine translation systems for morphologically rich, low-resource languages. Our dataset, models, and evaluation framework are made publicly available to support future research in Turkic language processing.

Keywords: Fine-tuning; Kazakh; Large language models; Low-resource languages; Machine translation; Morphologically rich languages; NLP; Post-editing.
Date: 2025

Downloads: (external link)
https://ijirss.com/index.php/ijirss/article/view/10583/2529 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:aac:ijirss:v:8:y:2025:i:8:p:220-233:id:10583

International Journal of Innovative Research and Scientific Studies is currently edited by Natalie Jean

More articles in International Journal of Innovative Research and Scientific Studies from Innovative Research Publishing
Bibliographic data for series maintained by Natalie Jean.

 
Page updated 2025-10-10
Handle: RePEc:aac:ijirss:v:8:y:2025:i:8:p:220-233:id:10583