Methodology for creating datasets of parallel sentences in low-resource languages by using AI

Balzhan Abduali, Marek Milosz, Ualsher Tukeyev and Aidana Karibayeva

International Journal of Innovative Research and Scientific Studies, 2025, vol. 8, issue 9, 13-23

Abstract: This study addresses data scarcity in low-resource languages, focusing on a methodology for creating parallel corpora in which both languages are low-resource. The lack of large-scale, high-quality bilingual datasets significantly hinders the development of neural machine translation systems for such languages. The study proposes and validates a methodology for creating these datasets. First, an AI system for generating the parallel corpus is selected according to three criteria, accessibility (free access), translation quality, and efficiency, evaluated on a test dataset of 1000 sentences. The selected system is then used to create a substantial Kyrgyz-Kazakh parallel corpus. Manual error analysis found that approximately 0.5% of the translations contained inaccuracies, indicating the need for further post-editing and model refinement. The study contributes resources for low-resource language pairs and offers practical guidance on creating parallel corpora effectively with modern AI systems.

Keywords: AI systems; Kazakh-Kyrgyz language pair; Low-resource languages; Methodology for creating datasets; Parallel sentences.
Date: 2025

Downloads: (external link)
https://ijirss.com/index.php/ijirss/article/view/10605/2544 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:aac:ijirss:v:8:y:2025:i:9:p:13-23:id:10605

International Journal of Innovative Research and Scientific Studies is currently edited by Natalie Jean

More articles in International Journal of Innovative Research and Scientific Studies from Innovative Research Publishing
Bibliographic data for series maintained by Natalie Jean.

 
Page updated 2025-10-11
Handle: RePEc:aac:ijirss:v:8:y:2025:i:9:p:13-23:id:10605