A review of Khmer word segmentation and part-of-speech tagging and an experimental study using bidirectional long short-term memory

Sreyteav Sry and Amrudee Sukpan Nguyen
Sreyteav Sry: Paragon International University, Phnom Penh, Cambodia
Amrudee Sukpan Nguyen: Computer Science Department, Paragon International University, Phnom Penh, Cambodia

HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ENGINEERING AND TECHNOLOGY, 2022, vol. 12, issue 1, 23-34

Abstract: Large contiguous blocks of unsegmented Khmer text can cause major problems for natural language processing applications such as machine translation, speech synthesis, and information extraction. Word segmentation and part-of-speech tagging are therefore two important preliminary tasks. Because Khmer does not always use explicit separators between words, the notion of a word is not self-evident. Hence, in such languages tokenization and part-of-speech tagging are inseparable: the definition and principles adopted for one task unavoidably affect the other. In this study, different approaches used in Khmer word segmentation and part-of-speech tagging are reviewed, and an experimental study using a single long short-term memory network is described. A dataset from the Asian Language Treebank is used to train and test the model. The preliminary experimental model achieved an accuracy of 95%. However, further testing is needed to evaluate the model and compare it with other models in order to select the one with the highest accuracy.
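The coupling of segmentation and tagging described above is often handled by casting both tasks as one character-level sequence-labelling problem, so that a single tagger (such as an LSTM) predicts one label per character. The sketch below is a minimal illustration under stated assumptions, not the authors' confirmed scheme: it assumes a "B-"/"I-" prefix (word-initial vs. word-internal) joined with the POS tag, labels each Unicode code point (real Khmer systems may label character clusters instead), and uses hypothetical tag names `PRO`, `V`, `N`, `PN`. The helper names `encode`, `decode`, and `label_accuracy` are likewise hypothetical.

```python
from typing import List, Tuple

# Hypothetical joint labelling scheme (an assumption, not the paper's confirmed one):
# every character gets "B-<POS>" if it starts a word and "I-<POS>" otherwise,
# so one label sequence encodes both word boundaries and POS tags.


def encode(words: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (word, pos) pairs into an unsegmented character sequence plus joint labels."""
    chars: List[str] = []
    labels: List[str] = []
    for word, pos in words:
        for i, ch in enumerate(word):  # iterates over Unicode code points
            chars.append(ch)
            labels.append(("B-" if i == 0 else "I-") + pos)
    return chars, labels


def decode(chars: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Recover (word, pos) pairs from characters and (predicted) joint labels."""
    words: List[Tuple[str, str]] = []
    for ch, label in zip(chars, labels):
        prefix, pos = label.split("-", 1)
        if prefix == "B" or not words or words[-1][1] != pos:
            # A "B" label, or an "I" label whose tag disagrees, starts a new word.
            words.append((ch, pos))
        else:
            # An "I" label with a matching tag extends the current word.
            words[-1] = (words[-1][0] + ch, pos)
    return words


def label_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of positions where the predicted joint label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Example sentence "I love the Khmer language" with hypothetical tags.
sentence = [("ខ្ញុំ", "PRO"), ("ស្រឡាញ់", "V"), ("ភាសា", "N"), ("ខ្មែរ", "PN")]
chars, labels = encode(sentence)
print("".join(chars))           # the unsegmented input a tagger would see
print(labels)                   # one joint label per code point
print(decode(chars, labels))    # round-trips back to the segmented, tagged sentence
```

Because `decode` recovers word boundaries and tags from the same label sequence, the model only has to predict `labels`. `label_accuracy` is one plausible per-character definition of accuracy; the 95% reported in the abstract may be measured differently (for example per word).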

Keywords: Word Segmentation; Part-of-speech tagging; Khmer Natural Language Processing; LSTM
Date: 2022

Downloads: https://journalofscience.ou.edu.vn/index.php/tech-en/article/view/2219/1680 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:bjw:techen:v:12:y:2022:i:1:p:23-34

DOI: 10.46223/HCMCOUJS.tech.en.12.1.2219.2022


HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ENGINEERING AND TECHNOLOGY is currently edited by Nguyen Thuan

More articles in HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ENGINEERING AND TECHNOLOGY from HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE, HO CHI MINH CITY OPEN UNIVERSITY
Bibliographic data for series maintained by Vu Tuan Truong.

 
Page updated 2025-03-19
Handle: RePEc:bjw:techen:v:12:y:2022:i:1:p:23-34