Word Game Modeling Using Character-Level N-Gram and Statistics
Jamolbek Mattiev,
Ulugbek Salaev and
Branko Kavsek
Additional contact information
Jamolbek Mattiev: Information Technologies Department, Urgench State University, Khamid Alimdjan 14, Urgench 220100, Uzbekistan
Ulugbek Salaev: Information Technologies Department, Urgench State University, Khamid Alimdjan 14, Urgench 220100, Uzbekistan
Branko Kavsek: Department of Information Sciences and Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
Mathematics, 2023, vol. 11, issue 6, 1-15
Abstract:
Word games are among the most important tools for vocabulary learning and for practicing letter matching to form words among children aged 5–12. These games help children improve letter and word recognition, memory, and vocabulary retention. Because Uzbek is a low-resource language, little research has addressed word game design for it. In this paper, we develop two models for designing the cubic-letter game (also known as the matching-letter game) in the Uzbek language. The game consists of a predefined number of six-sided cubes with a letter on each side, together with word cards; the words on the cards are formed by combining the cubes. More precisely, the goal is to make as many words from the dataset as possible formable while minimizing the number of cubes. The proposed methods combine a character-level n-gram model with letter position frequency in words at the level of vowels and consonants. For the experiments, we created a novel Uzbek dataset of 4.5k words of 3–5 letters, filtered by child age group, and generated three further datasets for Russian, English, and Slovenian with the support of experts. Experimental evaluations showed that both models achieved good results in terms of average coverage. In particular, with eight cubes the Vowel Priority (VL) approach achieved coverage of 95.9% in Uzbek, 96.8% in English, and 94.2% in Slovenian under five-fold cross-validation. Both models covered around 85% of five-letter words in the Uzbek, English, and Slovenian datasets, while coverage was even higher (99%) for three-letter words in the case of eight cubes.
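The two quantities the abstract relies on, character-level n-gram counts and the coverage of a word list by a set of cubes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy cubes, and the use of bipartite matching to decide whether a word is formable are all assumptions made for illustration, and each word is treated as a plain sequence of single characters.

```python
from collections import Counter


def char_ngram_counts(words, n=2):
    """Count character-level n-grams over a word list (hypothetical helper)."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def can_form(word, cubes):
    """Return True if every letter of `word` can be shown by a distinct cube.

    Assumption: formability is checked with augmenting-path bipartite
    matching between letter positions and cubes.
    """
    if len(word) > len(cubes):
        return False
    owner = [None] * len(cubes)  # owner[c] = letter position using cube c

    def place(pos, seen):
        for c, faces in enumerate(cubes):
            if word[pos] in faces and c not in seen:
                seen.add(c)
                # Take a free cube, or move its current letter elsewhere.
                if owner[c] is None or place(owner[c], seen):
                    owner[c] = pos
                    return True
        return False

    return all(place(pos, set()) for pos in range(len(word)))


def coverage(words, cubes):
    """Fraction of `words` that can be formed with `cubes`."""
    words = list(words)
    if not words:
        return 0.0
    return sum(can_form(w, cubes) for w in words) / len(words)


# Toy data (made up for illustration): three cubes, six faces each.
toy_cubes = ["abcdef", "aghijk", "tlmnop"]
toy_words = ["cat", "bad"]

bigrams = char_ngram_counts(toy_words)  # e.g. bigrams["ca"] == 1
ratio = coverage(toy_words, toy_cubes)  # 0.5: "bad" needs b and d from one cube
```

In this toy case "cat" is formable because the letter a can come from the second cube, whereas "bad" is not, since b and d appear only on the first cube. A design procedure in the spirit of the abstract would search for cube lettering that raises this ratio while keeping the number of cubes small.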
Keywords: word game modeling; letter frequency; character-level N-gram; model coverage; statistics
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/6/1380/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/6/1380/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:6:p:1380-:d:1095144
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.