Machine learning for phonological analysis: A case study in gender prediction
Yasser A. Al Tamimi and Lotfi Tadj
Edelweiss Applied Science and Technology, 2024, vol. 8, issue 6, 6480-6497
Abstract:
Al Tamimi and Smith (2023) use a conventional phonological framework to investigate gender differentiation in a corpus of 656 Saudi Arabian first names. Their findings suggest that no single phonological feature, whether the number of phonemes, syllable structure (open vs. closed), stress patterns, or the voicing of initial and final consonants, can definitively determine gender. However, a combination of these features can collectively facilitate accurate gender identification. Expanding on this premise, the current study integrates phonological analysis with machine learning, employing both supervised techniques (e.g., Naïve Bayes) and unsupervised methods (e.g., k-Means Clustering) to explore whether machine learning can effectively predict gender based on these phonological characteristics. Specifically, this study compares the performance of classification methods, namely Gradient Boosting Machine (GBM), Random Forest, and k-Nearest Neighbors (k-NN), against clustering methods, including hierarchical clustering and DBSCAN. The methodology involves a detailed analysis of model performance metrics, such as accuracy, F1 scores, and clustering indices, to evaluate the effectiveness of each approach in gender classification. The results indicate that classification methods significantly outperform clustering approaches, with the GBM model demonstrating particularly high accuracy and balanced performance across genders. In contrast, clustering methods struggled, particularly in classifying male names, because they rely on similarity-based grouping rather than explicit class labels. These findings suggest that while clustering methods may be helpful for exploratory data analysis, they are inadequate for precise gender classification. The study's implications highlight the importance of selecting appropriate methodologies for classification tasks and demonstrate the superiority of classification models in gender prediction.
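The contrast the abstract draws between supervised classification and unsupervised clustering can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature encoding (phoneme count, open final syllable, voiced initial consonant, voiced final consonant), the ten toy rows, and all function names are hypothetical, and plain k-NN and k-means stand in for the full set of models the study evaluates. The sketch uses only the Python standard library.

```python
import random
from collections import Counter

# Hypothetical toy rows, made up for illustration (NOT from the study's corpus):
# (phoneme_count, final_syllable_open, initial_voiced, final_voiced), label
TOY = [
    ((5, 1, 0, 0), "F"), ((6, 1, 1, 0), "F"), ((4, 1, 0, 0), "F"),
    ((5, 1, 1, 0), "F"), ((6, 0, 0, 1), "F"),
    ((4, 0, 1, 1), "M"), ((5, 0, 0, 1), "M"), ((6, 0, 1, 1), "M"),
    ((4, 0, 0, 0), "M"), ((5, 1, 1, 1), "M"),
]


def sq_dist(a, b):
    """Squared Euclidean distance (features are left unscaled for brevity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, x, k=3):
    """Supervised: majority label among the k nearest labeled rows."""
    nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k=3):
    """Leave-one-out accuracy of k-NN over the labeled data."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)


def kmeans(points, k=2, iters=20, seed=0):
    """Unsupervised: k-means assignments; labels are never consulted."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its closest current center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign


def cluster_accuracy(assign, labels):
    """Purity: name each cluster after its majority label, then score.

    This is optimistic for clustering, since the labels are used after
    the fact only to name the clusters.
    """
    hits = 0
    for c in set(assign):
        in_cluster = [l for a, l in zip(assign, labels) if a == c]
        hits += Counter(in_cluster).most_common(1)[0][1]
    return hits / len(labels)


if __name__ == "__main__":
    points = [x for x, _ in TOY]
    labels = [y for _, y in TOY]
    print("k-NN leave-one-out accuracy:", loo_accuracy(TOY))
    print("k-means purity:", cluster_accuracy(kmeans(points), labels))
```

The classifier is fitted against the gender labels directly, whereas k-means groups names by feature similarity alone and is only matched to labels afterwards. Nothing obliges its clusters to align with gender, which is the limitation the abstract attributes to clustering methods.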
Keywords: Gender identification; K-means clustering; Machine learning; Naive Bayes; Phonological features.
Date: 2024
Download: https://learning-gate.com/index.php/2576-8484/article/view/3402/1278 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:ajp:edwast:v:8:y:2024:i:6:p:6480-6497:id:3402
More articles in Edelweiss Applied Science and Technology from Learning Gate