Machine learning for phonological analysis: A case study in gender prediction

Tamimi, Yasser A. Al; Tadj, Lotfi

Edelweiss Applied Science and Technology, 2024, vol. 8, issue 6, 6480-6497

Abstract: Al Tamimi and Smith (2023) use a conventional phonological framework to investigate gender differentiation in a corpus of 656 Saudi Arabian first names. Their findings suggest that no single phonological feature, whether the number of phonemes, syllable structure (open vs. closed), stress pattern, or the voicing of initial and final consonants, can definitively determine gender. However, a combination of these features can collectively facilitate accurate gender identification. Expanding on this premise, the current study integrates phonological analysis with machine learning, employing both supervised techniques (e.g., Naïve Bayes) and unsupervised methods (e.g., k-Means clustering) to explore whether machine learning can effectively predict gender from these phonological characteristics. Specifically, the study compares the performance of classification methods (Gradient Boosting Machine (GBM), Random Forest, and k-Nearest Neighbors (k-NN)) against clustering methods, including hierarchical clustering and DBSCAN. The methodology involves a detailed analysis of model performance metrics, such as accuracy, F1 scores, and clustering indices, to evaluate how effectively each approach classifies gender. The results indicate that classification methods significantly outperform clustering approaches, with the GBM model demonstrating particularly high accuracy and balanced performance across genders. In contrast, clustering methods struggled, particularly in classifying male names, because they rely on similarity-based grouping rather than explicit class labels. These findings suggest that while clustering methods may be helpful for exploratory data analysis, they are inadequate for precise gender classification. The findings underscore the importance of selecting a methodology suited to the classification task and demonstrate the superiority of classification models for gender prediction.

Keywords: Gender identification; K-means clustering; Machine learning; Naive Bayes; Phonological features
Date: 2024

Downloads: (external link)
https://learning-gate.com/index.php/2576-8484/article/view/3402/1278 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:ajp:edwast:v:8:y:2024:i:6:p:6480-6497:id:3402
