Artificial intelligence in linguistics: a GBRT model approach to forecast Cantonese levels among Chinese Malaysians
Yuqing Peng,
Junxian Xie,
Lin Zhang and
Yuwen Lyu
Additional contact information
Yuqing Peng: Guangzhou University
Junxian Xie: Guangzhou University
Lin Zhang: Tokyo Institute of Technology
Yuwen Lyu: Guangzhou Medical University
Humanities and Social Sciences Communications, 2025, vol. 12, issue 1, 1-8
Abstract:
This study leverages a Gradient Boosted Regression Trees (GBRT) machine learning model to explore how Cantonese media exposure and cultural identity affect Cantonese language proficiency among Chinese Malaysians. By integrating sociolinguistic insights with predictive modeling, we address the multidimensional nature of language use factors. Using survey data from 642 Chinese Malaysian respondents, the GBRT model achieved a high predictive accuracy (R² ≈ 0.90) for Cantonese proficiency. The model identified key predictors, such as daily Cantonese use in social settings, media engagement, and generational cohort, underscoring their significant roles in language maintenance. These findings demonstrate the potential of machine learning to advance sociolinguistic research and provide practical insights for preserving linguistic heritage in multicultural societies.
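The abstract does not describe the authors' implementation, so the following is only a minimal sketch of how gradient boosted regression trees work in general, assuming squared-error loss and depth-1 trees (stumps). The data are synthetic, and the feature names `social_use`, `media_hours`, and `generation` are hypothetical placeholders rather than the survey's actual variables; the R² it prints says nothing about the paper's reported result.

```python
import random

def fit_stump(X, residuals):
    """Return (gain, feature, threshold, left_mean, right_mean) for the best split, or None."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # For squared error, the best split maximises this quantity.
            gain = k * left_mean ** 2 + (n - k) * right_mean ** 2
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_mean, right_mean)
    return best

def fit_gbrt(X, y, n_trees=100, learning_rate=0.1):
    """Fit boosted stumps; return the model and a gain-based feature importance."""
    base = sum(y) / len(y)          # initial prediction: the mean target
    pred = [base] * len(y)
    stumps = []
    gains = [0.0] * len(X[0])
    for _ in range(n_trees):
        # With squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, lm, rm = stump
        stumps.append((j, thr, lm, rm))
        gains[j] += gain
        pred = [p + learning_rate * (lm if x[j] <= thr else rm)
                for p, x in zip(pred, X)]
    total_gain = sum(gains)
    importance = [g / total_gain for g in gains]
    return (base, learning_rate, stumps), importance

def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lm if x[j] <= thr else rm for j, thr, lm, rm in stumps)
            for x in X]

def r2_score(y, pred):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (NOT the survey): 642 rows, three hypothetical predictors.
random.seed(0)
FEATURES = ["social_use", "media_hours", "generation"]
X, y = [], []
for _ in range(642):
    social, media, gen = random.random(), random.random(), random.randint(1, 3)
    X.append([social, media, gen])
    y.append(3.0 * social + 1.5 * media - 0.5 * gen + random.gauss(0, 0.3))

X_train, y_train = X[:500], y[:500]
X_test, y_test = X[500:], y[500:]

model, importance = fit_gbrt(X_train, y_train)
test_r2 = r2_score(y_test, predict(model, X_test))
print(f"held-out R^2: {test_r2:.2f}")
for name, share in zip(FEATURES, importance):
    print(f"{name}: {share:.2f}")
```

Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble, which is the core idea behind GBRT. A production analysis would more likely use a library implementation with deeper trees and cross-validation.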
Date: 2025
Downloads:
http://link.springer.com/10.1057/s41599-025-05520-5 Abstract (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05520-5
Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about
DOI: 10.1057/s41599-025-05520-5
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.