Artificial intelligence in linguistics: a GBRT model approach to forecast Cantonese levels among Chinese Malaysians

Yuqing Peng, Junxian Xie, Lin Zhang and Yuwen Lyu
Additional contact information
Yuqing Peng: Guangzhou University
Junxian Xie: Guangzhou University
Lin Zhang: Tokyo Institute of Technology
Yuwen Lyu: Guangzhou Medical University

Humanities and Social Sciences Communications, 2025, vol. 12, issue 1, 1-8

Abstract: This study leverages a Gradient Boosted Regression Trees (GBRT) machine learning model to explore how Cantonese media exposure and cultural identity affect Cantonese language proficiency among Chinese Malaysians. By integrating sociolinguistic insights with predictive modeling, we address the multidimensional nature of language use factors. Using survey data from 642 Chinese Malaysian respondents, the GBRT model achieved a high predictive accuracy (R² ≈ 0.90) for Cantonese proficiency. The model identified key predictors, such as daily Cantonese use in social settings, media engagement, and generational cohort, underscoring their significant roles in language maintenance. These findings demonstrate the potential of machine learning to advance sociolinguistic research and provide practical insights for preserving linguistic heritage in multicultural societies.
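
Illustrative sketch (not the authors' code or data): the abstract names the GBRT technique but this record does not describe its implementation, so the following is only a minimal, standard-library Python outline of how gradient boosting with one-split regression trees might be fitted and scored with R². Everything in it is an assumption made for illustration: the feature names, the 1-5 and 1-3 rating scales, the synthetic responses, the learning rate and the number of boosting rounds. Only the sample size of 642 is taken from the abstract.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) for the best single split."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbrt(X, y, n_rounds=200, lr=0.3):
    """Gradient boosting for squared error: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        before = sum(r * r for r in residuals)
        pred = [p + lr * (lm if row[j] <= t else rm) for row, p in zip(X, pred)]
        after = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
        gains[j] += before - after  # training-error reduction credited to feature j
    return {"base": base, "lr": lr, "stumps": stumps}, gains

def predict(model, X):
    return [
        model["base"]
        + model["lr"] * sum(lm if row[j] <= t else rm for j, t, lm, rm in model["stumps"])
        for row in X
    ]

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic stand-in for survey data; names and scales are hypothetical.
random.seed(0)
feature_names = ["daily_social_use", "media_engagement", "generation", "cultural_identity"]
X = [
    [random.randint(1, 5), random.randint(1, 5), random.randint(1, 3), random.randint(1, 5)]
    for _ in range(642)  # sample size quoted in the abstract; the values are made up
]
y = [0.8 * a + 0.5 * b - 0.4 * g + 0.2 * c + random.gauss(0, 0.3) for a, b, g, c in X]

# Hold out the last 128 rows for evaluation.
X_train, y_train, X_test, y_test = X[:514], y[:514], X[514:], y[514:]

model, gains = fit_gbrt(X_train, y_train)
r2 = r_squared(y_test, predict(model, X_test))
importance = {name: g / sum(gains) for name, g in zip(feature_names, gains)}

print(f"held-out R^2: {r2:.3f}")
for name, share in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.3f}")
```

The importance shares here are simply each feature's share of the total training-error reduction, one common way to rank predictors in boosted trees. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead of a hand-rolled loop, and the numbers printed by this sketch say nothing about the study's actual results.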

Date: 2025

Downloads: (external link)
http://link.springer.com/10.1057/s41599-025-05520-5 Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05520-5

Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about

DOI: 10.1057/s41599-025-05520-5

More articles in Humanities and Social Sciences Communications from Palgrave Macmillan
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-10-14
Handle: RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05520-5