Content Moderation in the Global South: A Comparative Study of Four Low-Resource Languages
Mona Elswah, Aliya Bhatia and Dhanaraj Thakur
No jbup9_v1, OSF Preprints from Center for Open Science
Abstract:
Over the past 18 months, the Center for Democracy and Technology (CDT) has studied how content moderation systems operate across multiple regions in the Global South, with a focus on South Asia, North and East Africa, and South America. Our team studied four languages: Maghrebi Arabic dialects (Elswah, 2024a), Kiswahili (Elswah, 2024b), Tamil (Bhatia & Elswah, 2025), and Quechua (Thakur, 2025). These languages and dialects are considered “low resource” because little training data is available for developing equitable and accurate AI models for them. We carried out this research in collaboration with regional civil society organizations in the Global South, which helped us understand the local dynamics of their digital environments. Content moderation remains an area that technology companies keep largely inaccessible to public scrutiny, except for the information they choose to disclose. Our findings contribute to the scientific and policy communities’ understanding of content moderation and its challenges in the Global South. The data we present in this report also add to our understanding of the information environment in the Global South, which is understudied in current scholarship.
Date: 2025-06-28
Downloads: (external link)
https://osf.io/download/68a61a455bee4ec0098a8c46/
Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:jbup9_v1
DOI: 10.31219/osf.io/jbup9_v1