Predicting the demographics of Twitter users with programmatic weak supervision

Jonathan Tonglet, Astrid Jehoul, Manon Reusens, Michael Reusens and Bart Baesens
Additional contact information
Jonathan Tonglet: KU Leuven
Astrid Jehoul: Datashift
Manon Reusens: KU Leuven
Michael Reusens: Statistics Flanders
Bart Baesens: KU Leuven

TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, 2024, vol. 32, issue 3, No 2, 354-390

Abstract: Predicting the demographics of Twitter users has become a problem with a large interest in computational social sciences. However, the limited amount of public datasets with ground truth labels and the tremendous costs of hand-labeling make this task particularly challenging. Recently, programmatic weak supervision has emerged as a new framework to train classifiers on noisy data with minimal human labeling effort. In this paper, demographic prediction is framed for the first time as a programmatic weak supervision problem. A new three-step methodology for gender, age category, and location prediction is provided, which outperforms traditional programmatic weak supervision and is competitive with the state-of-the-art deep learning model. The study is performed in Flanders, a small Dutch-speaking European region, characterized by a limited number of user profiles and tweets. An evaluation conducted on an independent hand-labeled test set shows that the proposed methodology can be generalized to unseen users within the geographic area of interest.

Keywords: Demographic prediction; Weak supervision; Data labeling; Twitter
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s11750-024-00666-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:topjnl:v:32:y:2024:i:3:d:10.1007_s11750-024-00666-y

Ordering information: This journal article can be ordered from
http://link.springer.de/orders.htm

DOI: 10.1007/s11750-024-00666-y

TOP: An Official Journal of the Spanish Society of Statistics and Operations Research is currently edited by Juan José Salazar González and Gustavo Bergantiños

More articles in TOP: An Official Journal of the Spanish Society of Statistics and Operations Research from Springer, Sociedad de Estadística e Investigación Operativa
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-20
Handle: RePEc:spr:topjnl:v:32:y:2024:i:3:d:10.1007_s11750-024-00666-y