How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning procedure, and the pre-processing steps on the performance of supervised models

Paweł Matuszewski
Additional contact information
Paweł Matuszewski: Collegium Civitas

Quality & Quantity: International Journal of Methodology, 2023, vol. 57, issue 1, No 15, 321 pages

Abstract: Owing to recent advances in natural language processing, social scientists increasingly use automatic text classification methods. This article asks how researchers’ subjective decisions affect the performance of supervised deep learning models. It aims to give researchers practical advice on: (1) whether it is more efficient to monitor coders’ work to ensure a high-quality training dataset, or to have every document coded once and obtain a larger dataset instead; (2) whether lemmatisation improves model performance; (3) whether passive or active learning is the better approach; and (4) whether the answers depend on the models’ classification tasks. The models were trained to detect whether a tweet is about current affairs or political issues, the tweet’s subject matter, and the author’s stance on that subject. The study uses a sample of 200,000 manually coded tweets published by Polish political opinion leaders in 2019. The consequences of these decisions under different conditions were checked by simulating 52,800 results using the fastText algorithm (dependent variable: F1-score). Linear regression analysis suggests that researchers’ choices not only strongly affect model performance but may also, in the worst-case scenario, lead to a waste of funds.
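
Illustration (not part of the published record): the abstract names the F1-score as the dependent variable. As a minimal sketch of how that metric is computed for a binary task, such as whether a tweet is political, the Python snippet below may help. The function name `f1_score`, the toy labels, and the choice of a binary (rather than macro- or micro-averaged) F1 are assumptions made for illustration; the record does not state which variant the article used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = political tweet, 0 = not political.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))  # 0.667 (precision = recall = 2/3)
```

Because F1 ignores true negatives, it is a common choice when the positive class is relatively rare, which is often the case for specific topics or stances in a large tweet sample.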

Keywords: Text classification; Natural language processing; Deep learning; Content analysis; Big data
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s11135-022-01372-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:qualqt:v:57:y:2023:i:1:d:10.1007_s11135-022-01372-2

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11135

DOI: 10.1007/s11135-022-01372-2

Quality & Quantity: International Journal of Methodology is currently edited by Vittorio Capecchi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:qualqt:v:57:y:2023:i:1:d:10.1007_s11135-022-01372-2