The Technological Bridge: R Programming’s Utility in Converting Social Media Data for Quantitative Financial Analysis

Litvinenko Alexey, Samuli Saarinen and Litvinenko Anna
Additional contact information
Litvinenko Alexey: University of Tartu, School of Economics and Business Administration, Tartu, Estonia
Samuli Saarinen: Estonian Business School, Tallinn, Estonia
Litvinenko Anna: Tallinn University of Technology, Department of Business Administration, Tallinn, Estonia

Economics and Culture, 2025, vol. 22, issue 1, 70-80

Abstract: Research purpose. This study explores whether R programming can transform unstructured, qualitative social media data into a quantitative format suitable for econometric modelling. Specifically, it examines how elements such as text, emojis, and sentiment from Reddit and X (formerly Twitter) can be converted into variables for regression analysis. With the aim of enhancing the predictive power of traditional financial models through alternative data sources, the paper sets out guidelines with specific technical steps in R: scripting API calls to extract data from Reddit and X, cleaning and tokenising the text, and incorporating the resulting data into regression models. The study addresses the growing need in financial economics to incorporate alternative data streams by offering a structured, replicable process for transforming high-volume, unstructured online content into statistically valid variables, thereby bridging the gap between qualitative market sentiment and quantitative modelling.

Design / Methodology / Approach. Focusing on the methodology and R scripts, this research adopts a quantitative approach: qualitative social media data are transformed into a format suitable for multiple linear and instrumental variable regression models, which assess the effect of social media signals on asset prices, with GameStop (GME) and Best Buy (BBY) as case studies. The process ensures reproducibility and includes open-source code, enhancing transparency and applicability in both academic and professional financial data analysis.

Findings. The findings demonstrate that qualitative social media data can be quantified for financial analysis: the data were successfully extracted, cleaned, and used in regression analysis. Traditional market indicators fail to explain GME’s price shifts, while the frequency of rocket emojis (interpreted as speculative sentiment) was statistically significant. BBY’s returns, however, aligned more closely with market and industry indices, suggesting a lower influence of private sentiment.

Originality / Value / Practical implications. The research provides a replicable method for integrating social media data into econometric models, contributing new tools for analysing market sentiment. By adapting classical financial models to modern data sources, the paper opens new directions for asset pricing research. The technical tools, created in R, are intended for use in econometric analysis by both academics and practitioners.
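
The quantification step summarised in the abstract (counting an emoji per day and using the count as a regressor alongside a market return) can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R scripts: the function names `daily_emoji_counts` and `ols`, the sample posts, and all numbers are assumptions invented for the example, and the fit is a plain least-squares regression rather than the paper's full multiple linear or instrumental variable specification.

```python
from collections import Counter

import numpy as np

ROCKET = "\U0001F680"  # rocket emoji


def daily_emoji_counts(posts, emoji=ROCKET):
    """Count occurrences of `emoji` per day; `posts` holds (day, text) pairs."""
    counts = Counter()
    for day, text in posts:
        counts[day] += text.count(emoji)
    return dict(counts)


def ols(y, regressors):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    columns = [np.ones(len(y))] + [np.asarray(r, dtype=float) for r in regressors]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Invented posts, for illustration only (not data from the paper).
posts = [
    ("day-1", f"GME to the moon {ROCKET}{ROCKET}"),
    ("day-1", "holding"),
    ("day-2", f"{ROCKET}{ROCKET}{ROCKET} buy buy"),
    ("day-3", "no emoji here"),
    ("day-4", f"{ROCKET}"),
]
counts = daily_emoji_counts(posts)
days = sorted(counts)
rockets = np.array([counts[d] for d in days], dtype=float)

# Invented market returns; the stock return is constructed from known
# coefficients (0.001, 1.0, 0.05) so the recovered fit is easy to check.
market = np.array([0.01, -0.02, 0.005, 0.0])
stock = 0.001 + 1.0 * market + 0.05 * rockets

beta = ols(stock, [market, rockets])
print(dict(zip(["intercept", "market", "rockets"], beta.round(3))))
```

Because the synthetic return series is built from known coefficients, the regression should recover them; with real data the emoji coefficient would of course need the usual significance tests before any interpretation.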

Keywords: R programming; econometric analysis; CAPM; price non-synchronization; social media data
JEL-codes: C58 C87 G14
Date: 2025
Downloads: (external link)
https://doi.org/10.2478/jec-2025-0006 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:vrs:ecocul:v:22:y:2025:i:1:p:70-80:n:1006

DOI: 10.2478/jec-2025-0006

Economics and Culture is currently edited by Velga Vēvere

More articles in Economics and Culture from Sciendo
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-07-01
Handle: RePEc:vrs:ecocul:v:22:y:2025:i:1:p:70-80:n:1006