
Pride, Love, and Twitter Rants: Combining Machine Learning and Qualitative Techniques to Understand What Our Tweets Reveal about Race in the US

Thu T. Nguyen, Shaniece Criss, Amani M. Allen, M. Maria Glymour, Lynn Phan, Ryan Trevino, Shrikha Dasari and Quynh C. Nguyen
Additional contact information
Thu T. Nguyen: Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
Shaniece Criss: Department of Health Science, Furman University, Greenville, SC 29613, USA
Amani M. Allen: Divisions of Community Health Sciences and Epidemiology, University of California, Berkeley, CA 94704, USA
M. Maria Glymour: Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
Lynn Phan: Program of Public Health Science, University of Maryland School of Public Health, College Park, MD 20742, USA
Ryan Trevino: Department of Health Sciences, College of Science and Health, DePaul University, Chicago, IL 60614, USA
Shrikha Dasari: Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
Quynh C. Nguyen: Department of Epidemiology & Biostatistics, University of Maryland School of Public Health, College Park, MD 20742, USA

IJERPH, 2019, vol. 16, issue 10, 1-19

Abstract: Objective: Describe variation in sentiment of tweets using race-related terms and identify themes characterizing the social climate related to race. Methods: We applied a Stochastic Gradient Descent Classifier to conduct sentiment analysis of 1,249,653 US tweets using race-related terms from 2015–2016. To evaluate accuracy, manual labels were compared against computer labels for a random subset of 6600 tweets. We conducted qualitative content analysis on a random sample of 2100 tweets. Results: Agreement between computer labels and manual labels was 74%. Tweets referencing Middle Eastern groups (12.5%) or Blacks (13.8%) had the lowest positive sentiment compared to tweets referencing Asians (17.7%) and Hispanics (17.5%). Qualitative content analysis revealed most tweets were represented by the categories: negative sentiment (45%), positive sentiment such as pride in culture (25%), and navigating relationships (15%). While all tweets use one or more race-related terms, negative sentiment tweets which were not derogatory or whose central topic was not about race were common. Conclusion: This study harnesses relatively untapped social media data to develop a novel area-level measure of social context (sentiment scores) and highlights some of the challenges in doing this work. New approaches to measuring the social environment may enhance research on social context and health.
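
The accuracy check mentioned in the abstract (manual labels compared against computer labels) can be illustrated as a simple percent-agreement calculation. This is only a minimal sketch: the function name `percent_agreement` and the toy labels are hypothetical and are not the authors' code or data.

```python
from typing import Sequence


def percent_agreement(manual: Sequence[str], computer: Sequence[str]) -> float:
    """Return the share of items where the manual and computer labels match."""
    if len(manual) != len(computer):
        raise ValueError("label sequences must have the same length")
    if not manual:
        raise ValueError("label sequences must not be empty")
    # Count positions where both labelers assigned the same sentiment label.
    matches = sum(1 for m, c in zip(manual, computer) if m == c)
    return matches / len(manual)


# Toy example (made-up labels, for illustration only).
manual_labels = ["positive", "negative", "neutral", "negative"]
computer_labels = ["positive", "negative", "negative", "negative"]
agreement = percent_agreement(manual_labels, computer_labels)
print(f"Agreement: {agreement:.0%}")
```

Raw agreement of this kind does not correct for matches expected by chance; whether the study applied such a correction is not stated in the abstract.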

Keywords: social media; minority groups; discrimination; big data; content analysis
JEL-codes: I I1 I3 Q Q5
Date: 2019
Downloads: (external link)
https://www.mdpi.com/1660-4601/16/10/1766/pdf (application/pdf)
https://www.mdpi.com/1660-4601/16/10/1766/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:16:y:2019:i:10:p:1766-:d:232352

IJERPH is currently edited by Ms. Jenna Liu

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:16:y:2019:i:10:p:1766-:d:232352