The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

Yoonsang Kim, Rachel Nordgren and Sherry Emery
Additional contact information
Yoonsang Kim: Social Data Collaboratory, Public Health, NORC at the University of Chicago, Chicago, IL 60603, USA
Rachel Nordgren: Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, IL 60612, USA
Sherry Emery: Social Data Collaboratory, Public Health, NORC at the University of Chicago, Chicago, IL 60603, USA

IJERPH, 2020, vol. 17, issue 3, 1-15

Abstract: Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection to allow direct comparisons between studies or to support replication. The three primary application programming interfaces (APIs) for accessing Twitter data are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of tweets retrieved from each API. Such information is crucial to the validity, interpretation, and replicability of research findings. This study examines whether tweets collected using the same search filters over the same time period, but through different APIs, yield comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the three APIs. The retrieved tweets largely overlapped across the three APIs, but each API also retrieved unique tweets, and the extent of overlap varied over time and by topic, resulting in different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources can influence the amount, content, and user accounts of the data they retrieve from social media in order to assess the implications of their choice of data source.

Keywords: Twitter; social media data source; point of access; data quality; e-cigarette
JEL-codes: I I1 I3 Q Q5
Date: 2020

Downloads: (external link)
https://www.mdpi.com/1660-4601/17/3/864/pdf (application/pdf)
https://www.mdpi.com/1660-4601/17/3/864/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:17:y:2020:i:3:p:864-:d:314348

IJERPH is currently edited by Ms. Jenna Liu

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:17:y:2020:i:3:p:864-:d:314348