Modeling the Bias of Digital Data: An Approach to Combining Digital With Official Statistics to Estimate and Predict Migration Trends
Yuan Hsiao, Lee Fiorio, Jonathan Wakefield and Emilio Zagheni
Sociological Methods & Research, 2024, vol. 53, issue 4, 1905-1943
Abstract:
Obtaining reliable and timely estimates of migration flows is critical for advancing migration theory and guiding policy decisions, but remains a challenge. Digital data provide granular information on time and space, but do not draw from representative samples of the population, leading to biased estimates. We propose a method for combining digital data and official statistics by using the official statistics to model the spatial and temporal dependence structure of the biases of digital data. We use simulations to demonstrate the validity of the model, then empirically illustrate our approach by combining geo-located Twitter data with data from the American Community Survey (ACS) to estimate state-level out-migration probabilities in the United States. We show that our model, which combines unbiased and biased data, produces predictions that are more accurate than predictions based solely on unbiased data. Our approach demonstrates how digital data can be used to complement, rather than replace, official statistics.
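The abstract does not spell out the model, so what follows is only a minimal, hypothetical Python sketch of the general idea of correcting biased digital data with unbiased official statistics. It is not the authors' method. It assumes that bias is measured on the logit scale as the difference between the digital and official out-migration probabilities in years where both are observed, and it substitutes two crude stand-ins for a statistical space-time model: exponential smoothing of each region's bias over time, and averaging that bias with the bias of neighbouring regions. All function names (`smoothed_bias`, `corrected_predictions`) and parameters (`alpha`, `lam`) are illustrative assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def smoothed_bias(official, digital, alpha=0.5):
    """Exponentially smoothed logit-scale bias of digital data for one region.

    official, digital: dicts mapping year -> out-migration probability.
    Only years observed in both sources contribute. alpha is the weight
    on the most recent year (a hypothetical smoothing choice).
    """
    years = sorted(set(official) & set(digital))
    if not years:
        raise ValueError("need at least one year observed in both sources")
    bias = logit(digital[years[0]]) - logit(official[years[0]])
    for year in years[1:]:
        current = logit(digital[year]) - logit(official[year])
        bias = alpha * current + (1.0 - alpha) * bias
    return bias


def corrected_predictions(official, digital, neighbors, target_year, alpha=0.5, lam=0.5):
    """Predict target_year probabilities by removing estimated bias from digital data.

    official, digital: region -> {year: probability}
    neighbors: region -> list of adjacent regions
    lam: weight on the neighbours' average bias (hypothetical spatial pooling).
    """
    own = {}
    for region in official:
        # Use only years before the target, so no official value for
        # target_year is needed.
        past_digital = {y: p for y, p in digital[region].items() if y < target_year}
        own[region] = smoothed_bias(official[region], past_digital, alpha)

    predictions = {}
    for region, bias in own.items():
        neighbour_biases = [own[n] for n in neighbors.get(region, []) if n in own]
        if neighbour_biases:
            bias = (1.0 - lam) * bias + lam * sum(neighbour_biases) / len(neighbour_biases)
        # Subtract the pooled bias from the digital signal on the logit scale.
        predictions[region] = expit(logit(digital[region][target_year]) - bias)
    return predictions


# Usage: the digital source overstates every region's log-odds by 0.4.
truth = {"A": 0.03, "B": 0.05}
official = {r: {2015: p, 2016: p} for r, p in truth.items()}
digital = {r: {y: expit(logit(p) + 0.4) for y in (2015, 2016, 2017)} for r, p in truth.items()}
preds = corrected_predictions(official, digital, {"A": ["B"], "B": ["A"]}, 2017)
print({r: round(p, 4) for r, p in preds.items()})  # → {'A': 0.03, 'B': 0.05}
```

Working on the logit scale keeps corrected values between 0 and 1. A statistical space-time model of the kind the abstract describes would estimate how strongly biases are shared across time and neighbouring regions from the data, rather than fixing `alpha` and `lam` by hand as this sketch does.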
Keywords: digital data; bias modeling; space-time models; migration; survey; Twitter; population processes
Date: 2024
Downloads: https://journals.sagepub.com/doi/10.1177/00491241221140144 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:somere:v:53:y:2024:i:4:p:1905-1943
DOI: 10.1177/00491241221140144
Bibliographic data for series maintained by SAGE Publications.