Generative Multimodal Models for Social Science: An Application with Satellite and Streetscape Imagery
Tina Law and Elizabeth Roberto
Sociological Methods & Research, 2025, vol. 54, issue 3, 889-932
Abstract:
Although there is growing social science research examining how generative AI models can be effectively and systematically applied to text-based tasks, whether and how these models can be used to analyze images remain open questions. In this article, we introduce a framework for analyzing images with generative multimodal models, which consists of three core tasks: curation, discovery, and measurement and inference. We demonstrate this framework with an empirical application that uses OpenAI's GPT-4o model to analyze satellite and streetscape images (n = 1,101) to identify built environment features that contribute to contemporary residential segregation in U.S. cities. We find that when GPT-4o is provided with well-defined image labels, the model labels images with high validity compared to expert labels. We conclude with thoughts for other use cases and discuss how social scientists can work collaboratively to ensure that image analysis with generative multimodal models is rigorous, reproducible, ethical, and sustainable.
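As a rough sketch of the measurement step the abstract describes, the snippet below builds an OpenAI Chat Completions request that asks a multimodal model such as GPT-4o to assign one label from a fixed, well-defined label set to an image. The label list, prompt wording, and image URL here are illustrative assumptions, not the authors' actual coding scheme; actually sending the request would additionally require the `openai` SDK and an API key.

```python
import json

def build_labeling_request(image_url, labels, model="gpt-4o"):
    """Build a Chat Completions payload asking a multimodal model to
    assign exactly one label from a fixed set to the given image."""
    instruction = (
        "Classify the built environment feature shown in this image. "
        "Answer with exactly one label from this list: "
        + ", ".join(labels) + "."
    )
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": instruction},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Hypothetical label set and image URL for illustration only.
payload = build_labeling_request(
    "https://example.com/streetscape_001.jpg",
    ["highway", "railroad", "park", "industrial corridor"],
)
print(json.dumps(payload, indent=2))
```

With the `openai` SDK installed, the payload could be sent via `client.chat.completions.create(**payload)`; validating the returned labels against expert labels, as the article does, would then be a separate comparison step.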
Keywords: artificial intelligence; generative multimodal models; image as data; computer vision; prompt development
Date: 2025
Downloads: https://journals.sagepub.com/doi/10.1177/00491241251339673 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:somere:v:54:y:2025:i:3:p:889-932
DOI: 10.1177/00491241251339673
Bibliographic data for series maintained by SAGE Publications.