Comparing Price Indices of Clothing and Footwear for Scanner Data and Web Scraped Data
Antonio G. Chessa and Robert Griffioen
Economie et Statistique / Economics and Statistics, 2019, issue 509, 49-68
Abstract:
Statistical institutes are considering web scraping of online prices of consumer goods as a feasible alternative to scanner data. The lack of transaction data raises the question of whether web scraped data are suited for price index calculation. This article investigates this question by comparing price indices based on web scraped data and scanner data for clothing and footwear from the same webshop. Scanner data prices and web scraped prices are often equal, with the latter being slightly higher on average. The numbers of web scraped product prices and of products sold show remarkably high correlations. Given the high churn rates of clothing products, a multilateral method (Geary-Khamis) was used to calculate price indices. For 16 product categories, the indices show small overall differences between the two data sources, with year-on-year indices differing by only 0.3 percentage point at COICOP level (men's and women's clothing). It remains to be investigated whether such promising results for web scraped data will also be found for other retailers.
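The abstract names the Geary-Khamis (GK) method without stating it. What follows is a minimal sketch of the textbook GK index, not the authors' code. It assumes that, for each product i and period t, the sales value E[i][t] and the quantity sold Q[i][t] are available, with zeros when the product is not sold. It also assumes every product is sold in at least one period. The function name `geary_khamis` and its parameters are hypothetical. The sketch iterates two equations until the indices stop changing. The first gives quality-adjustment factors v_i = sum_z (E_iz / P_z) / sum_z Q_iz. The second gives P_t = sum_i E_it / sum_i v_i Q_it, normalised so that P_0 = 1.

```python
def geary_khamis(expenditures, quantities, tol=1e-10, max_iter=1000):
    """Sketch of a Geary-Khamis price index (hypothetical helper).

    expenditures[i][t]: sales value of product i in period t (0 if not sold)
    quantities[i][t]:   quantity of product i sold in period t (0 if not sold)
    Assumes each product is sold in at least one period.
    Returns one index per period, normalised so that period 0 equals 1.
    """
    n_items = len(expenditures)
    n_periods = len(expenditures[0])
    index = [1.0] * n_periods  # initial guess: P_z = 1 for all periods

    for _ in range(max_iter):
        # Quality-adjustment factors: deflated sales over total quantity
        v = []
        for i in range(n_items):
            total_q = sum(quantities[i])
            deflated = sum(expenditures[i][z] / index[z] for z in range(n_periods))
            v.append(deflated / total_q)

        # Price level per period: total sales / quality-adjusted quantity
        level = []
        for t in range(n_periods):
            value = sum(expenditures[i][t] for i in range(n_items))
            adj_q = sum(v[i] * quantities[i][t] for i in range(n_items))
            level.append(value / adj_q)

        # Normalise to the first period and stop once the indices settle
        new_index = [p / level[0] for p in level]
        if max(abs(a - b) for a, b in zip(new_index, index)) < tol:
            return new_index
        index = new_index
    return index
```

For example, a single product whose price rises from 2 to 3, with one unit sold in each period, gives `geary_khamis([[2, 3]], [[1, 1]])` equal to `[1.0, 1.5]`. Because products enter only through their sales and quantities in the periods in which they are sold, new and disappearing items can be included without imputing missing prices.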
JEL-codes: C43 E31
Date: 2019
Citations: 3 (EconPapers)
Download (PDF): https://www.insee.fr/en/statistiques/fichier/4203548/509_Chessa-Griffioen-EN.pdf
Persistent link: https://EconPapers.repec.org/RePEc:nse:ecosta:ecostat_2019_509_4
DOI: 10.24187/ecostat.2019.509.1984
Economie et Statistique / Economics and Statistics is currently edited by Dominique Goux
Publisher: Institut National de la Statistique et des Etudes Economiques (INSEE)
Bibliographic data for the series are maintained by Veronique Egloff.