
Applying Machine Learning to Detect Outliers in Alternative Data Sources. A universal methodology framework for scanner and web-scraped data sources

Xuxin Mao, Janine Boshoff, Garry Young and Hande Kucuk

No ESCOE-TR-12, Economic Statistics Centre of Excellence (ESCoE) Technical Reports from Economic Statistics Centre of Excellence (ESCoE)

Abstract: This research explores new ways of applying machine learning to detect outliers in alternative price data sources such as web-scraped and scanner data. Based on text vectorisation and clustering methods, we build a universal methodology framework that identifies outliers in both data sources. We provide a unique way of conducting goods classification and outlier detection. Using density-based spatial clustering of applications with noise (DBSCAN), we provide two layers of outlier detection for both scanner data and web-scraped data. For web-scraped data, we also provide a method to classify text information and identify clusters of products. The framework allows us to efficiently detect outliers and explore abnormal price changes that may be omitted by current practices in line with the 2019 Consumer Prices Indices Manual. Our methodology also provides a good foundation for building better measures of consumer prices with standard time series data transformed from alternative data sources.
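As a rough illustration of the kind of pipeline the abstract describes, the following minimal Python sketch clusters product descriptions and then looks for price outliers inside each product cluster. It is not the authors' implementation: the abstract does not say what the two layers are, so treating "text noise" as the first layer and "price noise within a product cluster" as the second is an assumption. The binary bag-of-words vectorisation, the Jaccard distance, the use of log prices, the toy data, and all names and thresholds (`detect_outliers`, `text_eps`, `price_eps`, `min_samples`) are likewise hypothetical choices made for illustration.

```python
import math

NOISE = -1  # label that DBSCAN assigns to points belonging to no cluster


def dbscan(points, eps, min_samples, dist):
    """Minimal DBSCAN. Returns one label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:  # j is a core point: expand
                queue.extend(j_neighbours)
    return labels


def tokens(description):
    # Assumed vectorisation: a binary bag of words, stored as a set of tokens.
    return frozenset(description.lower().split())


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def detect_outliers(items, text_eps=0.5, price_eps=0.3, min_samples=3):
    """items: list of (description, price). Returns (product_labels, flags)."""
    vectors = [tokens(description) for description, _ in items]
    product_labels = dbscan(vectors, text_eps, min_samples, jaccard_distance)
    # Assumed layer 1: descriptions that fit no product cluster.
    flags = [label == NOISE for label in product_labels]
    # Assumed layer 2: prices that are noise within their product cluster.
    for cluster in set(product_labels) - {NOISE}:
        members = [i for i, label in enumerate(product_labels) if label == cluster]
        log_prices = [math.log(items[i][1]) for i in members]
        price_labels = dbscan(log_prices, price_eps, min_samples,
                              lambda x, y: abs(x - y))
        for i, price_label in zip(members, price_labels):
            if price_label == NOISE:
                flags[i] = True
    return product_labels, flags


# Made-up toy data, not taken from the paper.
items = [
    ("semi skimmed milk 2l", 1.20),
    ("semi skimmed milk 2l fresh", 1.25),
    ("milk semi skimmed 2l", 1.15),
    ("semi skimmed milk 2l", 9.99),
    ("white sliced bread 800g", 1.00),
    ("sliced white bread 800g", 1.05),
    ("white bread sliced 800g loaf", 0.95),
    ("garden hose 30m", 25.00),
]
product_labels, flags = detect_outliers(items)
for (description, price), label, flag in zip(items, product_labels, flags):
    print(f"{description!r:32} {price:6.2f} cluster={label:2d} outlier={flag}")
```

On this toy data the sketch groups the milk and bread descriptions into two product clusters, and flags the 9.99 milk price and the unmatched garden hose. In practice a library implementation such as scikit-learn's `DBSCAN` with a TF-IDF vectoriser would be a more natural choice; the pure-Python version is used here only to keep the example self-contained.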

Keywords: consumer price index; machine learning; outlier detection; scanner data; text density based clustering; web-scraped data
JEL-codes: C43 E31
Date: 2021-11
New Economics Papers: this item is included in nep-big, nep-cmp and nep-mac

Downloads: (external link)
https://escoe-website.s3.amazonaws.com/wp-content/ ... 5244/ESCoE-TR-12.pdf



Persistent link: https://EconPapers.repec.org/RePEc:nsr:escoet:escoe-tr-12


More papers in Economic Statistics Centre of Excellence (ESCoE) Technical Reports from Economic Statistics Centre of Excellence (ESCoE), King's College London, Strand, London WC2R 2LS.
Bibliographic data for series maintained by ESCoE Centre Manager.

 
Page updated 2025-03-31
Handle: RePEc:nsr:escoet:escoe-tr-12