Improving the Output Quality of Official Statistics Based on Machine Learning Algorithms

Meertens Q.A., Diks C.G.H., H.J. van den Herik and Takes F.W.
Additional contact information
Meertens Q.A.: Statistics Netherlands, Henri Faasdreef 312, 2492 JP The Hague, the Netherlands.
Diks C.G.H.: University of Amsterdam, Center for Nonlinear Dynamics in Economics and Finance, Roetersstraat 11, 1018 WB Amsterdam, the Netherlands.
H.J. van den Herik: Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands.
Takes F.W.: Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands.

Journal of Official Statistics, 2022, vol. 38, issue 2, 485-508

Abstract: National statistical institutes currently investigate how to improve the output quality of official statistics based on machine learning algorithms. A key issue is concept drift, that is, when the joint distribution of independent variables and a dependent (categorical) variable changes over time. Under concept drift, a statistical model requires regular updating to prevent it from becoming biased. However, updating a model requires additional data, which are not always available. An alternative is to reduce the bias by means of bias correction methods. In the article, we focus on estimating the proportion (base rate) of a category of interest and we compare two popular bias correction methods: the misclassification estimator and the calibration estimator. For prior probability shift (a specific type of concept drift), we investigate the two methods analytically as well as numerically. Our analytical results are expressions for the bias and variance of both methods. As a numerical result, we present a decision boundary for the relative performance of the two methods. Our results provide a better understanding of the effect of prior probability shift on output quality. Consequently, we may recommend a novel approach on how to use machine learning algorithms in the context of official statistics.
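
The abstract names the two bias correction methods without giving formulas. The Python sketch below illustrates them for a binary classifier, assuming the textbook forms of both estimators (these forms are an assumption here, not taken from this record): the misclassification estimator inverts the classifier's error rates, while the calibration estimator reweights the predicted classes by calibration probabilities learned on a test set. All function names and the numeric values (sensitivity 0.9, specificity 0.8, base rates 0.5 and 0.2) are hypothetical. The sketch works with population-level quantities, so it shows bias only, not variance.

```python
# Illustrative sketch only: standard textbook forms of the two estimators,
# assumed here; names and numbers are hypothetical, not from the article.

def predicted_rate(base_rate, sensitivity, specificity):
    """Expected share of units the classifier labels as positive."""
    return base_rate * sensitivity + (1 - base_rate) * (1 - specificity)


def misclassification_estimator(pred_rate, sensitivity, specificity):
    """Invert the error rates to recover the base rate from the predicted rate."""
    return (pred_rate + specificity - 1) / (sensitivity + specificity - 1)


def calibration_probabilities(base_rate, sensitivity, specificity):
    """P(true=1 | pred=1) and P(true=1 | pred=0) at a given base rate."""
    p_pos = predicted_rate(base_rate, sensitivity, specificity)
    c11 = base_rate * sensitivity / p_pos
    c01 = base_rate * (1 - sensitivity) / (1 - p_pos)
    return c11, c01


def calibration_estimator(pred_rate, c11, c01):
    """Reweight predicted positives and negatives by calibration probabilities."""
    return pred_rate * c11 + (1 - pred_rate) * c01


# Hypothetical classifier quality; under prior probability shift these error
# rates stay fixed while the base rate changes.
SENS, SPEC = 0.9, 0.8
TEST_BASE_RATE = 0.5    # base rate in the labelled test set
TARGET_BASE_RATE = 0.2  # shifted base rate in the target population

c11, c01 = calibration_probabilities(TEST_BASE_RATE, SENS, SPEC)
target_pred_rate = predicted_rate(TARGET_BASE_RATE, SENS, SPEC)

mis_est = misclassification_estimator(target_pred_rate, SENS, SPEC)
cal_est = calibration_estimator(target_pred_rate, c11, c01)

print(f"misclassification estimator: {mis_est:.3f}")
print(f"calibration estimator:       {cal_est:.3f}")
```

With these assumed numbers, the misclassification estimator recovers the shifted base rate of 0.2, whereas the calibration estimator lands at roughly 0.35, pulled toward the test-set base rate of 0.5. In finite samples the comparison also depends on the variance of each estimator, which this sketch does not model.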

Keywords: Output quality; concept drift; prior probability shift; misclassification bias
Date: 2022

Downloads: (external link)
https://doi.org/10.2478/jos-2022-0023 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:vrs:offsta:v:38:y:2022:i:2:p:485-508:n:8

DOI: 10.2478/jos-2022-0023

Journal of Official Statistics is currently edited by Annica Isaksson and Ingegerd Jansson

Page updated 2025-04-12
Handle: RePEc:vrs:offsta:v:38:y:2022:i:2:p:485-508:n:8