Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives
Vicente Núñez-Antón,
Juan Manuel Pérez-Salamero González,
Marta Regúlez-Castillo and
Carlos Vidal-Melia
Additional contact information
Juan Manuel Pérez-Salamero González: Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia (Spain).
Marta Regúlez-Castillo: Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country UPV/EHU, Bilbao (Spain).
No 2019-20, Documentos de Trabajo del ICAE from Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico
Abstract:
This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) subject to a p-value high enough to achieve a good fit to the population of interest under Pearson’s chi-square goodness-of-fit test. A key advantage of the model is that it lets the user choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model, and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied, and the results show that it is possible to obtain a large dataset from the CSWL that (far) better represents the pensioner population for each of the waves analysed.
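The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' convex MINLP formulation: it is a plain grid search over the constant of proportionality, and every name in it (`chi2_sf`, `pearson_p_value`, `largest_subsample`, the strata counts) is a hypothetical chosen for illustration. Each stratum of the subsample is capped at what the original sample contains, and the largest subsample whose Pearson chi-square p-value stays above a chosen threshold is kept.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    starting from Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def pearson_p_value(counts, pop_shares):
    """P-value of Pearson's goodness-of-fit test of counts against pop_shares."""
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, pop_shares))
    return chi2_sf(stat, len(counts) - 1)


def largest_subsample(sample_counts, pop_counts, min_p, steps=1000):
    """Grid search (an assumption, not the paper's solver) over the constant k.

    For each k the subsample takes min(n_h, floor(k * N_h)) units per stratum;
    the largest subsample with p-value >= min_p is returned as (k, counts, p).
    """
    total = sum(pop_counts)
    shares = [c / total for c in pop_counts]
    ratios = [n / big_n for n, big_n in zip(sample_counts, pop_counts)]
    k_lo, k_hi = min(ratios), max(ratios)  # fully proportional .. whole sample
    best = None
    for i in range(steps + 1):
        k = k_lo + (k_hi - k_lo) * i / steps
        sub = [min(n, int(k * big_n)) for n, big_n in zip(sample_counts, pop_counts)]
        if sum(sub) == 0:
            continue
        p = pearson_p_value(sub, shares)
        if p >= min_p and (best is None or sum(sub) > sum(best[1])):
            best = (k, sub, p)
    return best


# Hypothetical strata: the sample over-represents the first stratum.
sample = [500, 300, 200]
population = [40000, 40000, 20000]
k, sub, p = largest_subsample(sample, population, min_p=0.05)
print("k =", k, "subsample =", sub, "size =", sum(sub), "p-value =", p)
```

Raising `min_p` in this sketch can only shrink the subsample it returns (or leave it unchanged), which mirrors the choice the abstract describes between a larger subsample with a poorer fit and a smaller one with a better fit.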
Keywords: Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives
JEL-codes: C12 C61 C81 H55 J26
Pages: 30
Date: 2019-03
Downloads:
https://eprints.ucm.es/id/eprint/55423/1/1920.pdf (application/pdf)
Related works:
Journal Article: Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives (2020) 
Persistent link: https://EconPapers.repec.org/RePEc:ucm:doicae:1920
Ordering information: This working paper can be ordered from
Facultad de Ciencias Económicas y Empresariales. Pabellón prefabricado, 1ª Planta, ala norte. Campus de Somosaguas, 28223 - POZUELO DE ALARCÓN (MADRID)
https://www.ucm.es/f ... -de-trabajo-del-icae
Bibliographic data for series maintained by Águeda González Abad.