Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives

Vicente Núñez-Antón, Juan Manuel Pérez-Salamero González, Marta Regúlez-Castillo and Carlos Vidal-Melia
Additional contact information
Juan Manuel Pérez-Salamero González: Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia (Spain).
Marta Regúlez-Castillo: Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country UPV/EHU, Bilbao (Spain).

No 2019-20, Documentos de Trabajo del ICAE from Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico

Abstract: This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) and restricting it to a p-value high enough to achieve a good fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The beauty of the model is that it gives the user the freedom to choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model, and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied, and the results show that it is possible to obtain a large dataset from the CSWL that represents the pensioner population far better for each of the waves analysed.
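
The trade-off described in the abstract (a larger subsample with a poorer fit versus a smaller one with a better fit) can be illustrated with a minimal Python sketch. This is not the authors' convex MINLP formulation: it is a simple grid-search heuristic over the constant of proportionality, and the function names, the `min_p` threshold, the grid size and the example counts are all assumptions made for illustration. The chi-square tail probability is computed with the closed-form expressions valid for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2m+1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def pearson_p_value(observed, shares):
    """P-value of Pearson's goodness-of-fit test against population shares."""
    n = sum(observed)
    stat = sum((o - n * s) ** 2 / (n * s) for o, s in zip(observed, shares))
    return chi2_sf(stat, len(observed) - 1)


def largest_subsample(sample_counts, pop_counts, min_p=0.95, steps=1000):
    """Heuristic (assumed, not the paper's model): scan the constant of
    proportionality c and keep the largest subsample whose p-value >= min_p.

    Stratum i contributes min(n_i, floor(c * P_i)) units, so no stratum is
    asked for more units than the original sample contains.
    Returns the list of per-stratum counts, or None if no grid point qualifies.
    """
    total_pop = sum(pop_counts)
    shares = [p / total_pop for p in pop_counts]
    ratios = [n / p for n, p in zip(sample_counts, pop_counts)]
    c_lo, c_hi = min(ratios), max(ratios)  # exact proportionality holds up to c_lo
    best = None
    for k in range(steps + 1):
        c = c_lo + (c_hi - c_lo) * k / steps
        x = [min(n, int(c * p)) for n, p in zip(sample_counts, pop_counts)]
        if sum(x) == 0:
            continue
        if pearson_p_value(x, shares) >= min_p and (best is None or sum(x) > sum(best)):
            best = x
    return best


if __name__ == "__main__":
    # Hypothetical stratum counts, chosen only to exercise the functions.
    sample = [500, 300, 200]
    population = [4000, 4000, 2000]
    for threshold in (0.95, 0.5, 0.05):
        chosen = largest_subsample(sample, population, min_p=threshold)
        print(threshold, chosen, None if chosen is None else sum(chosen))
```

Lowering `min_p` in this sketch admits larger subsamples at the cost of a weaker fit, which mirrors the choice the abstract leaves to the user. A real application would solve the integer program directly rather than rely on a grid over `c`.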

Keywords: Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives
JEL-codes: C61 C81 C12 H55 J26
Pages: 30
Date: 2019-03

Downloads: (external link)
https://eprints.ucm.es/id/eprint/55423/1/1920.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:ucm:doicae:1920

Ordering information: This working paper can be ordered from
Facultad de Ciencias Económicas y Empresariales. Pabellón prefabricado, 1ª Planta, ala norte. Campus de Somosaguas, 28223 - POZUELO DE ALARCÓN (MADRID)
https://www.ucm.es/f ... -de-trabajo-del-icae

Bibliographic data for series maintained by Águeda González Abad.

 
Page updated 2021-06-14
Handle: RePEc:ucm:doicae:1920