Missing Categorical Data Imputation and Individual Observation Level Imputation

Pavel Zimmermann, Petr Mazouch and Klára Hulíková Tesárková
Additional contact information
Pavel Zimmermann: Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic
Petr Mazouch: Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic
Klára Hulíková Tesárková: Department of Demography and Geodemography, Faculty of Science, Charles University in Prague, Albertov 6, 128 00 Prague 2, Czech Republic

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 2014, vol. 62, issue 6, 1527-1534

Abstract: Traditional imputation schemes for missing data predict each missing value from other observed values. For continuous data, imputation often relies on regression models. For categorical data, the usual approach is a classification technique that sets the missing value to the 'most likely' category. This, however, overrepresents the categories that are generally observed more often and can therefore bias the results of many tasks, especially when dominant categories are present. We present an original imputation methodology that yields the most likely structure (distribution) of the missing data conditional on the observed values. The methodology assumes that the categorical variable containing the missing values has a multinomial distribution. The parameters of this distribution are then estimated using multinomial logistic regression. An illustrative example is described: the reconstruction of missing values of the highest education level of persons in a population.
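
The overrepresentation problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the category names, the probabilities, and the helper functions `impute_mode` and `impute_draw` are hypothetical, and the probabilities are assumed to have already been estimated (for instance by a multinomial logistic regression). Drawing each imputed value at random from those probabilities is used here only as one simple way to keep the imputed distribution close to the estimated one; the paper's own procedure may differ.

```python
import random
from collections import Counter

# Hypothetical education categories; not taken from the paper.
CATEGORIES = ["primary", "secondary", "tertiary"]


def impute_mode(prob_rows):
    """Classification-style imputation: always pick the most probable category."""
    return [CATEGORIES[max(range(len(p)), key=p.__getitem__)] for p in prob_rows]


def impute_draw(prob_rows, rng):
    """Draw each imputed value from its estimated category probabilities."""
    return [rng.choices(CATEGORIES, weights=p, k=1)[0] for p in prob_rows]


# Assumed estimated probabilities for 1000 persons with a missing value.
# All rows are identical here to keep the effect easy to see.
prob_rows = [[0.2, 0.5, 0.3]] * 1000

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mode_counts = Counter(impute_mode(prob_rows))
draw_counts = Counter(impute_draw(prob_rows, rng))

print("most likely category:", dict(mode_counts))
print("random draw:         ", dict(draw_counts))
```

With the 'most likely' rule every person is assigned the dominant category, so the other two categories vanish from the imputed data. With the random draw, the imputed counts stay close to the assumed proportions of 20 %, 50 % and 30 %.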

Keywords: missing data; categorical data; multinomial regression
Date: 2014

Downloads: (external link)
http://acta.mendelu.cz/doi/10.11118/actaun201462061527.html (text/html)
http://acta.mendelu.cz/doi/10.11118/actaun201462061527.pdf (application/pdf)
free of charge

Persistent link: https://EconPapers.repec.org/RePEc:mup:actaun:actaun_2014062061527

DOI: 10.11118/actaun201462061527

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis is currently edited by Markéta Havlásková

More articles in Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis from Mendel University Press
Bibliographic data for series maintained by Ivo Andrle.

 
Page updated 2025-03-19
Handle: RePEc:mup:actaun:actaun_2014062061527