A new multivariate imputation method based on Bayesian networks

Niloofar, P.; Ganjali, M.

A new multivariate imputation method based on Bayesian networks

P. Niloofar and M. Ganjali

Journal of Applied Statistics, 2014, vol. 41, issue 3, 501-518

Abstract: Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have been recently used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in coronary artery disease data set and milk composition data set as well as a simulation study from cancer-neapolitan network are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods called Peter-Clark algorithm and greedy search-and-score have been applied. Bayesian network-based methods are: first, the method introduced by Di Zio et al. [ Bayesian networks for imputation , J. R. Stat. Soc. Ser. A 167 (2004), 309--322] in which, each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [ Multivariate techniques for imputation based on Bayesian networks , Neural Netw. World 15 (2005), 303--310] which uses the information in the Markov blanket set of the variable to be imputed and finally, our new proposed method which applies the whole available knowledge of all variables of interest, consisting the Markov blanket and so the parent set, to impute a missing item. Results indicate the high quality of our new proposed method especially in the presence of high missingness percentages and more connected networks. Also the new method have shown to be more efficient than the MICE method for small sample sizes with high missing rates.

Date: 2014
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2013.842960 (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:41:y:2014:i:3:p:501-518

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2013.842960

Access Statistics for this article

Journal of Applied Statistics is currently edited by Robert Aykroyd

More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().