Estimation bias due to duplicated observations: a Monte Carlo simulation
Francesco Sarracino and Malgorzata Mikucka
MPRA Paper from University Library of Munich, Germany
Abstract:
This paper assesses how duplicate records affect the results of regression analysis of survey data, and it compares the effectiveness of five solutions for minimizing the risk of obtaining biased estimates. The results show that duplicate records create a considerable risk of obtaining biased estimates. The chance of obtaining unbiased estimates in the presence of a single sextuplet of identical observations is 41.6%. If about 10% of the observations in the dataset are duplicated, the probability of obtaining unbiased estimates falls to nearly 11%. Weighting the duplicate cases by the inverse of their multiplicity minimizes the bias when multiple doublets are present in the data. Our results demonstrate the risks of using data that contain non-unique observations and call for further research on strategies for analyzing affected data.
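The weighting solution named in the abstract can be illustrated with a small sketch. This is not the authors' simulation code: the data-generating process, the sample size, and the helper names `inverse_multiplicity_weights` and `weighted_ols` are assumptions made purely for illustration. The sketch injects one sextuplet into simulated data and weights each record by one over the number of identical copies, so that every distinct observation contributes a total weight of one.

```python
import numpy as np


def inverse_multiplicity_weights(data):
    """For each row, return 1 / (number of rows identical to it)."""
    _, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps this robust across NumPy versions that differ in
    # the shape of `inverse` when `axis` is given.
    return 1.0 / counts[inverse.ravel()]


def weighted_ols(X, y, w):
    """Weighted least squares, done by scaling each row by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Hypothetical data-generating process (an assumption, not from the paper).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Create a single sextuplet: the first observation plus five extra copies.
x_dup = np.concatenate([x, np.repeat(x[0], 5)])
y_dup = np.concatenate([y, np.repeat(y[0], 5)])

w = inverse_multiplicity_weights(np.column_stack([x_dup, y_dup]))

X_clean = np.column_stack([np.ones(n), x])
X_dup = np.column_stack([np.ones_like(x_dup), x_dup])

beta_clean = weighted_ols(X_clean, y, np.ones(n))               # no duplicates
beta_naive = weighted_ols(X_dup, y_dup, np.ones(len(y_dup)))    # duplicates ignored
beta_weighted = weighted_ols(X_dup, y_dup, w)                   # inverse-multiplicity weights

print("clean:   ", beta_clean)
print("naive:   ", beta_naive)
print("weighted:", beta_weighted)
```

In this setup the weighted coefficients coincide with those from the duplicate-free data, because the weighted normal equations count each distinct observation exactly once. The sketch covers point estimates only; standard errors and the paper's other four solutions are outside its scope.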
Keywords: duplicated observations; estimation bias; Monte Carlo simulation; inference
JEL-codes: C13 C18 C21 C81
Date: 2016-01-26
New Economics Papers: this item is included in nep-ecm and nep-ore
Downloads:
https://mpra.ub.uni-muenchen.de/69064/1/MPRA_paper_69064.pdf original version (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:pra:mprapa:69064