On the Use of Optimal Transportation Theory to Recode Variables and Application to Database Merging
Gares Valérie,
Dimeglio Chloé,
Guernec Grégory,
Fantin Romain,
Lepage Benoit,
Kosorok Michael R. and
Savy Nicolas
Additional contact information
Gares Valérie: University of Rennes, INSA, CNRS, IRMAR - UMR 6625, F-35000 Rennes, France
Dimeglio Chloé: University of Toulouse III - INSERM, UMR 1027 - CHU Toulouse, Toulouse, France
Guernec Grégory: INSERM, UMR 1027, Toulouse, France
Fantin Romain: University of Toulouse III, Toulouse, France
Lepage Benoit: University of Toulouse III - INSERM, UMR 1027 - CHU Toulouse, Toulouse, France
Kosorok Michael R.: Department of Biostatistics, University of North Carolina at Chapel Hill - Chapel Hill, NC, USA
Savy Nicolas: Toulouse Institute of Mathematics, UMR C5583, Toulouse, France
The International Journal of Biostatistics, 2020, vol. 16, issue 1, 16
Abstract:
Merging databases is a strategy of paramount interest, especially in medical research. A common problem in this context arises when a variable is not coded on the same scale in the two databases to be merged. This paper considers the problem of finding a relevant way to recode the variable so that the two databases can be merged. To address this issue, an algorithm based on optimal transportation theory is proposed. Optimal transportation theory provides a map that transports the measure associated with the variable in database A to the measure associated with the same variable in database B. To do so, a cost function has to be introduced and an allocation rule has to be defined. A cost function and an allocation rule that use the information contained in the covariates are proposed. The method is compared with multiple imputation by chained equations and with a statistical learning method, and it shows better average accuracy in many situations. Applications to both simulated and real datasets show that the efficiency of the proposed merging algorithm depends on how the covariates are linked with the variable of interest.
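As a rough illustration of the transport step mentioned in the abstract, the sketch below couples the distribution of a variable coded on one ordered scale (database A) with its distribution on another ordered scale (database B). It is not the authors' algorithm: it ignores the covariates entirely, and it assumes ordered levels with a cost that is convex in the level positions, a setting in which the monotone coupling is known to be an optimal transport plan. The function names `monotone_coupling` and `recode_levels`, the example proportions, and the "largest share" allocation rule are all hypothetical choices made for this example.

```python
from fractions import Fraction


def monotone_coupling(p, q):
    """Return the monotone (north-west corner) coupling of p and q.

    p[i] is the proportion of level i in database A, q[j] the proportion
    of level j in database B. Both lists are assumed non-empty and must
    carry the same total mass. Illustrative assumption: levels are ordered
    and the cost is convex in the level positions.
    """
    if sum(p) != sum(q):
        raise ValueError("p and q must have the same total mass")
    plan = [[Fraction(0)] * len(q) for _ in p]
    i, j = 0, 0
    left_p, left_q = Fraction(p[0]), Fraction(q[0])
    while True:
        # Move as much mass as possible from A level i to B level j.
        moved = min(left_p, left_q)
        plan[i][j] += moved
        left_p -= moved
        left_q -= moved
        if left_p == 0:  # A level i is exhausted: go to the next A level
            i += 1
            if i == len(p):
                break
            left_p = Fraction(p[i])
        if left_q == 0:  # B level j is full: go to the next B level
            j += 1
            if j == len(q):
                break
            left_q = Fraction(q[j])
    return plan


def recode_levels(plan):
    """Hypothetical allocation rule: send each A level to the B level
    that receives the largest share of its mass (first one on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Made-up proportions: 2 levels in database A, 3 levels in database B.
p = [Fraction(1, 2), Fraction(1, 2)]
q = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
plan = monotone_coupling(p, q)
recoding = recode_levels(plan)
```

Each row of `plan` sums to the corresponding entry of `p` and each column to the corresponding entry of `q`, which is the marginal constraint of a transport plan. A rule closer to the one described in the abstract would allocate individuals rather than whole levels, and would use covariate information in both the cost function and the allocation.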
Date: 2020
Downloads: (external link)
https://doi.org/10.1515/ijb-2018-0106 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:ijbist:v:16:y:2020:i:1:p:16:n:8
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/ijb/html
DOI: 10.1515/ijb-2018-0106
The International Journal of Biostatistics is currently edited by Antoine Chambaz, Alan E. Hubbard and Mark J. van der Laan
More articles in The International Journal of Biostatistics from De Gruyter
Bibliographic data for series maintained by Peter Golla.