Increasing reproducibility, robustness, and generalizability of biomarker selection from meta-analysis using Bayesian methodology

Kalesinskas, Laurynas; Gupta, Sanjana; Khatri, Purvesh

PLOS Computational Biology, 2022, vol. 18, issue 6, 1-15

Abstract: A major limitation of gene expression biomarker studies is that their results are not reproducible because they do not generalize to larger, real-world, heterogeneous populations. Frequentist multi-cohort gene expression meta-analysis has frequently been used to address this problem by identifying biomarkers that are truly differentially expressed. However, the frequentist meta-analysis framework has limitations: it needs at least 4–5 datasets with hundreds of samples, is prone to confounding from outliers, and relies on multiple-hypothesis-corrected p-values. To address these shortcomings, we have created a Bayesian meta-analysis framework for the analysis of gene expression data. Using real-world data from three different diseases, we show that the Bayesian method is more robust to outliers, creates more informative estimates of between-study heterogeneity, reduces the number of false positive and false negative biomarkers, and selects more generalizable biomarkers with less data. We have compared the Bayesian framework to a previously published frequentist framework and have developed a publicly available R package for use.

Author summary: There has long been a reproducibility crisis in medical research, driven by small, single-cohort studies with low-to-moderate statistical power. One reason such studies fail to generalize is that they do not account for the heterogeneity of the real-world patient population. To address this issue, researchers have turned to meta-analysis, which combines data from previously published studies to generate an overall estimate of an effect and has been used with gene expression data to create diagnostic and prognostic markers of disease. However, traditional meta-analysis techniques have limitations: they need at least 4–5 datasets with hundreds of samples and are prone to confounding from outliers in datasets. In this study, we create a new framework for gene expression meta-analysis using Bayesian statistics and show that it is more robust to outliers, creates more informative estimates of heterogeneity, reduces the amount of data required, and reduces the number of false positive and false negative biomarkers. We have compared the Bayesian framework to a previously published framework and have developed a publicly available R package for use.

Date: 2022

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010260 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10260&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010260

DOI: 10.1371/journal.pcbi.1010260
