HMFGraph: Novel Bayesian approach for recovering biological networks
Aapo E Korhonen,
Olli Sarala,
Tuomas Hautamäki,
Markku Kuismin and
Mikko J Sillanpää
PLOS Computational Biology, 2025, vol. 21, issue 10, 1-27
Abstract:
Gaussian graphical models (GGM) are powerful tools to examine partial correlation structures in high-dimensional omics datasets. Partial correlation networks can explain complex relationships between genes or other biological variables. Bayesian implementations of GGMs have recently received more attention. Usually, the most demanding parts of GGM implementations are: (i) hyperparameter tuning, (ii) edge selection, (iii) scalability for large datasets, and (iv) the prior choice for Bayesian GGM.
To address these limitations, we introduce a novel Bayesian GGM using a hierarchical matrix-F prior with a fast implementation. We show, with extensive simulations and biological example analyses, that this prior has competitive network recovery capabilities compared to state-of-the-art approaches and good properties for recovering meaningful networks. We present a new way of tuning the shrinkage hyperparameter by constraining the condition number of the estimated precision matrix. For edge selection, we propose using approximated credible intervals (CI) whose width is controlled by the false discovery rate. An optimal CI is selected by maximizing an estimated F1-score via permutations. In addition, a specific choice of hyperparameter can make the proposed prior better suited for clustering and community detection. Our method, with a generalized expectation-maximization algorithm, computationally outperforms existing Bayesian GGM approaches that use Markov chain Monte Carlo algorithms.
The method is implemented in the R package HMFGraph, found on GitHub at https://github.com/AapoKorhonen/HMFGraph. All codes to reproduce the results are found on GitHub at https://github.com/AapoKorhonen/HMFGraph-Supplementary.
Author summary: In this paper, we introduced a new way to recover network structures with biological datasets in mind. Network estimation methods can help research by providing a convenient and easy-to-understand way to explain multivariate data structures and variable interactions. This can show, for example, how different genes co-express. Here, we introduced a new model structure that has competitive network recovery capabilities compared to state-of-the-art methods in a wide range of simulation settings. In addition, we include examples with real datasets and explain how the interpretation changes with different model parameter values. Estimation is performed by using our fast algorithm, which has significant computational advances over conventional estimation methods. Our method has a user-friendly implementation as an R package and is publicly available for download on GitHub at https://github.com/AapoKorhonen/HMFGraph.
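Illustrative note: the abstract relies on two standard quantities, the partial correlations implied by a precision matrix and the F1-score of a recovered edge set. The sketch below shows both in plain Python. It is not the HMFGraph implementation and does not use the R package's API. The function names and the toy three-variable precision matrix are hypothetical. The F1-score here is computed against a known true edge set, as in a simulation study, not estimated via permutations as in the paper.

```python
import math


def partial_correlations(precision):
    """Partial correlations from a precision matrix (list of lists).

    Standard identity: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    The diagonal is set to 1.0 by convention.
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


def f1_score(true_edges, selected_edges):
    """F1-score of a selected edge set against a known true edge set.

    Edges are given as sets of (i, j) tuples with i < j.
    """
    tp = len(true_edges & selected_edges)   # correctly selected edges
    fp = len(selected_edges - true_edges)   # selected but not true
    fn = len(true_edges - selected_edges)   # true but not selected
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy example: a chain graph 0 - 1 - 2.
# The zero in position (0, 2) means variables 0 and 2 are
# conditionally independent given variable 1.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(omega)

true_edges = {(0, 1), (1, 2)}
selected_edges = {(0, 1), (0, 2)}  # one correct edge, one false edge
score = f1_score(true_edges, selected_edges)
```

In this toy case the nonzero off-diagonal entries of the precision matrix correspond exactly to the edges of the network, which is the property that GGM-based network recovery exploits.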
Date: 2025
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013614 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13614&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013614
DOI: 10.1371/journal.pcbi.1013614
More articles in PLOS Computational Biology from Public Library of Science