Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference

Kim, Hyungseok; Kim, Soobin; Kimbrel, Jeffrey A; Morris, Megan M; Mayali, Xavier; Buie, Cullen R

PLOS Computational Biology, 2026, vol. 22, issue 4, 1-22

Abstract: Multidimensional scaling (MDS) is a widely used dimensionality reduction technique in microbial ecology data analysis that captures the multivariate structure of the data while preserving pairwise distances between samples. While improvements in MDS have enhanced the ability to reveal group-specific data patterns, these MDS-based methods require prior assumptions for inference, limiting their application in general microbiome analysis. In this study, we introduce a new MDS-based ordination method, “F-informed MDS,” which configures the data distribution based on the F-statistic, the ratio of dispersion between groups sharing common and different characteristics. Using semisynthetic datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and global data structures. Its application to a diatom-associated bacterial community suggests the role of this new method in interpreting the community’s response to the host. Our approach offers a well-founded refinement of MDS that aligns with statistical test results, which can be beneficial for broader multidimensional data analyses in microbiology and ecology. This new visualization tool can be incorporated into standard microbiome data analyses.

Author summary: Multidimensional scaling (MDS), also known as principal coordinate analysis, is a fundamental step in exploratory data analysis for interpreting microbial community samples processed via high-throughput sequencing. The interpretation of MDS results often involves linking patterns obtained from MDS with experimental treatments applied to the samples, such as environmental conditions or host phenotypes. However, retaining these patterns during ordination is not always guaranteed, as MDS itself does not consider group information during its learning process. This limitation reduces the effectiveness of conventional MDS, particularly for general microbiome datasets, where maintaining meaningful biological patterns is crucial. To address this gap, we present a robust statistical framework designed to represent microbiome datasets in a lower dimension, while preserving hypothesis testing results for group differences in the original dimension. Our approach, which relies on sample dispersion measured by the F-statistic, ensures a more stable and reliable performance compared to existing ordination methods. By incorporating statistical rigor into the ordination process, our framework improves the visualization of microbial community data and allows configurations to be adjusted within reasonable limits. This advancement provides researchers with a more effective tool for analyzing and interpreting complex microbiome data, ultimately leading to insightful conclusions.
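
Illustrative sketch (not part of the original record): the abstract describes the F-statistic as a ratio of dispersion between groups. The record does not give the exact formula, so the Python below assumes the distance-based pseudo-F used in PERMANOVA, computed directly from a pairwise distance matrix. It is not the authors' F-informed MDS procedure; the function name `pseudo_f` and the toy points and labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def pseudo_f(distances, labels):
    """PERMANOVA-style pseudo-F from a square distance matrix and group labels.

    Assumption: this is the standard distance-based pseudo-F, which may differ
    in detail from the statistic used in the paper.
    """
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")

    # Total sum of squares: squared distances over all sample pairs, divided by N.
    ss_total = sum(distances[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    # Within-group sum of squares: the same quantity per group, divided by group size.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        pair_sum = sum(distances[i][j] ** 2 for i, j in combinations(members, 2))
        ss_within += pair_sum / len(members)

    # Between-group sum of squares is what remains of the total.
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


# Hypothetical toy data: two well-separated groups of three points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]
distances = [[dist(p, q) for q in points] for p in points]
f_value = pseudo_f(distances, labels)
print(f"pseudo-F = {f_value:.3f}")
```

A larger value means the groups are further apart relative to their internal spread. Under the assumption above, an ordination that "preserves" the test result would keep this ratio, or the significance derived from it, close to its value in the original space.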

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014102 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14102&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014102

DOI: 10.1371/journal.pcbi.1014102
