Empirically calibrated simulations reveal the limits of phenotypic clustering algorithms for biodiversity assessment in data-scarce crops
Abdel Kader Naino Jika
PLOS ONE, 2025, vol. 20, issue 12, 1-14
Abstract:
Clustering algorithms are widely used for phenotypic characterization and germplasm management, particularly in data-scarce crops such as neglected and underutilized species (NUS) that lack genomic resources. However, their performance under biologically realistic conditions remains poorly understood. Standard clustering methods commonly applied in crop research often assume distinct, isotropic, and homogeneous clusters, assumptions rarely satisfied in real-world phenotypic datasets. We developed a flexible and empirically calibrated simulation framework, using phenotypic data from West African fonio (Digitaria exilis), to benchmark the performance of eleven clustering algorithms under both idealized and realistic scenarios. Our simulations integrated heterogeneous trait distributions (normal, gamma), strong inter-trait correlations (up to r = –0.84), heteroscedasticity, and moderate population structure (mean Pst = 0.16 ± 0.001, achieved through iterative calibration). Each scenario was replicated 100 times, with clustering accuracy evaluated using external (ARI, NMI) and internal (Silhouette, Davies–Bouldin) validation metrics under standardized conditions. The results revealed consistently poor algorithm performance under realistic conditions (e.g., ARI
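
The abstract names the Adjusted Rand Index (ARI) as one of its external validation metrics. As a minimal sketch of what that metric measures, and not the paper's own code, the ARI can be computed from the contingency table of true and predicted labels using the standard Hubert-Arabie formula. The function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions, not taken from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items.

    1.0 means identical partitions (up to label renaming); values near 0
    mean agreement no better than chance; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency-table cells and the two sets of marginal counts.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both / in each partition.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # chance-level agreement
    max_index = 0.5 * (sum_true + sum_pred)       # upper bound of the index
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial

    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels: four accessions, two true groups.
truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same grouping, labels swapped
print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # grouping that splits every true group
```

Because the index is corrected for chance, a clustering that merely renames the groups scores the same as a perfect one, while a clustering unrelated to the true groups scores near zero or below.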
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0329254 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 29254&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0329254
DOI: 10.1371/journal.pone.0329254
More articles in PLOS ONE from Public Library of Science