EconPapers
Fast and interpretable quantification of biological shape heterogeneity via stratified Wasserstein kernel

Wenjun Zhao, Danica J Sutherland and Khanh Dao Duc

PLOS Computational Biology, 2026, vol. 22, issue 5, 1-19

Abstract: Modern imaging technologies produce vast collections of cellular and subcellular structures, calling for principled methods that enable shape comparison across individuals and populations. We introduce the stratified Wasserstein framework, which treats each shape as an unstructured point cloud and embeds it into Euclidean space via ranked local distance profiles. This embedding yields an isometry-invariant Euclidean distance and a positive-definite kernel for population analysis, with a consistent sample-based estimator that scales to large datasets in near-quadratic time. By leveraging kernel methods, the framework enables statistically rigorous tasks such as nonparametric hypothesis testing, with theoretical guarantees and interpretability. We demonstrate the framework’s applicability to large-scale biological datasets. Analyzing 2D cancer cell contours, we quantify population-level discrepancies and identify representative cells that contribute most strongly to the observed differences. Using 3D volumes of cell envelopes and nuclei, we reveal progression patterns that capture morphological changes across cell populations, both at the population level and at the level of individual shapes. These results establish a simple and principled tool for population-level biological shape analysis, with potential impact across diverse domains of computational imaging and data science.

Author summary: Biological structures come in many shapes, from whole tissues to single cells and even protein conformations. Modern imaging technologies now produce enormous collections of these shapes, giving us the opportunity to study how structure varies across conditions or evolves over time. However, it is still difficult to compare large numbers of complex shapes in a way that is both fast and interpretable. Many existing methods rely on hand-chosen features or landmarks, while others are too slow to apply to the large datasets now common in biology.
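The abstract mentions using a positive-definite kernel on Euclidean shape embeddings for nonparametric hypothesis testing between populations. As a generic illustration of that kind of test, and not the authors' stratified Wasserstein kernel or estimator, the sketch below computes the standard unbiased squared maximum mean discrepancy (MMD) with a permutation p-value. It assumes each shape has already been embedded as a fixed-length vector; the Gaussian kernel, the bandwidth, the permutation count, and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Generic kernel two-sample test sketch (hypothetical names); it does NOT
# implement the paper's stratified Wasserstein embedding or kernel.
# Inputs are assumed to be lists of equal-length numeric vectors.

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal (i == j) to remove bias.
    kxx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, bandwidth)
              for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy

def permutation_test(xs, ys, bandwidth=1.0, n_perm=200, seed=0):
    """Return (observed MMD^2, permutation p-value)."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    count = 0
    for _ in range(n_perm):
        # Relabel the pooled sample at random to simulate the null hypothesis.
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic "populations" of 2D embeddings, shifted along one axis.
    a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    b = [[rng.gauss(1.5, 1), rng.gauss(0, 1)] for _ in range(30)]
    stat, p = permutation_test(a, b, n_perm=99)
    print(f"MMD^2 = {stat:.4f}, p = {p:.3f}")
```

A permutation test is used here because it needs no asymptotic null distribution; each permutation costs time quadratic in the pooled sample size, so in practice the number of permutations trades off against runtime.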

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014254 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14254&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014254

DOI: 10.1371/journal.pcbi.1014254


More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().

 
Page updated 2026-05-10
Handle: RePEc:plo:pcbi00:1014254