Fully synthetic replication of complex real biological cell clusters using a novel cluster-based ‘Rosetta-Routine’ computational modelling process

Bradley Mason, Laura Justham, Liam Whitby, Alison Whitby, Stuart Scott, Samuel Nti and Jon Petzing

PLOS Computational Biology, 2026, vol. 22, issue 5, 1-30

Abstract: Flow cytometry (FC) is essential for the precise quantification and characterisation of individual cell populations within a larger heterogeneous cell suspension. FC analysis provides a foundation for advanced clinical diagnostics and is a key component of many life-saving therapeutic strategies across a broad range of medical conditions. However, clinical, industrial and research laboratories alike face significant challenges in validating the metrological and biological accuracy of FC data analysis, owing to the inherently relative nature of FC data and the lack of a definitive ‘ground truth’ for processed biological samples. This study focuses on generating realistic, fully synthetic flow cytometry cell clusters and demonstrating their suitability as substitutes for traditional FC data. Because synthetic data are generated from a model, distributionally equivalent replicate datasets can be produced with explicit knowledge of cluster membership for each individual datapoint, thereby reducing the uncertainty associated with real cluster data and its analysis. This research uses meticulously optimised synthetic cluster-generating benchmarking software to simulate real monocyte clusters. A central component of the protocol is the ‘Rosetta-Routine’, a novel codebase which deciphers the statistical properties of real data and translates them into the computational coefficients required to generate accurate cluster-based synthetic replicates. This approach ensures that the synthetic datasets faithfully represent the statistical characteristics of real-world data while retaining the benefits of computational traceability. It addresses a critical gap in current practice by providing a controlled and reproducible validation framework for assessing the clustering methods used to analyse FC data. These features make it possible to score, and subsequently enhance, analysis confidence in many FC applications, such as diagnostics or ‘mock-up’ training scenarios. Future synthetic-data-driven enhancements in FC analysis confidence will translate into more accurate clinical decision-making and subsequent overall improvements in patient care.

Author summary: In this study, we introduce a new method for generating realistic synthetic flow cytometry cell clusters. We use a series of robust algorithms and modelling functions to translate the data properties of real sample cell clusters into the inputs of a cluster generator, which then computationally produces a synthetic replicate of the real cluster. The approach demonstrates statistical and visual similarity between the synthetic clusters and the original complex biological clusters, with close alignment on the forward and side scatter axes, which are often crucial for initial cell characterisation in flow cytometry. These results suggest promising future applications in flow cytometry analysis, with the end goal of helping to facilitate more consistent diagnostics in clinical and industrial settings. Our paper contributes to the field by offering a unique method for generating synthetic data which statistically mirrors real-world measurements, thereby providing novel opportunities to evaluate manual and automated cluster analysis methodologies.
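The general idea described above (measure the statistical properties of a real cluster, then generate a labelled synthetic replicate from them) can be illustrated with a deliberately simple sketch. This is not the ‘Rosetta-Routine’ itself, whose internals are not given in this listing. It assumes a single cluster that is adequately described by a bivariate Gaussian on the forward and side scatter axes, which real monocyte clusters may well not be. The function names `fit_cluster` and `sample_cluster` are hypothetical, and the "real" input is placeholder data drawn from a random generator, not measurements.

```python
import math
import random
import statistics


def fit_cluster(points):
    """Estimate the mean and 2x2 sample covariance of (fsc, ssc) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sample_cluster(mean, cov, n, label, rng):
    """Draw n labelled points from a bivariate Gaussian.

    Uses the Cholesky factor of the 2x2 covariance so that the
    sampled points reproduce the fitted variances and correlation.
    """
    mx, my = mean
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, 0.0))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Every synthetic point carries its cluster label (known ground truth).
        out.append((mx + l11 * z1, my + l21 * z1 + l22 * z2, label))
    return out


rng = random.Random(0)

# Placeholder stand-in for a gated real cluster; these are NOT measurements.
real = [(rng.gauss(50_000, 6_000), rng.gauss(30_000, 5_000)) for _ in range(2_000)]

mean, cov = fit_cluster(real)
synthetic = sample_cluster(mean, cov, 2_000, "monocyte", rng)

# Refit the synthetic replicate to compare its statistics with the input.
syn_mean, syn_cov = fit_cluster([(x, y) for x, y, _ in synthetic])
```

Because each synthetic point is generated with its label attached, a clustering method run on `synthetic` can be scored against known membership, which is the property the abstract highlights. A realistic generator would need to handle skewed or multi-modal clusters and more than two parameters.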

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014280 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14280&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014280

DOI: 10.1371/journal.pcbi.1014280

More articles in PLOS Computational Biology from Public Library of Science
Handle: RePEc:plo:pcbi00:1014280