Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks
Ye Henry Li, Dangna Li, Nikolay Samusik, Xiaowei Wang, Leying Guan, Garry P Nolan and Wing Hung Wong
PLOS Computational Biology, 2017, vol. 13, issue 12, 1-37
Abstract:
Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements of 50 markers for hundreds of thousands of cells or more. Current methods do not adequately address the issues that arise when multiple samples are combined for subpopulation discovery, and these issues are quickly and dramatically amplified as the number of samples increases. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data, closely matching the results of expert manual discovery, and for the alignment of subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and in cancer studies that monitor various tissue samples for each subject.
Author summary:
Recently, the cytometry field has advanced rapidly with the development of mass cytometry (CyTOF). CyTOF makes it possible to monitor 50 or more cellular markers for millions of cells at the single-cell level. Initial studies with CyTOF focused on a few samples, for which expert manual discovery of cell types was acceptable. As the technology matures, it is now feasible to collect more samples, which enables systematic studies of cell types across multiple samples. However, the statistical and computational issues surrounding multi-sample analysis have not previously been examined in detail. Furthermore, it was not clear how the data analysis could be scaled to hundreds of samples, such as those in clinical studies. In this work, we present a scalable analysis pipeline that is grounded in a strong statistical foundation. Partition-Assisted Clustering (PAC) offers fast and accurate clustering, and Multiple Alignments of Networks (MAN) uses network structures learned from each homogeneous cluster to organize the data into dataset-level clusters. PAC-MAN thus enables the analysis of a large CyTOF dataset that was previously too large to be analyzed systematically; this pipeline can be extended to the analysis of similarly large or larger datasets.
Date: 2017
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005875 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05875&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005875
DOI: 10.1371/journal.pcbi.1005875