HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values
Hannah Voß,
Simon Schlumbohm,
Philip Barwikowski,
Marcus Wurlitzer,
Matthias Dottermusch,
Philipp Neumann,
Hartmut Schlüter,
Julia E. Neumann and
Christoph Krisp
Additional contact information
Hannah Voß: University Medical Center Hamburg-Eppendorf (UKE)
Simon Schlumbohm: Helmut Schmidt University
Philip Barwikowski: University Medical Center Hamburg-Eppendorf (UKE)
Marcus Wurlitzer: University Medical Center Hamburg-Eppendorf
Matthias Dottermusch: University Medical Center Hamburg-Eppendorf
Philipp Neumann: Helmut Schmidt University
Hartmut Schlüter: University Medical Center Hamburg-Eppendorf (UKE)
Julia E. Neumann: University Medical Center Hamburg-Eppendorf
Christoph Krisp: University Medical Center Hamburg-Eppendorf (UKE)
Nature Communications, 2022, vol. 13, issue 1, 1-15
Abstract:
Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four example datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better in detecting significant proteins. HarmonizR is an efficient tool for missing-data-tolerant reduction of experimental variance and is easily adjustable for individual dataset properties and user preferences.
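The matrix dissection idea in the abstract can be illustrated with a minimal sketch. It is written in plain Python and does not call HarmonizR, ComBat, or limma. Rows (proteins) are grouped by the set of batches in which they have enough observed values, and each group is then adjusted using only those batches, so nothing is imputed. The function names (`dissect`, `center_row`, `harmonize`), the `min_values` threshold, the per-batch mean centering used as a stand-in for ComBat/limma, and the choice to leave rows usable in fewer than two batches as missing are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def observed(row, batches, batch):
    """Return the non-missing values of one row that belong to one batch."""
    return [v for v, b in zip(row, batches) if b == batch and not math.isnan(v)]


def dissect(matrix, batches, min_values=2):
    """Group row indices by the tuple of batches in which the row is usable.

    A row counts as usable in a batch when it has at least `min_values`
    observed values there (the threshold is an assumption of this sketch).
    """
    batch_ids = sorted(set(batches))
    groups = defaultdict(list)
    for idx, row in enumerate(matrix):
        usable = tuple(b for b in batch_ids
                       if len(observed(row, batches, b)) >= min_values)
        groups[usable].append(idx)
    return dict(groups)


def center_row(row, batches, usable):
    """Stand-in for ComBat/limma: align the batch means of one row.

    Values in batches outside `usable`, and missing values, stay missing.
    """
    pooled = [v for b in usable for v in observed(row, batches, b)]
    grand_mean = sum(pooled) / len(pooled)
    shift = {}
    for b in usable:
        vals = observed(row, batches, b)
        shift[b] = grand_mean - sum(vals) / len(vals)
    return [v + shift[b] if b in shift and not math.isnan(v) else math.nan
            for v, b in zip(row, batches)]


def harmonize(matrix, batches, min_values=2):
    """Adjust each sub-matrix separately, without imputing any value."""
    result = [[math.nan] * len(batches) for _ in matrix]
    for usable, rows in dissect(matrix, batches, min_values).items():
        if len(usable) < 2:
            continue  # no second batch to align with; left missing in this sketch
        for idx in rows:
            result[idx] = center_row(matrix[idx], batches, usable)
    return result


# Hypothetical toy data: 3 proteins, 2 samples in batch A, 2 in batch B.
nan = math.nan
batches = ["A", "A", "B", "B"]
matrix = [
    [1.0, 3.0, 11.0, 13.0],   # usable in A and B
    [5.0, 7.0, nan, nan],     # usable in A only
    [2.0, nan, 4.0, 6.0],     # usable in B only (one value in A is too few)
]
groups = dissect(matrix, batches)
harmonized = harmonize(matrix, batches)
```

In this toy example the three proteins fall into three different sub-matrices, and only the first, which is usable in both batches, is adjusted; after centering, its two batch means coincide. A real pipeline would replace `center_row` with an actual batch effect method applied to each sub-matrix as a whole.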
Date: 2022
Downloads:
https://www.nature.com/articles/s41467-022-31007-x Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-31007-x
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-022-31007-x
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie