A hierarchical approach to removal of unwanted variation for large-scale metabolomics data
Taiyun Kim,
Owen Tang,
Stephen T. Vernon,
Katharine A. Kott,
Yen Chin Koay,
John Park,
David E. James,
Stuart M. Grieve,
Terence P. Speed,
Pengyi Yang,
Gemma A. Figtree,
John F. O’Sullivan and
Jean Yee Hwa Yang
Additional contact information
Taiyun Kim: The University of Sydney
Owen Tang: The University of Sydney
Stephen T. Vernon: The University of Sydney
Katharine A. Kott: The University of Sydney
Yen Chin Koay: The University of Sydney
John Park: The University of Sydney
David E. James: The University of Sydney
Stuart M. Grieve: The University of Sydney
Terence P. Speed: Walter and Eliza Hall Institute
Pengyi Yang: The University of Sydney
Gemma A. Figtree: The University of Sydney
John F. O’Sullivan: The University of Sydney
Jean Yee Hwa Yang: The University of Sydney
Nature Communications, 2021, vol. 12, issue 1, 1-10
Abstract:
Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, whose data acquisition can run for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variation over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches, and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation but also provide guidance on the design of large omics studies.
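The abstract does not spell out the estimator used with the embedded replicates. As a rough illustration only, the sketch below assumes an RUV-III-style step (a replicate-based method from the RUV family, chosen here as an assumption rather than taken from this record): differences between replicates of the same biological sample are treated as purely unwanted variation, their dominant directions are found by SVD, and those directions are regressed out using control metabolites. The helper names `replicate_matrix` and `ruv3`, the choice of controls and the number of factors `k` are hypothetical, and the hierarchical within-batch/between-batch layering of hRUV is not reproduced.

```python
import numpy as np


def replicate_matrix(groups):
    """Hypothetical helper: build the m x g replicate design matrix M,
    where M[i, j] = 1 if sample i belongs to replicate group j."""
    labels = sorted(set(groups))
    col = {lab: j for j, lab in enumerate(labels)}
    M = np.zeros((len(groups), len(labels)))
    for i, g in enumerate(groups):
        M[i, col[g]] = 1.0
    return M


def ruv3(Y, groups, ctl, k):
    """Hypothetical RUV-III-style correction (a sketch, not the hRUV code).

    Y      : m x n matrix (samples x metabolites), e.g. log intensities
    groups : length-m labels; samples sharing a label are replicates
    ctl    : indices (or boolean mask) of control metabolites used to
             estimate the unwanted factors
    k      : number of unwanted factors to remove
    """
    Y = np.asarray(Y, dtype=float)
    M = replicate_matrix(groups)
    if k > Y.shape[0] - M.shape[1]:
        raise ValueError("k cannot exceed (#samples - #replicate groups)")
    # Differences between replicates of the same biological sample carry
    # only unwanted variation: project out the replicate-group means.
    H = M @ np.linalg.pinv(M)
    Y0 = Y - H @ Y
    # Leading left singular vectors of Y0 span the main unwanted directions.
    U, _, _ = np.linalg.svd(Y0, full_matrices=False)
    alpha = U[:, :k].T @ Y            # k x n: unwanted-factor loadings per metabolite
    ac = alpha[:, ctl]                # loadings restricted to control metabolites
    # Regress the control metabolites on their loadings to estimate the
    # per-sample unwanted factors W (m x k), then subtract W @ alpha.
    W = Y[:, ctl] @ ac.T @ np.linalg.inv(ac @ ac.T)
    return Y - W @ alpha


if __name__ == "__main__":
    # Synthetic illustration only: 4 biological samples, each replicated
    # once in each of 3 batches, with a batch shift along one direction.
    rng = np.random.default_rng(1)
    groups = [g for g in range(4) for _ in range(3)]
    batch = np.array([b for _ in range(4) for b in range(3)])
    Y = (rng.normal(size=(4, 30))[groups]
         + np.outer(np.array([0.0, 1.5, -1.5])[batch], rng.normal(size=30))
         + 0.05 * rng.normal(size=(12, 30)))
    Yc = ruv3(Y, groups, ctl=np.arange(30), k=1)
    M = replicate_matrix(groups)
    H = M @ np.linalg.pinv(M)
    print("within-replicate SS before:", np.sum((Y - H @ Y) ** 2))
    print("within-replicate SS after: ", np.sum((Yc - H @ Yc) ** 2))
```

In a hierarchical scheme one might, for instance, apply such a step within each batch first and then across batches using the between-batch replicates; how hRUV actually orders and combines its steps should be taken from the full article rather than from this sketch.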
Date: 2021
Downloads: https://www.nature.com/articles/s41467-021-25210-5 (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-25210-5
Ordering information: This journal article can be ordered from https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-25210-5
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.