Species abundance information improves sequence taxonomy classification accuracy

Benjamin D. Kaehler, Nicholas A. Bokulich, Daniel McDonald, Rob Knight, J. Gregory Caporaso and Gavin A. Huttley
Additional contact information
Benjamin D. Kaehler: Australian National University
Nicholas A. Bokulich: Northern Arizona University
Daniel McDonald: University of California San Diego
Rob Knight: University of California San Diego
J. Gregory Caporaso: Northern Arizona University
Gavin A. Huttley: Australian National University

Nature Communications, 2019, vol. 10, issue 1, 1-10

Abstract: Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
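
The role of the prior described in the abstract can be illustrated with a toy k-mer naive Bayes classifier. This is a minimal sketch and not the q2-clawback implementation: the function names (`kmers`, `train`, `classify`), the taxon labels, the sequences, and the abundance weights are all hypothetical, chosen only to show how a non-uniform prior can resolve a query that the reference sequences alone cannot separate.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3):
    """Count k-mers per taxon. `references` maps taxon -> list of sequences."""
    counts = {
        taxon: Counter(km for s in seqs for km in kmers(s, k))
        for taxon, seqs in references.items()
    }
    vocab = {km for c in counts.values() for km in c}
    return counts, vocab


def classify(seq, counts, vocab, priors=None, k=3):
    """Return (best taxon, log-scores) under a multinomial naive Bayes model.

    `priors` maps taxon -> positive weight (assumed, e.g. an expected
    abundance in the sampled environment). None means a uniform prior,
    i.e. every reference taxon is treated as equally likely.
    """
    taxa = sorted(counts)
    if priors is None:
        priors = {t: 1.0 for t in taxa}
    weight_sum = sum(priors[t] for t in taxa)
    scores = {}
    for t in taxa:
        total = sum(counts[t].values())
        # Log prior plus Laplace-smoothed log likelihood of each k-mer.
        score = math.log(priors[t] / weight_sum)
        for km in kmers(seq, k):
            score += math.log((counts[t][km] + 1) / (total + len(vocab)))
        scores[t] = score
    return max(taxa, key=lambda t: scores[t]), scores


# Hypothetical references that differ only outside the queried region.
references = {"species_a": ["ACGTAC"], "species_b": ["ACGTAG"]}
counts, vocab = train(references)
query = "ACGT"

# Uniform prior: both taxa receive the same score for this query.
_, uniform_scores = classify(query, counts, vocab)

# Assumed environment-specific weights: species_b is far more abundant.
weighted_best, _ = classify(
    query, counts, vocab, priors={"species_a": 0.1, "species_b": 0.9}
)
```

With the uniform prior the two scores are identical, so the classifier has no basis for choosing between the taxa; with the assumed weights the more abundant taxon is selected.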

Date: 2019

Downloads: (external link)
https://www.nature.com/articles/s41467-019-12669-6 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:10:y:2019:i:1:d:10.1038_s41467-019-12669-6

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-019-12669-6

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:10:y:2019:i:1:d:10.1038_s41467-019-12669-6