The impact of data quality filtering of opportunistic citizen science data on species distribution model performance
Camille Van Eupen,
Dirk Maes,
Marc Herremans,
Kristijn R.R. Swinnen,
Ben Somers and
Stijn Luca
Ecological Modelling, 2021, vol. 444, issue C
Abstract:
Opportunistically collected species occurrence data are often used for species distribution models (SDMs) when high-quality data collected through standardized recording protocols are unavailable. While opportunistic data are abundant, their uncertainty is usually high, e.g. due to observer effects or a lack of metadata. To increase data quality and improve model performance, we filtered species records based on record attributes that provide information on the observation process or on post-entry data validation. Data filtering not only increases the quality of species records but also reduces sample size, a trade-off that remains relatively unexplored. By controlling for sample size in a dataset of 255 species, we were able to explore the combined impact of data quality and sample size on model performance. We applied three data quality filters, based on observers' activity, the validation status of a record in the database and the level of detail of a submitted record, and used Maxent to analyze changes in AUC, Sensitivity and Specificity with and without filtering. The impact of stringent filtering on model performance depended on (1) the quality of the filtered data: records validated as correct and more detailed records led to higher model performance; (2) the proportional reduction in sample size caused by filtering and the remaining absolute sample size: filters causing small reductions that left more than 100 presences generally benefitted model performance; and (3) the taxonomic group: plant and dragonfly models benefitted more from data quality filtering than bird and butterfly models. Our results also indicate that recommendations for quality filtering depend on the goal of the study, e.g. increasing Sensitivity and/or Specificity. Further research is needed to identify what drives species’ sensitivity to data quality.
Nonetheless, our study confirms that large quantities of volunteer-generated and opportunistically collected data can make a valuable contribution to ecological research and species conservation.
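The quality-versus-quantity trade-off described in the abstract can be illustrated with a short sketch. It is not the authors' code: the `Record` fields, the three filter predicates and the observer-activity threshold of 50 records are hypothetical stand-ins for the record attributes the study used, and fitting Maxent itself is out of scope. For each filter, the sketch reports the proportional reduction in sample size and whether more than 100 presences remain. It also gives simple versions of Sensitivity, Specificity and a rank-based AUC computed from predicted suitability scores at presence and absence (or background) locations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    # Hypothetical record attributes standing in for the metadata the study
    # used (observer activity, validation status, record detail).
    species: str
    observer_record_count: int  # records submitted by this observer (assumed proxy)
    validated: bool             # validated as correct after entry (assumed)
    has_details: bool           # e.g. count or life stage supplied (assumed)


def apply_filter(records: List[Record],
                 keep: Callable[[Record], bool]) -> Tuple[List[Record], float, bool]:
    """Apply one quality filter; return the kept records, the proportional
    reduction in sample size, and whether more than 100 presences remain."""
    kept = [r for r in records if keep(r)]
    reduction = 1 - len(kept) / len(records) if records else 0.0
    return kept, reduction, len(kept) > 100


# Hypothetical predicates mirroring the three filter types in the abstract;
# the threshold of 50 records per observer is an illustrative assumption.
FILTERS: Dict[str, Callable[[Record], bool]] = {
    "observer_activity": lambda r: r.observer_record_count >= 50,
    "validation_status": lambda r: r.validated,
    "record_detail": lambda r: r.has_details,
}


def sensitivity_specificity(presence_scores: List[float],
                            absence_scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Share of presences predicted present, and of absences predicted absent."""
    tp = sum(s >= threshold for s in presence_scores)
    tn = sum(s < threshold for s in absence_scores)
    return tp / len(presence_scores), tn / len(absence_scores)


def auc(presence_scores: List[float], absence_scores: List[float]) -> float:
    """Rank-based AUC: probability that a random presence scores higher than a
    random absence (or background) point, counting ties as one half."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    # Synthetic records, for illustration only.
    records = [Record("sp", i, i % 2 == 0, i % 3 == 0) for i in range(300)]
    for name, keep in FILTERS.items():
        kept, reduction, enough = apply_filter(records, keep)
        print(f"{name}: kept={len(kept)} reduction={reduction:.2f} >100={enough}")
```

Run as a script, the synthetic example keeps 250, 150 and 100 records (reductions of 0.17, 0.50 and 0.67), so only the detail filter leaves 100 or fewer presences. In a real analysis the filter thresholds, and how much a model gains from them, would have to be chosen per study goal and taxonomic group.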
Keywords: Data quality filtering; Maxent; Opportunistic data; Presence-only; Sample size; Species distribution models
Date: 2021
Full text (ScienceDirect subscribers only): http://www.sciencedirect.com/science/article/pii/S0304380021000260
Persistent link: https://EconPapers.repec.org/RePEc:eee:ecomod:v:444:y:2021:i:c:s0304380021000260
DOI: 10.1016/j.ecolmodel.2021.109453
Ecological Modelling is currently edited by Brian D. Fath
Ecological Modelling is published by Elsevier.
Bibliographic data for series maintained by Catherine Liu.