Occupancy Modeling for Rare Species Using Large Datasets: A Subsampling Approach
Johanna de Haan‐Ward, Simon J. Bonner and Douglas G. Woolford
Environmetrics, 2025, vol. 36, issue 5
Abstract:
Citizen science monitoring programs, such as the Breeding Bird Survey, provide a wealth of data for understanding species abundance and distribution. However, traditional approaches to occupancy modeling of rare species can be difficult to apply to large, imbalanced datasets. We propose a new method for occupancy modeling in which the original dataset is subsampled seasonally, keeping all sites with at least one detection along with a random sample of sites with no detections. Occupancy models cannot be fit directly to these subsampled data because the assumption of binomial sampling no longer holds. However, we show that the occupancy probability is adjusted by an offset, meaning inference on the effects of predictors is still valid. We propose a method for model fitting via direct maximum likelihood and demonstrate via simulation that this leads to computational gains. We illustrate our method using data on Canada Warblers (Cardellina canadensis) from the Breeding Bird Survey in Ontario, Canada, from 1997 to 2018, where 95% of sites have zero detections annually. The results demonstrate that we can accurately estimate the occupancy and detection parameters, including the effects of habitat covariates, using just 10% of the original dataset.
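The seasonal subsampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (a dictionary of per-site detection histories for each season), the function names `subsample_season` and `subsample_all`, and the `keep_fraction` parameter are all hypothetical. The offset adjustment and the maximum likelihood fitting from the paper are not reproduced here.

```python
import random


def subsample_season(detections, keep_fraction, rng):
    """Subsample one season of detection histories.

    detections: dict mapping a site id to a list of 0/1 detection outcomes
                (hypothetical layout, assumed for this sketch).
    keep_fraction: share of never-detected sites to retain (assumed parameter).
    """
    # Sites with at least one detection are always kept.
    detected = [site for site, y in detections.items() if any(y)]
    # Sites with no detections are candidates for random sampling.
    undetected = [site for site, y in detections.items() if not any(y)]
    n_keep = round(keep_fraction * len(undetected))
    sampled = rng.sample(undetected, n_keep)
    return {site: detections[site] for site in detected + sampled}


def subsample_all(seasons, keep_fraction, seed=0):
    """Apply the subsampling independently within each season."""
    rng = random.Random(seed)
    return {
        season: subsample_season(sites, keep_fraction, rng)
        for season, sites in seasons.items()
    }


# Toy data (invented for illustration): five sites, three visits per season.
toy = {
    1997: {"a": [0, 1, 0], "b": [0, 0, 0], "c": [0, 0, 0],
           "d": [0, 0, 0], "e": [0, 0, 0]},
    1998: {"a": [0, 0, 0], "b": [1, 1, 0], "c": [0, 0, 0],
           "d": [0, 0, 0], "e": [0, 0, 0]},
}
sub = subsample_all(toy, keep_fraction=0.5, seed=1)
```

In this toy example each season retains its single site with a detection plus half of the four sites without one. Any model fit to `sub` would still need the offset correction that the abstract refers to, since the retained sites are no longer a simple random sample.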
Date: 2025
Downloads: https://doi.org/10.1002/env.70023
Persistent link: https://EconPapers.repec.org/RePEc:wly:envmet:v:36:y:2025:i:5:n:e70023
More articles in Environmetrics from John Wiley & Sons, Ltd.
Bibliographic data for series maintained by Wiley Content Delivery.