Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models
Thomas Welchowski,
Kelly O. Maloney,
Richard Mitchell and
Matthias Schmid
Additional contact information
Thomas Welchowski: University of Bonn
Kelly O. Maloney: U.S. Geological Survey (USGS) Eastern Ecological Science Center at the Leetown Research Laboratory
Richard Mitchell: U.S. Environmental Protection Agency Office of Water Washington
Matthias Schmid: University of Bonn
Journal of Agricultural, Biological and Environmental Statistics, 2022, vol. 27, issue 1, No 10, 175-197
Abstract:
Statistical modeling of ecological data is often faced with a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown a good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes it difficult to interpret these models. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate–response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman’s $H^2$ statistic. Interpretable machine learning techniques are useful tools to open the black box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on the benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests. Supplementary materials accompanying this paper appear online.
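To make the tools named in the abstract concrete, the following is a minimal sketch of ICE curves, a PDP and a pairwise Friedman $H^2$ statistic in plain Python. It is not the authors' implementation. The function `model` is a hypothetical stand-in for a fitted GBT, the four-row data set is invented for illustration, and all function names are assumptions. The $H^2$ calculation follows the commonly used pairwise definition based on centered partial dependence functions evaluated at the observed data points. ALE and IAS are not covered here.

```python
# Hypothetical black-box predictor standing in for a fitted GBT (assumption).
# It contains an x0*x1 interaction; x2 enters additively.
def model(x):
    return x[0] * x[1] + 2.0 * x[2]

# Invented toy data: each row is one observation (x0, x1, x2).
data = [(1.0, 2.0, 3.0), (2.0, 1.0, 0.0), (3.0, 4.0, 1.0), (0.0, 3.0, 2.0)]

def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions as `feature` is set to each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(tuple(modified)))
        curves.append(curve)
    return curves

def partial_dependence(predict, rows, feature, grid):
    """PDP: the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(c[g] for c in curves) / len(curves) for g in range(len(grid))]

def pd_at(predict, rows, features, values):
    """Partial dependence at one point: mean prediction with `features` fixed to `values`."""
    total = 0.0
    for row in rows:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        total += predict(tuple(modified))
    return total / len(rows)

def center(values):
    """Subtract the mean so that the values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def friedman_h2(predict, rows, j, k):
    """Pairwise H^2: share of the variance of the joint partial dependence of
    features j and k that is not explained by the sum of the two one-dimensional
    partial dependence functions."""
    pd_j = center([pd_at(predict, rows, [j], [r[j]]) for r in rows])
    pd_k = center([pd_at(predict, rows, [k], [r[k]]) for r in rows])
    pd_jk = center([pd_at(predict, rows, [j, k], [r[j], r[k]]) for r in rows])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0

if __name__ == "__main__":
    grid = [0.0, 1.0]
    print("ICE curves for x0:", ice_curves(model, data, 0, grid))
    print("PDP for x2:", partial_dependence(model, data, 2, grid))
    print("H^2(x0, x1):", friedman_h2(model, data, 0, 1))
    print("H^2(x0, x2):", friedman_h2(model, data, 0, 2))
```

In this toy setting the ICE curves for `x0` have different slopes across observations, which is the heterogeneity that a PDP averages away. The $H^2$ value is positive for the interacting pair (`x0`, `x1`) and zero, up to rounding, for the additive pair (`x0`, `x2`).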
Keywords: Boosting; Interpretable machine learning; Interaction terms; Macroinvertebrates; Stream health
Date: 2022
Persistent link: https://EconPapers.repec.org/RePEc:spr:jagbes:v:27:y:2022:i:1:d:10.1007_s13253-021-00479-7
DOI: 10.1007/s13253-021-00479-7
Journal of Agricultural, Biological and Environmental Statistics is currently edited by Stephen Buckland