Random forests as a tool for ecohydrological distribution modelling
Jan Peters,
Bernard De Baets,
Niko E.C. Verhoest,
Roeland Samson,
Sven Degroeve,
Piet De Becker and
Willy Huybrechts
Ecological Modelling, 2007, vol. 207, issue 2, 304-318
Abstract:
An important issue in ecohydrological research is distribution modelling, which aims to predict species or vegetation type occurrence on the basis of empirical relations with hydrological or hydrogeochemical habitat conditions. In this study, two statistical techniques are evaluated: (i) the widely used multiple logistic regression technique in the generalized linear modelling framework, and (ii) a recently developed machine learning technique called ‘random forests’. The latter is an ensemble learning technique that generates many classification trees and aggregates the individual results. The two techniques are used to develop distribution models that predict the occurrence of 11 groundwater-dependent vegetation types in Belgian lowland valley ecosystems based on spatially distributed measurements of environmental conditions. The spatially distributed data set under investigation consists of 1705 grid cells covering an area of 47.32 ha. After model construction and calibration, both models are applied to independent test data sets using two-fold cross-validation, and the resulting probabilities of occurrence are used to predict vegetation type distributions within the study area. Predicted vegetation types are compared with observations, and the McNemar test indicates an overall better performance of the random forest model at the 0.001 significance level. Comparison of the modelling results for each vegetation type separately by means of the F-measure, which combines precision and recall, also reveals better predictions by the random forest model. Inspection of the probabilities of occurrence of the different vegetation types for each grid cell demonstrates that correct predictions in central areas of homogeneous vegetation sites are based on high probabilities, whereas the confidence decreases towards the margins of these areas.
Threshold-independent evaluation of the model accuracy by means of the area under the receiver operating characteristic (ROC) curve confirms good performance of both models, but with higher values for the random forest model. Therefore, incorporating the random forest technique in distribution models can lead to better model performance.
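The three evaluation measures named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names, the toy labels ("wet", "dry") and the toy predictions are hypothetical, and the continuity-corrected form of the McNemar statistic is an assumption, since the abstract does not say which variant was used.

```python
import math


def f_measure(observed, predicted, label):
    """F-measure for one class: harmonic mean of precision and recall."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == label and p == label)
    fp = sum(1 for o, p in pairs if o != label and p == label)
    fn = sum(1 for o, p in pairs if o == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcnemar(observed, pred_a, pred_b):
    """McNemar test on the cases where exactly one model is correct.

    Returns (statistic, p_value). Uses the continuity-corrected
    chi-square form with 1 degree of freedom (an assumption here).
    """
    triples = list(zip(observed, pred_a, pred_b))
    only_a = sum(1 for o, a, b in triples if a == o and b != o)
    only_b = sum(1 for o, a, b in triples if a != o and b == o)
    if only_a + only_b == 0:
        return 0.0, 1.0
    stat = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def roc_auc(is_present, scores):
    """Area under the ROC curve as the probability that a presence
    cell gets a higher score than an absence cell (ties count 0.5)."""
    pos = [s for y, s in zip(is_present, scores) if y]
    neg = [s for y, s in zip(is_present, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: observed type per grid cell and two model outputs.
observed = ["wet", "wet", "wet", "dry", "dry", "dry", "wet", "dry"]
pred_rf = ["wet", "wet", "wet", "dry", "dry", "dry", "wet", "wet"]
pred_lr = ["wet", "dry", "dry", "dry", "wet", "dry", "wet", "wet"]

f_rf = f_measure(observed, pred_rf, "wet")
f_lr = f_measure(observed, pred_lr, "wet")
stat, p_value = mcnemar(observed, pred_rf, pred_lr)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a multi-class setting such as the 11 vegetation types, the F-measure and the ROC area would be computed one type at a time (that type against all others), whereas the McNemar test compares the overall correct and incorrect predictions of the two models.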
Keywords: Vegetation model; Random forest; Classification tree; Logistic regression; Generalized linear model; Ecohydrology
Date: 2007
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0304380007002931
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:ecomod:v:207:y:2007:i:2:p:304-318
DOI: 10.1016/j.ecolmodel.2007.05.011
Ecological Modelling is currently edited by Brian D. Fath