EconPapers    

Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties

James Schmidt

MPRA Paper from University Library of Munich, Germany

Abstract: Wildfires driven by extreme winds, such as the Camp Fire in 2018 and the Eaton and Palisades fires in 2025, account for a large share of structure losses due to wildfires in California. Because these types of events are relatively rare, their risks are difficult to estimate using conventional simulation techniques. This study explores the use of the Random Forest machine learning algorithm as an alternative method for estimating wildfire risk to structures. Environmental variables are estimated for 57,000 structures destroyed in wildfires in California and for 6.2 million unburned structures with the potential for wildfire exposure. A Random Forest model, trained on both the burned and unburned structures, identifies which variables are most effective in distinguishing between the two and which unburned structures belong in the High-Risk category.

The six environmental variables found to be the most important in identifying High-Risk structures are:
· the annual Red Flag Warning hours (RFW)
· the average Energy Release Component (ERC)
· the Wildland Urban Interface Zone (WUI)
· the Normalized Difference Vegetation Index (NDVI)
· the annual number of downslope wind events (DW)
· the proportion of sustained winds of 20 mph or greater on high fire danger days (SW20)

By adjusting the maximum tree-depth parameter, the Random Forest model is calibrated to produce a state-wide percentage of High-Risk structures of 12% in order to match estimates by the California Department of Insurance (CDI). The CDI estimates are based on a weighted average of insurance industry risk models. Although the Random Forest model matches the CDI estimates for the percentage of High-Risk structures at the state level, the percentage by county differs significantly from the CDI numbers. The largest reductions in the percentage of High-Risk structures occur in the Central Sierra counties of Tuolumne and Mariposa (-48% and -34%, respectively). The largest increases occur in Mono County in the Eastern Sierras (+53%) and Ventura County in Southern California (+42%). Wind characteristics appear to be the primary reason for the differences in county risk ratings. Counties with fewer Red Flag Warning hours, fewer downslope wind days, and a smaller proportion of winds above 20 mph tend to have a smaller percentage of High-Risk structures than estimated by the CDI.
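
The workflow described in the abstract (train on burned and unburned structures, rank the variables, then adjust the maximum tree depth until about 12% of unburned structures are flagged High-Risk) can be illustrated with a minimal sketch. This is not the paper's code. It assumes scikit-learn's RandomForestClassifier, purely synthetic data, a hypothetical rule that an unburned structure is High-Risk when the model classifies it as burned, and balanced class weights; the function names and the range of depths searched are likewise illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Variable names taken from the abstract; the values generated below are synthetic.
FEATURES = ["RFW", "ERC", "WUI", "NDVI", "DW", "SW20"]


def make_synthetic_structures(n_burned=300, n_unburned=3000, seed=0):
    """Hypothetical data: burned structures have shifted feature values."""
    rng = np.random.default_rng(seed)
    x_burned = rng.normal(loc=1.0, scale=1.0, size=(n_burned, len(FEATURES)))
    x_unburned = rng.normal(loc=0.0, scale=1.0, size=(n_unburned, len(FEATURES)))
    X = np.vstack([x_burned, x_unburned])
    y = np.concatenate([np.ones(n_burned, dtype=int),
                        np.zeros(n_unburned, dtype=int)])
    return X, y


def high_risk_share(X, y, max_depth, seed=0):
    """Fit a forest and return the share of unburned structures it labels as burned."""
    model = RandomForestClassifier(
        n_estimators=50,
        max_depth=max_depth,
        class_weight="balanced",  # assumption: offset the burned/unburned imbalance
        random_state=seed,
    )
    model.fit(X, y)
    flagged = model.predict(X[y == 0]) == 1  # assumed High-Risk rule
    return float(flagged.mean()), model


def calibrate_depth(X, y, target=0.12, depths=range(1, 11)):
    """Pick the max_depth whose High-Risk share is closest to the target."""
    results = {d: high_risk_share(X, y, d) for d in depths}
    best = min(results, key=lambda d: abs(results[d][0] - target))
    share, model = results[best]
    return best, share, model


if __name__ == "__main__":
    X, y = make_synthetic_structures()
    depth, share, model = calibrate_depth(X, y)
    print(f"chosen max_depth={depth}, High-Risk share={share:.1%}")
    # Rank variables by impurity-based importance, highest first.
    ranked = sorted(zip(FEATURES, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    for name, importance in ranked:
        print(f"{name}: {importance:.3f}")
```

In this sketch, shallower trees tend to flag more unburned structures as burned-like, so the depth setting acts as the calibration knob. Whether the paper's model behaves the same way on real data is not something the abstract states.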

Keywords: wildfire; Random Forest; California; structures; risk; simulation; wind; WUI; NDVI; ERC
JEL-codes: D81 R23 Y1
Date: 2025-11-02
New Economics Papers: this item is included in nep-env

Downloads:
https://mpra.ub.uni-muenchen.de/126685/1/MPRA_paper_126685.pdf original version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:pra:mprapa:126685

Series: MPRA Paper, University Library of Munich, Germany, Ludwigstraße 33, D-80539 Munich, Germany.
Bibliographic data for series maintained by Joachim Winter.

 
Page updated 2025-12-16
Handle: RePEc:pra:mprapa:126685