Conditional feature importance for mixed data

Blesch, Kristin; Watson, David S.; Wright, Marvin N.

Author affiliations:
Kristin Blesch: Leibniz Institute for Prevention Research & Epidemiology - BIPS
David S. Watson: King’s College London
Marvin N. Wright: Leibniz Institute for Prevention Research & Epidemiology - BIPS

AStA Advances in Statistical Analysis, 2024, vol. 108, issue 2, No 3, 259-278

Abstract: Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analysing a variable’s importance before and after adjusting for covariates, i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. We find that few methods are available for testing conditional FI and practitioners have hitherto been severely restricted in method application due to mismatched data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical features (i.e., mixed data). Both properties are oftentimes neglected by conditional FI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs (hence, generating synthetic data with similar statistical properties) for the data to be analysed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power, and is in line with results given by other conditional FI measures, whereas marginal FI metrics can result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
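
To make the marginal versus conditional distinction in the abstract concrete, here is a minimal Python sketch of a CPI-style test. It is not the authors' implementation and it does not perform valid knockoff sampling: as a loudly labelled simplification, a feature is replaced by a draw from a linear fit on the other features plus permuted residuals, which only covers continuous features (no mixed data). The function names `conditional_sample` and `cpi_test`, the ordinary least squares predictor, the squared-error loss, the simulated data, and the normal approximation to the one-sided test are all assumptions made for illustration.

```python
import math
import numpy as np


def conditional_sample(X, j, rng):
    """Simplified stand-in for a knockoff of column j (continuous features only).

    Assumption: X_j given the other columns is roughly linear with exchangeable
    residuals. This is NOT a valid knockoff sampler, only an illustration.
    """
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    fitted = A @ coef
    resid = X[:, j] - fitted
    # Keep the part explained by the other features, shuffle the rest.
    return fitted + rng.permutation(resid)


def cpi_test(X_train, y_train, X_test, y_test, j, rng):
    """Mean per-sample loss increase when feature j is replaced, with a one-sided test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

    def predict(X):
        return np.column_stack([np.ones(len(X)), X]) @ beta

    loss_orig = (y_test - predict(X_test)) ** 2
    X_sub = X_test.copy()
    X_sub[:, j] = conditional_sample(X_test, j, rng)
    loss_sub = (y_test - predict(X_sub)) ** 2

    delta = loss_sub - loss_orig            # per-sample loss difference
    cpi = delta.mean()
    se = delta.std(ddof=1) / math.sqrt(len(delta))
    t_stat = cpi / se
    # One-sided p-value, using a normal approximation to the t distribution.
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2))
    return cpi, t_stat, p_value


# Hypothetical simulated data: x2 is correlated with x1, but y depends on x1 only.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
y = x1 + 0.5 * rng.normal(size=n)
X = np.column_stack([x1, x2])

half = n // 2
results = {
    j: cpi_test(X[:half], y[:half], X[half:], y[half:], j, rng) for j in range(2)
}
for j, (cpi, t_stat, p_value) in results.items():
    print(f"feature {j}: CPI={cpi:.4f}, t={t_stat:.2f}, p={p_value:.4g}")
```

In this simulated setup the second feature is marginally associated with the outcome through its correlation with the first, yet carries no information once the first is accounted for, so a conditional measure should assign it an importance near zero. Handling categorical features would require a different sampler, which is the role sequential knockoffs play in the paper.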

Keywords: Interpretable machine learning; Feature importance; Knockoffs; Explainable artificial intelligence
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s10182-023-00477-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:alstar:v:108:y:2024:i:2:d:10.1007_s10182-023-00477-9

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10182/PS2

DOI: 10.1007/s10182-023-00477-9

AStA Advances in Statistical Analysis is currently edited by Göran Kauermann and Yarema Okhrin

More articles in AStA Advances in Statistical Analysis from Springer, German Statistical Society