
Understanding complex predictive models with ghost variables

Pedro Delicado and Daniel Peña
Additional contact information
Pedro Delicado: Departament d’Estadística i Investigació Operativa

TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, 2023, vol. 32, issue 1, No 4, 107-145

Abstract: Framed in the literature on Interpretable Machine Learning, we propose a new procedure to assign a measure of relevance to each explanatory variable in a complex predictive model. We assume that we have a training set to fit the model and a test set to check its out-of-sample performance. We propose to measure the individual relevance of each variable by comparing the predictions of the model in the test set with those obtained when the variable of interest is substituted (in the test set) by its ghost variable, defined as the prediction of this variable from the rest of the explanatory variables. In linear models it is shown that, on the one hand, the proposed measure gives results similar to leave-one-covariate-out (LOCO) at a lower computational cost and outperforms random permutations, and, on the other hand, it is strongly related to the usual F-statistic measuring the significance of a variable. In nonlinear predictive models (such as neural networks or random forests) the proposed measure captures the relevance of the variables efficiently, as shown by a simulation study comparing ghost variables with alternative methods (including LOCO and random permutations, as well as knockoff variables and estimated conditional distributions). Finally, we study the joint relevance of the variables by defining the relevance matrix as the covariance matrix of the vectors of effects on predictions when using every ghost variable. Our proposal is illustrated with simulated examples and the analysis of a large real data set.
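
The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes a linear predictive model, linear ghost regressions fitted on the test set, and a normalization of each relevance by the test mean squared prediction error. The helper names (`fit_linear`, `ghost_effects`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef

def ghost_effects(predict, X_test):
    """Column j holds f(X) - f(X with column j replaced by its ghost variable)."""
    n, p = X_test.shape
    base = predict(X_test)
    effects = np.empty((n, p))
    for j in range(p):
        others = np.delete(X_test, j, axis=1)
        # Ghost variable: prediction of column j from the remaining columns.
        # Assumption: a linear regression fitted on the test set itself.
        ghost = fit_linear(others, X_test[:, j])(others)
        X_ghost = X_test.copy()
        X_ghost[:, j] = ghost
        effects[:, j] = base - predict(X_ghost)
    return effects

# Simulated data (hypothetical): x1 and x2 are correlated, x3 is irrelevant.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Training set to fit the model, test set to measure relevance.
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]
predict = fit_linear(X_train, y_train)

effects = ghost_effects(predict, X_test)
mspe = np.mean((y_test - predict(X_test)) ** 2)

# Individual relevance: mean squared change in prediction, scaled by MSPE
# (the scaling is an assumption of this sketch).
relevance = np.mean(effects ** 2, axis=0) / mspe

# Joint relevance: covariance matrix of the vectors of effects on predictions.
relevance_matrix = np.cov(effects, rowvar=False)
```

Note that the fitted model is never re-estimated: only the test-set inputs change, which is where the computational saving over leave-one-covariate-out comes from. Any other fitted model could be passed as `predict` in place of the linear one used here.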

Keywords: Explainable artificial intelligence; Estimated conditional distributions; Interpretable machine learning; Knockoffs; Leave-one-covariate-out; Out-of-sample prediction; Partial correlation matrix; Random permutations; 62R07; 68T09; 62G08
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s11749-022-00826-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:testjl:v:32:y:2023:i:1:d:10.1007_s11749-022-00826-x

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/11749/PS2

DOI: 10.1007/s11749-022-00826-x

TEST: An Official Journal of the Spanish Society of Statistics and Operations Research is currently edited by Alfonso Gordaliza and Ana F. Militino

More articles in TEST: An Official Journal of the Spanish Society of Statistics and Operations Research from Springer, Sociedad de Estadística e Investigación Operativa
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:testjl:v:32:y:2023:i:1:d:10.1007_s11749-022-00826-x