Determining graphlet explanations for machine learning on graphs
Bettina Soós,
Gonzalo Nápoles,
Pieter Spronck and
Çiçek Güven
PLOS Complex Systems, 2025, vol. 2, issue 8, 1-23
Abstract:
This study introduces a post-hoc approach for explainable machine learning on graph-structured data by identifying relevant subgraphs and their corresponding subgraph patterns, the graphlet motifs. Unlike traditional motif detection methods, which rely solely on statistical occurrence frequencies, and therefore can be decoupled from the learning task, our method optimizes important motifs based on fidelity and sparsity, together with sufficiency and necessity, which are deemed key properties when generating explanations for the learning task. The resulting motifs are relevant due to being explanatory in the prediction task and their subgraph coverings correspond to the explanatory subgraphs. The method outperforms state-of-the-art techniques when comparing performance on the explanatory subgraph. Being an NP-hard problem, without constraining motif structure (e.g., fixing motif size), finding subgraph monomorphs of any form, and for all possible motifs, is computationally difficult. Hence, a genetic algorithm is used to search the possible subgraph space with the properties of desirable explanations. Fidelity ensures that the selected subgraphs maintain predictive power, which means that marginalizing or altering the subgraph would significantly impact the model’s output. Sparsity guarantees that the explanation is as concise as possible, hence avoiding redundant or overly complex subgraphs while still capturing the core reasoning behind the prediction. Minimizing fidelity on the part of the graph that is not in the explanation (the non-covering subgraph) supports that the explanations are not only sufficient, but also necessary. We frame the search for explanatory graphlets as a multi-objective optimization problem that balances these properties. The methodology is demonstrated with single motifs as building blocks of subgraph explanations and evaluated on two complementary synthetic datasets designed for graph prediction tasks, also in comparison with state-of-the-art methodologies. The motifs found not only cover relevant structural patterns but also contribute meaningfully to the model’s decision-making process.Author summary: In this work we present an explainability method for machine learning on graphs. Taking neural networks as machine learning models obscures the interpretability of the models’ prediction, given the complicated functions with large number of parameters they implement. Graphs represent entities and their relations that are increasingly used as primary data structures to represent data for machine learning. However, due to the complex nature of graph-structured data, additional considerations need to be taken when applying explainability methods directly on this domain. We use the genetic algorithm to find relevant substructures in the graph that explain the predictions. The explanations remain higher-order with searching for the combination of nodes and edges, i.e., subgraphs, relevant for the predictions. Furthermore, not only the relevant subgraph in the graph, but the repeating pattern in the subgraph, i.e., the graphlet motif, is analyzed for explainability. The genetic algorithm effectively explores the search space and finds meaningful explanations, which are evaluated with three neural networks on two synthetic datasets.
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/complexsystems/article?id=10.1371/journal.pcsy.0000067 (text/html)
https://journals.plos.org/complexsystems/article/f ... 00067&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcsy00:0000067
DOI: 10.1371/journal.pcsy.0000067
Access Statistics for this article
More articles in PLOS Complex Systems from Public Library of Science
Bibliographic data for series maintained by complexsystem ().