D-LIM: A neural network for interpretable gene–gene interactions

Wang, Shuhui; Allauzen, Alexandre; Nghe, Philippe; Opuu, Vaitea

PLOS Computational Biology, 2026, vol. 22, issue 3, 1-23

Abstract: Recent advances in gene editing can produce large genotype–fitness maps for targeted genes, yet predicting the effects of mutations between genes remains challenging. Indeed, biochemical models require knowledge of underlying parameters and interactions, whereas machine learning methods typically lack interpretability, as they do not link model parameters to biological quantities. We introduce D-LIM, a neural network that infers low-dimensional fitness landscapes directly from mutation–fitness data. The distinctive feature of D-LIM is that it assumes genes act through independent gene-specific molecular phenotypes whose nonlinear interactions determine fitness. When this assumption holds, the model yields accurate predictions and interpretable effective phenotypes. Conversely, failure reveals that a low-dimensional model is insufficient. Applied to deep mutational scanning of metabolic pathways, protein–protein interactions, and yeast environmental adaptation, D-LIM achieves state-of-the-art predictive accuracy. The inferred phenotype–fitness landscapes reveal whether epistatic interactions can be captured by a low-dimensional continuous model and identify potential trade-offs. Moreover, D-LIM estimates mutational effects on the effective phenotypes, enabling weak extrapolation beyond the training domain. D-LIM demonstrates how simple structure constraints in a neural network can help inference and hypothesis generation in biology.

Author summary: Understanding how organisms respond to genetic variation is essential for elucidating evolutionary principles. Advances in high-throughput sequencing now allow fitness measurements across thousands of genetic variants at once. These massive datasets are used to build models that explain mutational effects on fitness and predict outcomes of novel variants. A central goal of modeling is to extract biochemical insights and generate new hypotheses about the genotype-to-fitness relationship.
However, modeling genotype-to-fitness relationships remains challenging due to the nonlinear, high-dimensional, and context-dependent nature of genetic effects on fitness. Ideally, one would use detailed biological knowledge to formulate hypotheses, but such knowledge is often unavailable. In such cases, machine learning models may predict outcomes, but their opacity limits hypothesis generation. Here, we propose a neural network-based model that bridges this gap, based on a specific hypothesis: genes contribute independently to phenotypes, which are then combined through a function that determines fitness. Unlike conventional models, fitting data under this architecture directly evaluates the hypothesis. Moreover, the constrained architecture of the model yields interpretable phenotype predictions, enabling insights into genetic trade-offs and the global shape of the genotype-to-fitness map. This opens the possibility of uncovering modularity, redundancy, or epistasis patterns that shape fitness landscapes.
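
The structural hypothesis stated above (each gene contributes its own phenotype independently, and a shared nonlinear function combines the phenotypes into fitness) can be illustrated with a minimal, untrained sketch. This is not the authors' implementation. The function names (make_model, landscape, predict_fitness), the choice of one scalar phenotype per variant, the single tanh hidden layer, and the random initialization are all assumptions made for illustration; fitting the parameters to mutation–fitness data is omitted.

```python
import math
import random


def make_model(n_variants_per_gene, hidden=8, seed=0):
    """Build hypothetical, randomly initialized parameters (no training here).

    Assumption: each variant of each gene maps to ONE scalar phenotype,
    and a one-hidden-layer network maps the phenotypes to fitness.
    """
    rng = random.Random(seed)
    n_genes = len(n_variants_per_gene)
    return {
        # phenotypes[g][v]: latent phenotype of variant v of gene g
        "phenotypes": [[rng.gauss(0, 1) for _ in range(n)]
                       for n in n_variants_per_gene],
        "w1": [[rng.gauss(0, 1) for _ in range(n_genes)]
               for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def landscape(model, phi):
    """Nonlinear phenotype-to-fitness map F(phi_1, ..., phi_G)."""
    h = [math.tanh(sum(w * p for w, p in zip(row, phi)) + b)
         for row, b in zip(model["w1"], model["b1"])]
    return sum(w * x for w, x in zip(model["w2"], h)) + model["b2"]


def predict_fitness(model, genotype):
    """genotype[g] is the index of the variant carried at gene g.

    Each gene's phenotype is looked up independently of the other genes;
    gene-gene interactions arise only inside `landscape`.
    """
    phi = [model["phenotypes"][g][v] for g, v in enumerate(genotype)]
    return landscape(model, phi)


# Usage: two genes with 5 and 4 variants; variant 2 of gene 0 with variant 1 of gene 1.
model = make_model([5, 4])
fitness = predict_fitness(model, (2, 1))
```

In this sketch the per-gene lookup table plays the role of the interpretable effective phenotypes, while the two-input function `landscape` is the low-dimensional surface one would inspect for epistasis or trade-offs after fitting.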

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014107 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14107&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014107

DOI: 10.1371/journal.pcbi.1014107

More articles in PLOS Computational Biology from Public Library of Science