Using machine learning to predict and analyze complex trait diseases: Lessons from a simple abstract model

Maimon, Eden; Bondi, Ori; Moult, John; Unger, Ron

PLOS ONE, 2026, vol. 21, issue 2, 1-23

Abstract: The ability to predict individual genetic susceptibility to a complex trait disease is a major challenge in modern medicine. One approach to addressing this challenge utilizes an additive combination of contributions from a large number of single nucleotide polymorphisms (SNPs), with weights derived from Genome Wide Association Studies (GWAS). While this approach is somewhat successful in predicting whether an individual is likely to develop a specific disease, it does not explain why a person is likely to become sick. Here, we designed and utilized abstract disease models to investigate the relationship between disease structure, susceptibility, and predictability. The model consists of a set of interacting pathways, each including several nodes representing loci at which genetic variants can alter the function of the corresponding proteins. Due to the introduction of thresholds for pathway functionality, and the interplay between the pathways, this model is inherently non-additive. We use this “toy model” together with simulated variant data to examine the effect of changing various properties, some of which cannot be easily controlled in a “real-world” scenario. As expected, larger sample sizes improved performance; the omission of some contributing variants from the dataset was associated with a significant decrease in performance, whereas adding irrelevant variants had little effect. Surprisingly, diseases with a more complex underlying structure were better predicted than those with a simpler structure. In addition, risk prediction was more accurate for diseases with lower prevalence. The algorithm was robust to a reasonable percentage of false negative disease assignments. The largest decrease in performance occurred when two diseases with different genetic etiologies were classified as a single pathology, as often occurs in clinical situations, which apparently confuses the neural network algorithm. Finally, we show that a post-analysis of a neural network using t-SNE can provide biological insights into the underlying disease structure.
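
To make the non-additivity described in the abstract concrete, the following is a minimal sketch of a threshold-based pathway model. It is not the authors' implementation: the function names, the number of pathways and nodes, the variant frequency, both thresholds, and the rule that disease arises when enough pathways fail are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random

# All names and parameter values below are hypothetical illustrations,
# not taken from the paper.

def pathway_failed(node_variants, node_threshold=3):
    """A pathway is treated as non-functional once at least
    `node_threshold` of its nodes carry a damaging variant."""
    return sum(node_variants) >= node_threshold

def is_diseased(person, node_threshold=3, pathway_threshold=2):
    """`person` is a list of pathways, each a list of 0/1 flags per node.
    Assumed interplay rule: disease occurs when at least
    `pathway_threshold` pathways have failed."""
    failed = sum(pathway_failed(p, node_threshold) for p in person)
    return failed >= pathway_threshold

def additive_score(person):
    """Unweighted additive burden: total number of variants carried."""
    return sum(sum(p) for p in person)

def simulate_person(rng, n_pathways=3, nodes_per_pathway=5, variant_freq=0.3):
    """Draw an independent 0/1 variant flag for every node."""
    return [[int(rng.random() < variant_freq) for _ in range(nodes_per_pathway)]
            for _ in range(n_pathways)]

# Two hand-built individuals with the same additive burden (6 variants each).
concentrated = [[1, 1, 1, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
spread = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]

# A small simulated cohort, from which a prevalence can be estimated.
rng = random.Random(0)
cohort = [simulate_person(rng) for _ in range(1000)]
prevalence = sum(is_diseased(p) for p in cohort) / len(cohort)
```

In this sketch, `concentrated` and `spread` have identical additive scores, yet only `concentrated` crosses both thresholds and is labeled diseased. A purely additive score cannot separate the two, which is the kind of structure a non-linear classifier would need to learn.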

Date: 2026

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0342490 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 42490&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0342490

DOI: 10.1371/journal.pone.0342490
