MicroRNA target gene prediction model based on input-feature dependency and sample data expansion technique

Shao, Yan; Li, Yazhou; Zhai, Hexin; Dong, Shimin

PLOS Computational Biology, 2026, vol. 22, issue 6, 1-24

Abstract: Predicting microRNA (miRNA) target genes is essential for understanding their biological functions. This study developed a miRNA target gene prediction model based on input-feature dependency. Features were treated as multiple random variables, with marginal densities estimated using Gaussian mixture models (GMMs) and dependencies captured by a regular vine (R-vine) copula to derive joint probability density functions. We constructed class-conditional joint densities for positive and negative samples separately using GMMs and R-vine copulas, then combined these with prior probabilities using Bayes’ rule to obtain the posterior probability of a positive interaction. A standard probability threshold of 0.5 was applied to this posterior to produce a deterministic prediction. To address insufficient data and class imbalance, hybrid distribution mega-trend diffusion was used to generate virtual samples for data augmentation. Computational validation showed high predictive performance even when only 30% of the training data were used. As a proof of concept, we experimentally validated one predicted interaction (miR-8485 targeting JAK2) using dual-luciferase, cellular, and animal experiments, confirming the biological relevance of this specific model-generated prediction. These findings provide a valuable tool for understanding miRNA functions and disease mechanisms.

Author summary: In this study, we developed a new computational model to more accurately predict which genes are regulated by microRNAs, small RNA molecules that play key roles in health and disease. Predicting these targets is difficult because biological data are often limited, imbalanced, and contain complex relationships between features. Our model addresses these challenges by combining two innovations: a probabilistic prediction framework that accounts for dependencies between input features, and a data expansion method that generates realistic synthetic samples to balance the dataset. Computational experiments show that our model performs well even when trained on only 30% of the training data and outperforms existing methods in predictive accuracy. Through laboratory experiments, we validated one prediction (that miR-8485 targets the JAK2 gene), serving as a proof-of-concept demonstration that the model can generate biologically plausible hypotheses. Our findings provide researchers with a promising tool for uncovering microRNA functions, which can help advance our understanding of diseases and support the development of new therapies.
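
The classification step described in the abstract (GMM marginals, a copula for feature dependence, Bayes' rule, and a 0.5 threshold) can be illustrated with a minimal two-feature sketch. This is not the authors' implementation. It assumes only two features, so the R-vine reduces to a single pair copula, and it assumes a Gaussian pair copula for that pair. Every number in POS, NEG, and PRIOR_POS is hypothetical and hand-picked for illustration rather than fitted to data, and all function names are invented for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, sd), ...]."""
    return sum(w * NormalDist(mu, sd).pdf(x) for w, mu, sd in components)

def gmm_cdf(x, components):
    """Cumulative distribution function of the same 1-D Gaussian mixture."""
    return sum(w * NormalDist(mu, sd).cdf(x) for w, mu, sd in components)

def _clamp(p, eps=1e-12):
    # Keep probabilities strictly inside (0, 1) so inv_cdf is defined.
    return min(max(p, eps), 1.0 - eps)

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho), assumed pair-copula family."""
    a = STD_NORMAL.inv_cdf(_clamp(u))
    b = STD_NORMAL.inv_cdf(_clamp(v))
    det = 1.0 - rho * rho
    expo = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)

def joint_density(x, marginals, rho):
    """Sklar decomposition: joint density = copula density * product of marginals."""
    (x1, x2), (m1, m2) = x, marginals
    c = gaussian_copula_density(gmm_cdf(x1, m1), gmm_cdf(x2, m2), rho)
    return c * gmm_pdf(x1, m1) * gmm_pdf(x2, m2)

def posterior_positive(x, pos, neg, prior_pos):
    """Bayes' rule over the two class-conditional joint densities."""
    num = prior_pos * joint_density(x, pos["marginals"], pos["rho"])
    den = num + (1.0 - prior_pos) * joint_density(x, neg["marginals"], neg["rho"])
    # If both densities underflow to zero, fall back to the prior.
    return num / den if den > 0.0 else prior_pos

def predict(x, pos, neg, prior_pos, threshold=0.5):
    # Treating a posterior exactly at the threshold as positive is an assumption.
    return posterior_positive(x, pos, neg, prior_pos) >= threshold

# Hypothetical, hand-picked parameters (not fitted to any data).
POS = {"marginals": [[(0.6, 2.0, 0.5), (0.4, 3.0, 1.0)], [(1.0, 1.5, 0.7)]],
       "rho": 0.6}
NEG = {"marginals": [[(1.0, 0.0, 1.0)], [(0.5, -1.0, 0.5), (0.5, 0.5, 0.8)]],
       "rho": -0.2}
PRIOR_POS = 0.5

if __name__ == "__main__":
    for point in [(2.0, 1.5), (0.0, -1.0)]:
        p = posterior_positive(point, POS, NEG, PRIOR_POS)
        print(point, round(p, 4), predict(point, POS, NEG, PRIOR_POS))
```

With more than two features, the single pair copula here would be replaced by a full R-vine built from many pair copulas, and the mixture parameters would be estimated from the (augmented) training data rather than written by hand.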

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014402 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14402&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014402

DOI: 10.1371/journal.pcbi.1014402
