TGF-M: Topology-augmented geometric features enhance molecular property prediction
Wei He, Xu Tian, Xue Li, Peifu Han, Shuang Wang, Lin Liu and Tao Song
PLOS Computational Biology, 2025, vol. 21, issue 4, 1-22
Abstract:
Accurate prediction of molecular properties is a key component of Artificial Intelligence-driven Drug Design (AIDD). Despite significant progress in improving these predictive models, balancing accuracy with computational complexity remains a challenge. Molecular topological and geometric features provide rich spatial information that is crucial for improving prediction accuracy, but extracting them typically increases model complexity. To address this, we propose TGF-M (Topology-augmented Geometric Features for Molecular Property Prediction), a novel predictive model that optimizes feature extraction to capture more information and improve accuracy while reducing model complexity to lower computational cost. This approach enhances the model’s ability to leverage both topological and geometric features without unnecessary complexity. On the re-segmented PCQM4Mv2 dataset, TGF-M achieves a mean absolute error (MAE) of 0.0647 in the HOMO-LUMO gap prediction task with only 6.4M parameters. Compared to two recent state-of-the-art models evaluated within a unified validation framework, TGF-M demonstrates comparable performance with less than one-tenth of the parameters. We also conducted an in-depth analysis of TGF-M’s chemical interpretability. The results further validate the method’s effectiveness in leveraging complex molecular topology and geometry during model learning, underscoring its potential and advantages. The trained models and source code of TGF-M are publicly available at https://github.com/TiAW-Go/TGF-M.
Author summary:
Predicting molecular properties is a cornerstone of drug discovery, directly influencing the development of new medicines. Current approaches often rely heavily on computationally expensive 3D structural data, posing challenges for large-scale or real-time applications.
In the context of molecular modeling, topology represents the atom-to-atom connectivity within a molecule, while geometry describes the precise spatial arrangement of these atoms. Combining these two aspects allows for a more comprehensive understanding of molecular properties, as topology captures structural relationships and geometry encodes spatial interactions. This work introduces a novel method that combines molecular geometric and topological features to enhance prediction accuracy while significantly reducing computational complexity. By bridging the gap between molecular connectivity (2D topology) and spatial arrangements (3D geometry), our approach not only offers a more efficient pathway to understanding molecular behavior but also demonstrates the potential to make advanced predictive models more accessible. This work paves the way for scalable and interpretable molecular modeling, addressing key challenges in data-driven biology and providing new tools for applications in drug design.
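The distinction between topology and geometry drawn above can be made concrete with a small sketch. The example below is illustrative only and is not TGF-M's feature extraction: the toy molecule (water), its approximate coordinates, and the helper names `adjacency_matrix` and `distance_matrix` are all assumptions introduced here.

```python
import math

# Hypothetical toy molecule: water (H2O). Coordinates are approximate,
# in angstroms, chosen for illustration; they are not from the paper.
atoms = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]  # topology: which atom pairs are bonded
coords = [
    (0.000, 0.000, 0.000),   # O
    (0.957, 0.000, 0.000),   # H
    (-0.240, 0.927, 0.000),  # H
]


def adjacency_matrix(n_atoms, bonds):
    """Topology: 1 where two atoms share a bond, 0 otherwise."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1
    return adj


def distance_matrix(coords):
    """Geometry: Euclidean distance between every pair of atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


adj = adjacency_matrix(len(atoms), bonds)
dist = distance_matrix(coords)
```

In this sketch the two hydrogen atoms are not bonded, so `adj[1][2]` is 0, yet `dist[1][2]` is a definite positive distance. That spatial relationship is visible only in the geometric view, which is the kind of complementary information the summary describes.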
Date: 2025
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013004 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13004&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013004
DOI: 10.1371/journal.pcbi.1013004
More articles in PLOS Computational Biology from Public Library of Science