Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
Zixuan Cang, Lin Mu and Guo-Wei Wei
PLOS Computational Biology, 2018, vol. 14, issue 1, 1-44
Abstract:
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test, respectively, the scoring power and the discriminatory power of the proposed topological learning strategies. The results demonstrate that the proposed topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
Author summary: Conventional persistent homology neglects chemical and biological information during topological abstraction and thus has limited representational power for complex chemical and biological systems.
In terms of methodological development, we introduce advanced persistent homology approaches for the characterization of small molecular structures, which can capture subtle structural differences. We also introduce electrostatic persistent homology to embed physics in topological invariants. These approaches encode physics, chemistry and biology, such as hydrogen bonds, electrostatics, van der Waals interactions, hydrophobicity and hydrophilicity, into topological fingerprints which, although they cannot be literally recast into physical interpretations, are ideally suited to machine learning, particularly deep learning, giving rise to topological learning algorithms. In terms of applications, we construct a structure-based virtual screening model which outperforms other existing methods. This model, which is competitive on the DUD database, was derived by assessing, on the PDBBind database, the performance of a comprehensive collection of topological approaches, both those proposed in this work and those introduced in our earlier work. The topological features constructed in this work can readily be applied to other biomolecular problems where the characterization of proteins or small molecules is needed.
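As a rough illustration of two ideas named in the abstract, the sketch below computes dimension-0 persistence barcodes restricted to chosen element types (one simple reading of "multi-component" persistent homology) and compares two such barcodes with a Wasserstein distance. It is not the authors' implementation. The function names, the toy coordinates, the use of edge length as the Vietoris-Rips filtration value, and the brute-force matching (practical only for very small barcodes) are all assumptions made for illustration.

```python
import itertools
import math


def h0_barcode(points):
    """Death values of the finite H0 bars of a Vietoris-Rips filtration.

    Assumption: an edge enters the filtration at its Euclidean length,
    so every bar is born at 0 and only the death values are returned.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two connected components merge, so one H0 bar dies here.
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite bars; the single infinite bar is omitted


def element_barcode(atoms, elements):
    """H0 barcode of the atoms whose element symbol is in `elements`.

    `atoms` is a list of (element_symbol, (x, y, z)) tuples (hypothetical format).
    """
    return h0_barcode([xyz for elem, xyz in atoms if elem in elements])


def wasserstein(deaths_a, deaths_b, p=1):
    """p-Wasserstein distance between two H0 diagrams with points (0, death).

    Uses the L-infinity ground metric and allows any point to be matched
    to the diagonal. Brute force over all matchings: tiny inputs only.
    """
    a = list(deaths_a) + [None] * len(deaths_b)  # None stands for the diagonal
    b = list(deaths_b) + [None] * len(deaths_a)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:
            return y / 2  # L-infinity distance from (0, y) to the diagonal
        if y is None:
            return x / 2
        return abs(x - y)  # L-infinity distance between (0, x) and (0, y)

    best = min(
        sum(cost(x, y) ** p for x, y in zip(a, perm))
        for perm in itertools.permutations(b)
    )
    return best ** (1 / p)


# Toy "molecules" with made-up coordinates, for illustration only.
mol_a = [("C", (0, 0, 0)), ("C", (1, 0, 0)), ("C", (3, 0, 0)), ("O", (0, 5, 0))]
mol_b = [("C", (0, 0, 0)), ("C", (1, 0, 0)), ("C", (4, 0, 0)), ("N", (0, 1, 0))]

carbon_a = element_barcode(mol_a, {"C"})
carbon_b = element_barcode(mol_b, {"C"})
carbon_distance = wasserstein(carbon_a, carbon_b)
```

Restricting the point cloud by element before building the filtration keeps chemical identity attached to each barcode, which is the property the abstract attributes to the multi-component approach. A distance such as `carbon_distance` could then feed a nearest-neighbor method. For realistic barcode sizes, an optimal-assignment solver would be needed in place of the permutation search.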
Date: 2018
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005929 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05929&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005929
DOI: 10.1371/journal.pcbi.1005929