DDGWizard: Integration of feature calculation resources for analysis and prediction of changes in protein thermostability upon point mutations

Mingkai Wang, Khaled Jumah, Qun Shao, Katarzyna Kamieniecka, Yihan Liu and Krzysztof Poterlowicz

PLOS Computational Biology, 2025, vol. 21, issue 12, 1-28

Abstract: Thermostability is an important property of proteins and a critical factor for their wide application. Accurate prediction of ΔΔG enables the estimation of the impact of mutations on thermostability in advance. A range of ΔΔG prediction methods based on machine learning has now emerged. However, their prediction performance remains limited due to insufficiently informative training features and little effort has been made to integrate feature calculation resources. Based on this, we integrated 12 computational resources to develop a pipeline capable of automatically calculating 1,547 features. In addition, a feature-enriched DDGWizard dataset was created, including 15,752 ΔΔG data. Furthermore, we performed feature selection and developed an accurate ΔΔG prediction model that achieved an R² of 0.61 in cross-validation. It also outperformed several other representative prediction methods in comparisons with independent datasets. Together, the feature calculation pipeline, DDGWizard dataset, and prediction model constitute the DDGWizard system, freely available for ΔΔG analysis and prediction.

Author summary: A protein’s ability to maintain its structure under high temperatures, known as thermostability, is critical for many industrial and therapeutic applications and might be affected by genetic mutations. To address the challenge, we built a robust machine learning model to predict the impact of mutations on thermostability. DDGWizard integrates data from multiple computational tools to calculate over 1,500 features for each mutation, offering detailed insights into protein structure and stability. DDGWizard simplifies the complex process of analysis and enables scientists to design more stable proteins for various applications. It bridges the gap between data-rich resources and practical tools. Our model demonstrated superior performance compared to existing methods and provides a freely accessible platform for researchers and industry professionals available at https://github.com/bioinfbrad/DDGWizard.
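
Illustrative note: the abstract reports an R² of 0.61 in cross-validation. As a rough sketch of what such a figure measures, the Python below pools out-of-fold predictions across k folds and scores them once with R² = 1 - SS_res / SS_tot. Everything here is an assumption made for illustration: the one-feature least-squares model, the toy data, and the function names are hypothetical and are not the DDGWizard model, dataset, or pipeline.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept).

    Stand-in model only (assumption); the paper's model is not reproduced here.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_validated_r2(xs, ys, k=5):
    """Fit on k-1 folds, predict the held-out fold, then score all predictions."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        # Interleaved fold assignment: sample i belongs to fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        for i in test_idx:
            preds[i] = slope * xs[i] + intercept
    return r_squared(ys, preds)


# Toy data (hypothetical): one feature and a perfectly linear "ddG" target.
feature = [0.1 * i for i in range(20)]
ddg = [2.0 * x - 1.0 for x in feature]
score = cross_validated_r2(feature, ddg, k=5)
```

Because every prediction comes from a model that never saw that sample, a cross-validated R² is generally a more conservative estimate than the R² measured on the training data itself.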

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013783 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13783&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013783

DOI: 10.1371/journal.pcbi.1013783

More articles in PLOS Computational Biology from Public Library of Science
Page updated 2025-12-07
Handle: RePEc:plo:pcbi00:1013783