
The role of data imbalance bias in the prediction of protein stability change upon mutation

Jianwen Fang

PLOS ONE, 2023, vol. 18, issue 3, 1-10

Abstract: There is controversy over what causes the low robustness of some programs that predict protein stability change upon mutation. Some researchers have suggested that low-quality data and insufficiently informative features are the primary reasons, while others attribute the problem largely to a bias caused by data imbalance, since there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to show that this bias may not be the primary reason for poor performance. A balanced dataset with seemingly good conventional n-fold cross-validation (CV) results should not be taken as proof that a model for predicting protein stability change upon mutation is robust. Thus, some existing algorithms need to be re-examined before any practical application. In addition, future research should place more emphasis on obtaining more and higher-quality data and features.
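
The two ideas named in the abstract, a balanced dataset and leave-one-protein-out evaluation, can be illustrated with a minimal Python sketch. The abstract does not say how the balanced dataset was constructed, so the undersampling step below is an assumption used only for illustration, not the author's method. The record layout `(protein_id, ddg)`, the sign convention (positive `ddg` means stabilizing), the protein IDs, and both function names are likewise hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(records, seed=0):
    """Return a dataset with equal numbers of stabilizing and destabilizing mutations.

    Assumption: each record is (protein_id, ddg), with ddg > 0 meaning stabilizing
    and ddg < 0 meaning destabilizing. Records with ddg == 0 are dropped.
    The larger class is randomly downsampled to the size of the smaller one.
    """
    rng = random.Random(seed)
    stabilizing = [r for r in records if r[1] > 0]
    destabilizing = [r for r in records if r[1] < 0]
    n = min(len(stabilizing), len(destabilizing))
    return rng.sample(stabilizing, n) + rng.sample(destabilizing, n)


def leave_one_protein_out(records):
    """Yield (protein_id, train, test), holding out every mutation of one protein.

    Unlike conventional n-fold CV, no protein contributes mutations to both
    the training set and the test set of the same fold.
    """
    by_protein = defaultdict(list)
    for record in records:
        by_protein[record[0]].append(record)
    for protein in sorted(by_protein):
        train = [r for r in records if r[0] != protein]
        yield protein, train, by_protein[protein]


# Toy data with hypothetical protein IDs and ddG values.
toy = [
    ("P1", 1.2), ("P1", -0.5), ("P1", -2.0),
    ("P2", -1.1), ("P2", 0.7),
    ("P3", -0.3), ("P3", -0.9),
]

balanced = balance_by_undersampling(toy, seed=0)
for protein, train, test in leave_one_protein_out(balanced):
    print(protein, "train:", len(train), "test:", len(test))
```

Grouping the splits by protein is what distinguishes this evaluation from conventional n-fold CV, where mutations of the same protein can land in both training and test folds and inflate the apparent performance.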

Date: 2023

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0283727 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 83727&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0283727

DOI: 10.1371/journal.pone.0283727


More articles in PLOS ONE from Public Library of Science

 
Page updated 2025-05-31
Handle: RePEc:plo:pone00:0283727