Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm
Minghui Wang,
Lingling Yue,
Xiaowen Cui,
Cheng Chen,
Hongyan Zhou,
Qin Ma and
Bin Yu
Additional contact information
Minghui Wang: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Lingling Yue: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Xiaowen Cui: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Cheng Chen: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Hongyan Zhou: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Qin Ma: Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
Bin Yu: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
Mathematics, 2020, vol. 8, issue 2, 1-18
Abstract:
Extracellular matrix (ECM) proteins play an important role in a range of cellular biological processes, and studying them helps clarify their biological functions. We propose ECMP-RF (extracellular matrix proteins prediction by random forest) to predict ECM proteins. First, features are extracted from the protein sequence by combining five encodings: encoding based on grouped weight, pseudo amino-acid composition, pseudo position-specific scoring matrix, a local descriptor, and an autocorrelation descriptor. Second, the synthetic minority oversampling technique (SMOTE) is applied to handle class imbalance, and the elastic net (EN) is used to reduce the dimensionality of the feature vectors. Finally, a random forest (RF) classifier predicts the ECM proteins. Under leave-one-out cross-validation, the balanced accuracy on the training and testing datasets is 97.3% and 97.9%, respectively. ECMP-RF performs significantly better than other state-of-the-art predictors.
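As a rough illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling (synthetic minority samples interpolated between a sample and one of its nearest minority neighbours) and balanced accuracy (assumed here to be the usual mean of sensitivity and specificity). It is not the authors' implementation: the function names `smote_oversample` and `balanced_accuracy`, the parameter defaults, and the toy data are all hypothetical, and the feature extraction, elastic net, and random forest steps are omitted.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    pairs = list(zip(y_true, y_pred))
    pos = sum(1 for t, _ in pairs if t == 1)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy data (hypothetical): four minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6, k=2)
print(len(new_points))

# Sensitivity 1/2, specificity 3/4.
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the segment between them. Balanced accuracy weights both classes equally, which is why it is a natural metric when the classes are imbalanced.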
Keywords: extracellular matrix protein; multi-information fusion; synthetic minority oversampling technique; elastic net; random forest
JEL-codes: C
Date: 2020
Downloads:
https://www.mdpi.com/2227-7390/8/2/169/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/2/169/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:2:p:169-:d:314961
Mathematics is currently edited by Ms. Emma He